Main Goal: keep long-running training jobs alive on a remote server even if your SSH connection drops.

I. Why use tmux?

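During long-running tasks like machine learning training, a simple SSH disconnect can kill your process and lose hours of progress. tmux (terminal multiplexer) solves this by running your shell inside a session that lives on the server, so the process keeps going after you disconnect and you can re-attach to it later.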


II. Step-by-Step Training Workflow

1. Connect to remote server

First, establish your connection to the training node via SSH.

ssh server4@10.147.18.178

2. Start a tmux session

Create a new session with a specific name (e.g., “train”) so you can identify it later.

tmux new -s train

You are now inside a virtual terminal session. Even if you disconnect, this environment persists.
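If a session named train already exists, running `tmux new -s train` again fails with a duplicate-session error. A minimal sketch of a guard for that case, assuming a POSIX shell; `ensure_session` is a hypothetical helper name, not a tmux built-in:

```bash
# ensure_session NAME: create a detached tmux session unless one already exists.
# ensure_session is a made-up helper for illustration.
ensure_session() {
    name="$1"
    if tmux has-session -t "$name" 2>/dev/null; then
        echo "session '$name' already exists"
    else
        # -d creates the session without attaching this terminal to it
        tmux new-session -d -s "$name"
        echo "session '$name' created"
    fi
}
```

After `ensure_session train`, enter the session with `tmux attach -t train`.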

3. Run your training

Execute your training command inside the session.

instinct-train ...

(Optional: save logs) To monitor progress while saving output to a file, use the tee command:

instinct-train ... | tee train.log
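A plain pipe into tee records only standard output, so error messages and tracebacks written to standard error never reach train.log. A minimal sketch that merges both streams, assuming a POSIX shell; `run_logged` is a hypothetical wrapper, and the command passed to it stands in for your own training command:

```bash
# run_logged LOGFILE COMMAND [ARGS...]
# Runs COMMAND, showing its output on screen while appending it to LOGFILE.
# run_logged is a made-up helper for illustration.
run_logged() {
    logfile="$1"
    shift
    # 2>&1 merges stderr into stdout so errors are logged too;
    # tee -a appends instead of overwriting an earlier log.
    "$@" 2>&1 | tee -a "$logfile"
}
```

Call it as `run_logged train.log` followed by your usual training command. One caveat: the exit status of the pipeline is that of tee, not of the training command, so do not rely on it to detect a failed run.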

4. Detach (VERY IMPORTANT)

To leave the session running while you close your terminal or laptop, you must detach.

tmux detach

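Or use the keyboard shortcut: press Ctrl + B, release, then press D. While training is running in the foreground, the pane will not accept typed shell commands, so the shortcut is usually the practical way to detach.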

👉 Your training will continue running in the background on the server.
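Steps 2 to 4 can also be collapsed into a single command, because tmux can create a session that is detached from the start and run a command inside it. A minimal sketch, assuming a POSIX shell; `start_detached` is a hypothetical helper, and the command string is whatever you would otherwise type in step 3:

```bash
# start_detached NAME COMMAND_STRING
# Creates a tmux session that is never attached to this terminal (-d) and runs
# COMMAND_STRING inside it. start_detached is a made-up helper for illustration.
start_detached() {
    tmux new-session -d -s "$1" "$2"
}
```

With this form the session closes as soon as the command exits, so its on-screen output is gone afterwards. Saving output to a log file matters more here than in the interactive workflow.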

5. Check running sessions

If you forget your session name or want to see what is running:

tmux ls

Example output: train: 1 windows (detached)
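The human-readable listing is awkward to parse from a script. A minimal sketch, assuming a POSIX shell, that uses tmux's format option to print only the session names; `session_names` and `session_exists` are hypothetical helpers:

```bash
# Print one session name per line; prints nothing if no tmux server is running.
# Both functions are made-up helpers for illustration.
session_names() {
    tmux list-sessions -F '#{session_name}' 2>/dev/null
}

# Succeed only if a session with exactly this name exists.
session_exists() {
    # -x matches whole lines, -F treats the name as a fixed string
    session_names | grep -qxF -- "$1"
}
```

For example, `session_exists train && echo "still running"` reports whether the train session is still there.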

6. Reconnect to training later

When you come back to check your progress, simply re-attach:

ssh server4@10.147.18.178
tmux attach -t train
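You do not always need to attach just to see how far training has got. A minimal sketch, assuming a POSIX shell, that prints the last lines currently visible in the session's pane; `peek_session` is a hypothetical helper:

```bash
# peek_session NAME [LINES]
# Prints the last LINES non-empty lines (default 20) visible in the active pane
# of session NAME. peek_session is a made-up helper for illustration.
peek_session() {
    # capture-pane -p writes the pane contents to stdout instead of a buffer
    tmux capture-pane -p -t "$1" | grep -v '^[[:space:]]*$' | tail -n "${2:-20}"
}
```

Running `peek_session train 5` after the ssh step shows the five most recent lines without attaching, so there is no risk of typing into the training terminal by accident.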

7. Stop training

Once the job is finished, you can close the session entirely:

tmux kill-session -t train
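Note that kill-session ends whatever is still running in the session. If training may still be in progress, a gentler option is to send Ctrl + C first so the process gets a chance to exit on its own. A minimal sketch, assuming a POSIX shell; `stop_session` and its 5-second default grace period are assumptions for illustration, and whether the training command shuts down cleanly on Ctrl + C depends on that program:

```bash
# stop_session NAME [GRACE_SECONDS]
# Sends Ctrl+C to the session, waits, then kills the session if it still exists.
# stop_session is a made-up helper for illustration.
stop_session() {
    tmux send-keys -t "$1" C-c
    sleep "${2:-5}"
    if tmux has-session -t "$1" 2>/dev/null; then
        tmux kill-session -t "$1"
    fi
}
```

For example, `stop_session train 30` interrupts the job and gives it 30 seconds before the session is removed.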

III. Best Practices & Tips

⚠️ Important Rules

💡 Useful Commands

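**Force detach (if stuck):**
```bash
tmux detach-client
```

**Multiple experiments:** You can run multiple sessions simultaneously for different experiments. Detach from the first session (Ctrl + B then D) before creating the second, because tmux refuses to start a new session from inside an existing one:
```bash
tmux new -s exp1
tmux new -s exp2
```

**Quick summary:**
  1. ssh server4@10.147.18.178
  2. tmux new -s train
  3. instinct-train ...
  4. tmux detach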

🎯 Result

Happy training 🚀