TensorBoard — Training Monitoring

Monitor G1 locomotion training progress in real time.

Launch

conda activate isaaclab
cd ~/GIT/unitree_rl_lab
tensorboard --logdir logs/rsl_rl/ --host 0.0.0.0

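Open http://workstation:6006 in a browser. The --host 0.0.0.0 flag makes TensorBoard listen on all network interfaces, so the page is reachable from other machines on the network; 6006 is TensorBoard's default port. TensorBoard auto-refreshes as new data arrives during training.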
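The scalars shown in the browser can also be read directly from the event files for offline analysis. The following is a minimal sketch, not part of unitree_rl_lab: it assumes the tensorboard Python package is importable in the active environment, and the helper names latest_run_dir, available_tags, and load_scalar are hypothetical. Tag names depend on the logger, so list them first rather than assuming them.

```python
from pathlib import Path


def latest_run_dir(log_root):
    """Return the directory holding the newest event file under log_root, or None."""
    event_files = list(Path(log_root).rglob("events.out.tfevents.*"))
    if not event_files:
        return None
    newest = max(event_files, key=lambda f: f.stat().st_mtime)
    return newest.parent


def _load_accumulator(run_dir):
    # Imported lazily so latest_run_dir works even without tensorboard installed.
    from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

    # size_guidance of 0 keeps every scalar point instead of downsampling.
    acc = EventAccumulator(str(run_dir), size_guidance={"scalars": 0})
    acc.Reload()
    return acc


def available_tags(run_dir):
    """List the scalar tag names logged in run_dir."""
    return _load_accumulator(run_dir).Tags()["scalars"]


def load_scalar(run_dir, tag):
    """Return [(step, value), ...] for one scalar tag; raises KeyError if absent."""
    return [(e.step, e.value) for e in _load_accumulator(run_dir).Scalars(tag)]


# Example usage (run from ~/GIT/unitree_rl_lab):
#   run = latest_run_dir("logs/rsl_rl/")
#   print(available_tags(run))
#   points = load_scalar(run, "<one of the printed tags>")
```

Picking the run by the newest event file, rather than by directory name, avoids any assumption about how run directories are named.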


Key Metrics

| Metric | What it means | Good trend |
|---|---|---|
| Mean reward | Overall training progress: weighted sum of all reward terms | Upward |
| Mean episode length | How long robots stay alive before termination | Upward |
| Episode_Termination/bad_orientation | Percentage of robots falling over | Downward |
| Episode_Termination/time_out | Robots surviving the full episode length | Upward |
| Episode_Reward/track_lin_vel_xy | How well the robot follows velocity commands | Upward |
| Episode_Reward/feet_clearance | Foot lift quality: clean stepping gait | Upward |

Mean reward is the single most important metric. The termination and reward breakdowns explain why it's going up or down.
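One way to use the breakdowns is to compare the logged values at two iterations and see which terms moved the most. The sketch below is illustrative only: the function name rank_term_changes is hypothetical, and the numbers in the example are made up, not taken from a real run.

```python
def rank_term_changes(before, after):
    """Return (tag, delta) pairs for tags present in both snapshots,
    sorted by absolute change, largest first (ties broken by tag name)."""
    shared = before.keys() & after.keys()
    deltas = [(tag, after[tag] - before[tag]) for tag in shared]
    return sorted(deltas, key=lambda item: (-abs(item[1]), item[0]))


# Made-up snapshot values at two iterations, for illustration only.
earlier = {
    "Episode_Reward/track_lin_vel_xy": 0.25,
    "Episode_Reward/feet_clearance": 0.5,
}
later = {
    "Episode_Reward/track_lin_vel_xy": 0.75,
    "Episode_Reward/feet_clearance": 0.375,
}

for tag, delta in rank_term_changes(earlier, later):
    print(f"{tag}: {delta:+.3f}")
# Episode_Reward/track_lin_vel_xy: +0.500
# Episode_Reward/feet_clearance: -0.125
```

The term at the top of the list is the first place to look when mean reward changes direction.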


Training Phases

G1 locomotion training progresses through distinct phases:

| Phase | Behavior | Typical iteration |
|---|---|---|
| 1. Random flailing | Joints move randomly, robots fall immediately | 0–50 |
| 2. Standing | Robots learn to stay upright but don't move | 50–200 |
| 3. Balancing | Robots shift weight, may shuffle in place | 200–500 |
| 4. Stepping | Recognizable leg movements, clumsy walking | 500–1000 |
| 5. Walking | Smooth gait, follows velocity commands | 1000+ |

Iteration ranges are approximate; they vary with reward tuning and random seed.


When to Stop Training

Training doesn't need to converge fully to be useful. Stop when:

  • Mean reward shows no improvement for 200+ iterations
  • bad_orientation drops below ~5% (robots rarely fall)
  • track_lin_vel_xy is high and stable (robot follows commands reliably)
  • Mean episode length is near the maximum episode limit
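The checklist above can be sketched as a small script over logged scalar series. This is a rough illustration, not part of the training code. All function names are hypothetical, and several thresholds are assumptions: "near the maximum" is taken as 95% of the episode limit, "stable" as a spread within 5% over the window, and bad_orientation is assumed to be a fraction in [0, 1] (use 5.0 as the threshold if it is logged as a percentage). Whether track_lin_vel_xy is "high" is left to the reader, since the target value depends on the reward configuration.

```python
def has_plateaued(rewards, window=200, min_gain=0.0):
    """True if the best reward in the last `window` iterations does not beat
    the best reward seen before that window by more than `min_gain`."""
    if len(rewards) <= window:
        return False  # not enough history to compare against
    best_before = max(rewards[:-window])
    best_recent = max(rewards[-window:])
    return best_recent - best_before <= min_gain


def is_stable(values, window=200, tolerance=0.05):
    """True if the last `window` values stay within `tolerance` (relative spread)."""
    if len(values) < window:
        return False
    recent = values[-window:]
    spread = max(recent) - min(recent)
    scale = max(abs(v) for v in recent)
    return spread <= tolerance * scale


def stop_signals(rewards, tracking, bad_orientation, episode_length,
                 max_episode_length, window=200):
    """Report which stopping criteria currently hold.

    rewards, tracking: per-iteration series (oldest first).
    bad_orientation, episode_length: latest logged values.
    """
    return {
        "reward_plateau": has_plateaued(rewards, window),
        "rarely_falls": bad_orientation < 0.05,            # assumed fraction, not percent
        "tracking_stable": is_stable(tracking, window),
        "near_max_length": episode_length >= 0.95 * max_episode_length,  # assumed 95%
    }
```

Returning each signal separately, rather than a single yes/no, keeps the final decision with the person watching the run.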

On an RTX 3090, a good policy typically emerges within 1000–2000 iterations (~30 min). Returns diminish after that, so export the .onnx and test in sim before investing more training time.