G1 Running Policy Family¶
This experiment family extends Unitree's base G1 29DOF velocity task into progressively faster forward locomotion tasks without modifying the upstream walking task.
| Variant | Task ID | Config | Notes |
|---|---|---|---|
| Running | Unitree-G1-29dof-Running |
running_env_cfg.py |
First running task. Changes command range, gait cadence, clearance, and running reward weights. |
| Fast running | Unitree-G1-29dof-Running-Fast |
fast_running_env_cfg.py |
Warm-started from running; increases forward curriculum to 3.0 m/s. |
| Sprint 10 m/s | Unitree-G1-29dof-Sprint-10ms |
sprint_10ms_env_cfg.py |
Paused at model_20500.pt; reached about 7.2 m/s curriculum, but gait still needs tuning. |
| Sprint 10 m/s gait cleanup | Unitree-G1-29dof-Sprint-10ms-Gait |
sprint_10ms_env_cfg.py |
New variant that relaxes waist/hip penalties and gates velocity curriculum on fall/reset stability. |
Shared Setup¶
| Item | Value |
|---|---|
| Base task | Unitree-G1-29dof-Velocity |
| Base config | source/unitree_rl_lab/unitree_rl_lab/tasks/locomotion/robots/g1/29dof/velocity_env_cfg.py |
| Trainer | RSL-RL PPO |
| Robot | Plain G1 29DOF |
| Observation/action interface | Same as the base velocity task |
| Policy observation shape | 480 |
| Policy action shape | 29 joint action targets |
| Full actor-critic checkpoint parameters | 832,571 |
The deployed actor portion of the policy is smaller than the full training checkpoint. The actor network has 414,237 parameters and maps the 480-value policy observation to 29 joint action targets. The critic has 418,305 parameters and is used during PPO training only. The remaining 29 parameters are the learned action standard deviation used for exploration.
Lineage¶
Unitree-G1-29dof-Velocity
-> Unitree-G1-29dof-Running
-> Unitree-G1-29dof-Running-Fast
-> Unitree-G1-29dof-Sprint-10ms
-> Unitree-G1-29dof-Sprint-10ms-Gait
Each faster variant is a separate task ID and writes to a separate experiment folder under logs/rsl_rl/.
Key Metrics¶
Watch these in TensorBoard:
| Metric | Meaning |
|---|---|
Train/mean_reward |
Overall learning signal; should trend upward |
Train/mean_episode_length |
How long robots survive; higher is better |
Metrics/base_velocity/error_vel_xy |
Forward/lateral velocity tracking error; lower is better |
Curriculum/lin_vel_cmd_levels |
How far the velocity curriculum has expanded |
Curriculum/lin_vel_cmd_stability/failure_rate |
For stability-gated tasks, recent fall-like reset rate |
Episode_Termination/bad_orientation |
Falling/tipping termination; should drop over time |
Use playback before trusting a checkpoint. Good TensorBoard curves can still hide ugly gaits, skating, terrain-edge artifacts, or reward hacking.