RL Training Guide — G1 Locomotion from Scratch¶
Step-by-step reproduction guide for training the G1-29DOF locomotion policy and running it in simulation.
Repositories¶
Three repos are needed. All live in ~/GIT/.
| Repo | Purpose | Location |
|---|---|---|
| `unitree_rl_lab` | RL training framework (train, evaluate, export) | `~/GIT/unitree_rl_lab/` |
| `unitree_ros` | URDF robot description files (meshes, joints, physics) | `~/GIT/unitree_ros/` |
| `unitree_sim_isaaclab` | Full simulation environment for deployment testing | `~/GIT/unitree_sim_isaaclab/` |
graph TD
A[unitree_ros\nURDF files] -->|G1 joint model| B[unitree_rl_lab\nTraining]
B -->|policy.onnx| C[unitree_sim_isaaclab\nFull Simulation]
D[Isaac Lab 2.3.2] --> B
D --> C
E[Isaac Sim 5.1] --> D
Prerequisites¶
Isaac Sim and Isaac Lab must be installed in the isaaclab conda environment before cloning the Unitree repos.
# 1. Install Isaac Sim 5.1 (requires Python 3.11 and GLIBC 2.35+)
conda activate isaaclab
pip install "isaacsim[all,extscache]==5.1.0" --extra-index-url https://pypi.nvidia.com
# 2. Clone and install Isaac Lab 2.3.2
git clone https://github.com/isaac-sim/IsaacLab.git
cd IsaacLab
./isaaclab.sh --install
Step 1: Clone the Repos¶
cd ~/GIT
git clone https://github.com/unitreerobotics/unitree_rl_lab.git
git clone https://github.com/unitreerobotics/unitree_ros.git
git clone https://github.com/unitreerobotics/unitree_sim_isaaclab.git
Step 2: Configure the URDF Path¶
unitree_rl_lab needs to know where the G1 URDF files live. Edit the G1 robot config in unitree_rl_lab and set its path variable to the cloned unitree_ros directory.
Then, in the same file, switch the G1 29DOF config to use URDF instead of USD. Find the `UNITREE_G1_29DOF_CFG` section and:
- Uncomment `UnitreeUrdfFileCfg`
- Comment out `UnitreeUsdFileCfg`
This tells Isaac Lab to load the G1 from the URDF robot description instead of Unitree's pre-packaged USD scene file.
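A minimal sketch of what both edits can look like (the variable name, import paths, URDF filename, and constructor fields below are assumptions for illustration; only the two Cfg class names come from the steps above):
import os
from isaaclab.assets import ArticulationCfg  # Isaac Lab's robot config class
# UnitreeUrdfFileCfg / UnitreeUsdFileCfg ship with unitree_rl_lab (import path assumed)

# Assumed variable name: where the cloned unitree_ros repo lives
UNITREE_ROS_DIR = os.path.expanduser("~/GIT/unitree_ros")

UNITREE_G1_29DOF_CFG = ArticulationCfg(
    # Uncommented: load the G1 from the URDF description (relative path assumed)
    spawn=UnitreeUrdfFileCfg(
        asset_path=f"{UNITREE_ROS_DIR}/robots/g1_description/g1_29dof.urdf",
    ),
    # Commented out: Unitree's pre-packaged USD scene
    # spawn=UnitreeUsdFileCfg(usd_path="..."),
)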
Step 3: Install unitree_rl_lab¶
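Assuming the package layout referenced later in this guide (source/unitree_rl_lab), the editable install is a standard pip command:
conda activate isaaclab
cd ~/GIT/unitree_rl_lab
pip install -e source/unitree_rl_lab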
The -e flag installs in editable mode, so changes to the source files take effect immediately without reinstalling.
Step 4: Train the Policy¶
conda activate isaaclab
cd ~/GIT/unitree_rl_lab
python scripts/rsl_rl/train.py --headless --task Unitree-G1-29dof-Velocity
--headless skips the GUI, which is much faster. 4,096 robots run in parallel on the GPU.
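If GPU memory is tight, the parallel environment count can usually be lowered with the launcher's --num_envs flag (a standard option of Isaac Lab's rsl_rl train script; confirm it exists in this repo's train.py):
python scripts/rsl_rl/train.py --headless --task Unitree-G1-29dof-Velocity --num_envs 2048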
What happens:
- Isaac Sim loads the G1 URDF and spawns 4,096 robots in a grid
- Each robot gets a random velocity command each episode
- Rewards and penalties are calculated every timestep
- The neural network updates its weights based on what worked and what did not
- Checkpoints are saved every 100 iterations to `logs/rsl_rl/unitree_g1_29dof_velocity/<timestamp>/`
Training runs until you stop it (Ctrl+C) or the reward plateaus. A usable policy typically emerges around iteration 1,000. We ran to 7,200.
Resuming a Training Run¶
To continue from where you left off:
# Resume from latest checkpoint in a run
python scripts/rsl_rl/train.py --headless --task Unitree-G1-29dof-Velocity \
--resume \
--load_run 2026-03-06_14-30-46
# Resume from a specific checkpoint
python scripts/rsl_rl/train.py --headless --task Unitree-G1-29dof-Velocity \
--resume \
--load_run 2026-03-06_14-30-46 \
--checkpoint model_7200.pt
The run folder name is the timestamp directory under logs/rsl_rl/unitree_g1_29dof_velocity/.
Step 5: Monitor Training¶
In a second terminal while training runs:
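cd ~/GIT/unitree_rl_lab
tensorboard --logdir logs/rsl_rl/ --host 0.0.0.0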
Open http://workstation:6006 in a browser. Key metrics to watch:
| Metric | Good trend | What it means |
|---|---|---|
| `Mean reward` | Increasing | Overall policy quality |
| `Episode_Termination/bad_orientation` | Decreasing toward 0 | Fall rate |
| `Episode_Reward/track_lin_vel_xy` | Increasing | Velocity command accuracy |
| `Episode_Length/mean` | Increasing toward max | Robots staying alive longer |
Reward Configuration¶
The reward function is defined in:
~/GIT/unitree_rl_lab/source/unitree_rl_lab/unitree_rl_lab/tasks/locomotion/robots/g1/29dof/velocity_env_cfg.py
Positive terms (things the robot is rewarded for):
| Term | Weight | Description |
|---|---|---|
| `track_lin_vel_xy` | +1.0 | Follow commanded forward/lateral speed |
| `track_ang_vel_z` | +0.5 | Follow commanded yaw (turning) rate |
| `alive` | +0.15 | Survive the episode without falling |
| `gait` | +0.5 | Maintain a natural alternating gait |
| `feet_clearance` | +1.0 | Lift feet cleanly off the ground |
Penalty terms (negative weights, things to minimize):
| Term | Weight | Description |
|---|---|---|
| `flat_orientation_l2` | -5.0 | Penalizes tilting (stay upright) |
| `base_height` | -10.0 | Penalizes deviating from 0.78 m standing height |
| `dof_pos_limits` | -5.0 | Penalizes joints approaching their limits |
| `base_linear_velocity` (z) | -2.0 | Penalizes bouncing (vertical movement) |
| `joint_deviation_arms` | -0.1 | Keeps arms near default pose |
| `action_rate` | -0.05 | Penalizes rapid changes in joint targets |
| `energy` | -2e-5 | Penalizes unnecessary energy use |
To tune the policy behavior, change these weights. Increasing track_lin_vel_xy makes the robot prioritize speed over stability. Increasing flat_orientation_l2 makes it more conservative about tilting. Weights are just floating point numbers in that config file.
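For orientation, here is a hedged sketch of how such terms typically look in an Isaac Lab manager-based reward config (term names and weights are from the tables above; the exact imports and params in velocity_env_cfg.py may differ):
from isaaclab.managers import RewardTermCfg as RewTerm
from isaaclab.utils import configclass
import isaaclab.envs.mdp as mdp  # Isaac Lab's built-in reward functions

@configclass
class RewardsCfg:
    # Reward: track the commanded planar velocity (raise to prioritize speed)
    track_lin_vel_xy = RewTerm(
        func=mdp.track_lin_vel_xy_exp,
        weight=1.0,
        params={"std": 0.5, "command_name": "base_velocity"},  # std value assumed
    )
    # Penalty: tilting away from upright (raise magnitude for more conservative gaits)
    flat_orientation_l2 = RewTerm(func=mdp.flat_orientation_l2, weight=-5.0)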
Step 6: Play Back the Trained Policy¶
To run the policy in simulation with a GUI:
conda activate isaaclab
cd ~/GIT/unitree_rl_lab
# Basic playback — loads latest checkpoint, opens GUI
python scripts/rsl_rl/play.py --task Unitree-G1-29dof-Velocity
# Control how many robots are shown (default is 32 or so)
python scripts/rsl_rl/play.py --task Unitree-G1-29dof-Velocity --num_envs 30
# Load a specific checkpoint (instead of the latest)
python scripts/rsl_rl/play.py --task Unitree-G1-29dof-Velocity \
--load_run 2026-03-06_14-30-46 \
--checkpoint model_7200.pt
# Record a video of the playback
python scripts/rsl_rl/play.py --task Unitree-G1-29dof-Velocity \
--video --video_length 300
This loads the checkpoint from logs/rsl_rl/unitree_g1_29dof_velocity/ and opens a GUI window. The Isaac Sim viewport shows the robots walking. Use the keyboard or gamepad to send velocity commands.
Requires a display
play.py opens the full Isaac Sim GUI. Run it from a desktop session (Sunshine/Moonlight remote desktop), not from an SSH terminal. For headless playback with video recording, add --headless --video.
Output Files¶
After training, checkpoints are saved here:
~/GIT/unitree_rl_lab/logs/rsl_rl/unitree_g1_29dof_velocity/2026-03-06_14-30-46/
├── exported/
│ ├── policy.onnx <- portable policy, deploy this
│ └── policy.pt <- PyTorch checkpoint, for fine-tuning
├── params/
│ ├── velocity_env_cfg.py <- full environment config snapshot
│ └── deploy.yaml <- PD gains, joint order, obs scaling
├── model_100.pt <- checkpoint at iter 100
├── model_200.pt <- checkpoint at iter 200
│ ...
└── model_7200.pt <- final checkpoint
The policy.onnx file is what gets deployed. It is a portable format that runs on any machine with ONNX Runtime -- the workstation, the Jetson, or a laptop.
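A minimal sketch of running the exported policy with ONNX Runtime (input/output names and the observation size are assumptions; joint order and observation scaling come from params/deploy.yaml):
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("policy.onnx", providers=["CPUExecutionProvider"])
inp = session.get_inputs()[0]                    # observation tensor; name varies by export

OBS_DIM = 96                                     # placeholder: read the real size from the model or deploy.yaml
obs = np.zeros((1, OBS_DIM), dtype=np.float32)   # scaled observation vector, built per deploy.yaml
actions = session.run(None, {inp.name: obs})[0]  # joint position targets, one per actuated DOF
print(actions.shape)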
Step 7: Deploy in unitree_sim_isaaclab¶
Copy the trained JIT policy to the simulation environment:
cp ~/GIT/unitree_rl_lab/logs/rsl_rl/unitree_g1_29dof_velocity/2026-03-06_14-30-46/exported/policy.pt \
~/GIT/unitree_sim_isaaclab/assets/model/our_policy.pt
Launch the simulation¶
The custom task Isaac-Locomotion-G129-Warehouse loads the exact same robot config as training (URDF, actuator stiffness/damping, joint defaults) inside a warehouse scene.
conda activate isaaclab
cd ~/GIT/unitree_sim_isaaclab
python sim_main.py --device cuda:0 --enable_cameras \
--task Isaac-Locomotion-G129-Warehouse \
--action_source custom_rl \
--model_path assets/model/our_policy.pt \
--robot_type g129 \
--camera_include front_camera \
--enable_wholebody_dds
Control the robot with keyboard¶
In a separate terminal:
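cd ~/GIT/unitree_sim_isaaclab
python send_commands_keyboard.py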
Controls (the terminal with the keyboard script must have focus):
| Key | Action |
|---|---|
| W/S | Forward / Backward |
| A/D | Strafe left / right |
| Z/X | Turn left / right |
| C | Crouch |
| Q | Quit |
How it works¶
The v2 action provider (action_provider_custom_rl_v2.py) uses Isaac Lab's own ObservationManager and ActionManager to construct observations and apply actions. This guarantees identical behavior to training -- no manual observation construction, no action scaling bugs, no joint ordering issues.
The provider reads velocity commands from DDS (keyboard or Nav2), overrides the env's command manager, then runs the standard Isaac Lab pipeline: compute observations, run the policy, process and apply actions, step physics.
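A sketch of one control step under those assumptions (the function and the DDS command fields are hypothetical; the real logic lives in action_provider_custom_rl_v2.py, and the manager calls are standard Isaac Lab APIs):
import torch

def control_step(env, policy, dds_cmd):
    # 1. Override the command manager with the DDS velocity command (vx, vy, wz)
    cmd = torch.tensor([[dds_cmd.vx, dds_cmd.vy, dds_cmd.wz]], device=env.device)
    env.command_manager.get_command("base_velocity")[:] = cmd

    # 2. Build observations exactly as in training
    obs = env.observation_manager.compute_group("policy")

    # 3. Run the policy and push actions through the action manager
    with torch.inference_mode():
        actions = policy(obs)
    env.action_manager.process_action(actions)
    env.action_manager.apply_action()
    # 4. Physics stepping happens in the surrounding simulation loop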
Alternative: Quick demo without DDS¶
For a quick demo without DDS or keyboard control (robot follows random velocity commands):
conda activate isaaclab
cd ~/GIT/unitree_rl_lab
python scripts/rsl_rl/play_warehouse.py --num_envs 1
This uses Isaac Lab's built-in env.step() loop with the training config and a warehouse terrain.
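Conceptually, this is the standard manager-based RL env cycle (a sketch with generic names, not the script's exact code):
import torch

obs, _ = env.reset()                      # ManagerBasedRLEnv reset returns (obs_dict, extras)
while simulation_app.is_running():
    with torch.inference_mode():
        actions = policy(obs["policy"])   # the "policy" observation group feeds the network
    obs, _, _, _, _ = env.step(actions)   # step returns obs, reward, terminated, truncated, extras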
Quick Reference¶
conda activate isaaclab
cd ~/GIT/unitree_rl_lab
# Start a new training run
python scripts/rsl_rl/train.py --headless --task Unitree-G1-29dof-Velocity
# Resume a training run
python scripts/rsl_rl/train.py --headless --task Unitree-G1-29dof-Velocity \
--resume --load_run 2026-03-06_14-30-46
# Stop training
Ctrl+C
# Monitor training (open http://workstation:6006)
tensorboard --logdir logs/rsl_rl/ --host 0.0.0.0
# Play back with 20 robots visible
python scripts/rsl_rl/play.py --task Unitree-G1-29dof-Velocity --num_envs 20
# Play in warehouse (quick demo, random velocity commands)
python scripts/rsl_rl/play_warehouse.py --num_envs 1
# Trained policy location
ls logs/rsl_rl/unitree_g1_29dof_velocity/2026-03-06_14-30-46/exported/
# Deploy in unitree_sim_isaaclab with DDS keyboard control
cd ~/GIT/unitree_sim_isaaclab
python sim_main.py --device cuda:0 --enable_cameras \
--task Isaac-Locomotion-G129-Warehouse \
--action_source custom_rl \
--model_path assets/model/our_policy.pt \
--robot_type g129 --camera_include front_camera \
--enable_wholebody_dds
# Keyboard control (separate terminal)
python send_commands_keyboard.py
About Unitree-G1-29dof-Velocity¶
This is a Gymnasium task ID -- a registered name that maps to the full environment configuration. It was created by Unitree and ships inside unitree_rl_lab.
When you pass --task Unitree-G1-29dof-Velocity, the launcher looks up this registration:
# source/unitree_rl_lab/unitree_rl_lab/tasks/locomotion/robots/g1/29dof/__init__.py
gym.register(
id="Unitree-G1-29dof-Velocity",
kwargs={
"env_cfg_entry_point": "velocity_env_cfg:RobotEnvCfg",
"rsl_rl_cfg_entry_point": "rsl_rl_ppo_cfg:BasePPORunnerCfg",
},
)
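One way to inspect that registration from Python (the tasks import path is an assumption; importing the package is what triggers gym.register):
import gymnasium as gym
import unitree_rl_lab.tasks  # assumed import path; registers the Unitree tasks

spec = gym.spec("Unitree-G1-29dof-Velocity")
print(spec.kwargs["env_cfg_entry_point"])  # -> "velocity_env_cfg:RobotEnvCfg"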
The two files it points to are where the actual configuration lives:
| File | What it controls |
|---|---|
| `tasks/locomotion/robots/g1/29dof/velocity_env_cfg.py` | Robot, terrain, observations, reward weights |
| `tasks/locomotion/agents/rsl_rl_ppo_cfg.py` | Learning rate, batch size, network size, iterations |
To make the robot do something different, modify velocity_env_cfg.py. train.py itself is just a launcher -- you do not edit it.