Nav2 (Autonomous Navigation)¶
Autonomous navigation using Nav2 — path planning, obstacle avoidance, and goal tracking. Uses 3D lidar for costmaps and a static identity map -> odom transform for localization (no static map or AMCL needed).
| Launch script | launch_nav2.sh -> nav2_carter_launch.py |
| Config file | nav2_carter_params.yaml |
| Behavior tree | nav2_bt_carter.xml (replans every 5s instead of default 1s) |
| Subscribes to | /front_3d_lidar/lidar_points (costmaps), /chassis/odom, TF chain |
| Publishes | /cmd_vel, /plan, /global_costmap/costmap, /local_costmap/costmap |
| Requires | map -> odom -> base_link TF chain (see TF Setup), cmd_vel_relay.py for OmniGraph topic remap |
Architecture¶
flowchart LR
FOX["Foxglove"] -- "/goal_pose" --> BT["bt_navigator"]
BT --> PLAN["planner_server<br/>(SMAC Hybrid-A*)"]
BT --> CTRL["controller_server<br/>(MPPI)"]
BT --> BEH["behavior_server<br/>(spin, backup, wait)"]
PLAN --> GCOST["global_costmap<br/>(30x30m rolling)"]
CTRL --> LCOST["local_costmap<br/>(16x16m)"]
GCOST -- "VoxelLayer" --> LIDAR["/front_3d_lidar/<br/>lidar_points"]
LCOST -- "VoxelLayer" --> LIDAR
CTRL -- "/cmd_vel" --> RELAY["cmd_vel_relay<br/>(pass-through)"]
RELAY -- "/cmd_vel_raw" --> SIM["Isaac Sim"]
How the stack fits together¶
Costmaps¶
VoxelLayer converts 3D lidar point clouds into obstacle grids. No static map is needed; the costmaps build up from live sensor data as the robot moves. Points below 1.0m are filtered to ignore ground bumps. track_unknown_space: false on the global costmap so unscanned cells default to "free" rather than "unknown" (which blocks the planner).
Localization and TF¶
Nav2 requires a map -> odom -> base_link TF chain. The map -> odom link is where localization happens — it corrects for odometry drift so the robot knows where it is on the map. Two options can provide it:
| Static identity transform | cuVSLAM | |
|---|---|---|
| How | Publishes a fixed zero-offset map -> odom |
Uses stereo cameras to track visual features and estimate drift |
| Pros | Simple, no drift, no extra GPU load | Corrects odometry drift over time |
| Cons | Position drifts with odometry (no correction) | Can drift massively itself (observed 49m Z-drift, 29deg pitch after collisions) |
| When | Local navigation without a saved map | Long-range navigation where cumulative odometry error matters |
We use the static identity transform — the robot treats its starting position as the map origin, and all goals are relative to that. This is good enough for navigating within the warehouse.
The other two links are straightforward: odom -> base_link comes from odom_tf_bridge.py (converts the /chassis/odom topic into a TF broadcast, since Isaac Sim doesn't publish it as TF), and base_link -> sensors is published by Isaac Sim automatically.
Sim time for static TF
The static_transform_publisher must use use_sim_time:=true. Without it, the TF timestamps are wall-clock time (~1.77 billion seconds) while Nav2 expects sim time (~6000 seconds), causing all TF lookups to fail silently.
Planning¶
SMAC Hybrid-A* planner with allow_unknown: true so it can plan through unexplored areas. The global costmap is a 30x30m rolling window centered on the robot. The custom behavior tree replans every 5 seconds (default is 1s) to give the robot time to accelerate and commit to a path.
Control¶
MPPI controller generates smooth velocity commands toward the planned path. Configured for up to 2.0 m/s, though actual cruise speed is ~0.3 m/s in simulation due to 0.5x real-time factor (see MPPI Speed Tuning). Carter footprint is used for collision checking. SimpleGoalChecker with 0.5m tolerance so the robot actually stops at the goal.
cmd_vel relay¶
Isaac Sim's differential drive OmniGraph node is remapped to subscribe to /cmd_vel_raw instead of /cmd_vel (via headless-sample-scene.sh). The cmd_vel_relay.py node bridges Nav2/teleop output on /cmd_vel to /cmd_vel_raw. This is a simple pass-through with no sign changes — base_link +X = physical forward.
Launching¶
Auto-start (default)¶
All helper processes and Nav2 auto-start with the container via scripts in /usr/local/bin/scripts/entrypoint_additions/:
| Script | Process |
|---|---|
75-odom-tf-bridge.sh |
odom -> base_link TF bridge |
76-static-map-odom.sh |
Static map -> odom identity TF |
77-cmd-vel-relay.sh |
cmd_vel relay (pass-through /cmd_vel -> /cmd_vel_raw) |
80-nav2.sh |
Full Nav2 stack |
Teleop coexists with Nav2
Teleop (70-teleop-twist-joy.sh) publishes to /cmd_vel, the same topic as Nav2. The Foxglove joystick only publishes when touched (no zero-flooding), so it naturally overrides Nav2 during use and stops when released. Both can run simultaneously.
Manual launch¶
If auto-start is disabled, launch all helpers and Nav2 by hand:
# 1. Start TF helpers (see above)
# 2. Start cmd_vel relay (pass-through /cmd_vel -> /cmd_vel_raw)
docker exec -d isaac_ros_dev_container bash /workspaces/isaac_ros-dev/77-cmd-vel-relay.sh
# 3. Start Nav2
docker exec -it isaac_ros_dev_container bash /workspaces/isaac_ros-dev/launch_nav2.sh
Wait for "Managed nodes are active" in the output — the SMAC planner takes ~15 seconds to build its lookup table, so full startup is around 30-40 seconds.
Custom launch file
Nav2 1.3.9's stock navigation_launch.py requires nav2_route which isn't in the container's apt repos. nav2_carter_launch.py is a custom launch file that skips route_server, collision_monitor, and docking_server.
Sending Goals¶
From Foxglove¶
The 3D panel's built-in "Publish Pose" uses foxglove.PoseInFrame schema, which the bridge doesn't convert to ROS. Use a Publish panel instead:
- Add panel -> Publish
- Set Topic to
/goal_pose - Set Schema to
geometry_msgs/msg/PoseStamped - Publish a message like:
{
"header": {"frame_id": "map"},
"pose": {
"position": {"x": 3.0, "y": 0.0, "z": 0.0},
"orientation": {"w": 1.0}
}
}
From the command line¶
ros2 topic pub /goal_pose geometry_msgs/msg/PoseStamped \
'{header: {frame_id: "map"}, pose: {position: {x: 3.0, y: 0.0}, orientation: {w: 1.0}}}' --once
Visualization in Foxglove¶
Costmap overlay¶
Add these topics to the Foxglove 3D panel to see obstacle detection:
/local_costmap/costmap— 16x16m area around the robot/global_costmap/costmap— 30x30m rolling window/plan— planned path to the goal
Position tracking¶
Add a Plot panel with two series to see XY coordinates over time:
/chassis/odom.pose.pose.position.x/chassis/odom.pose.pose.position.y
Teleop Override¶
Both teleop_twist_joy_node and Nav2 publish to /cmd_vel. The Foxglove joystick extension only publishes /joy messages when the stick is actively touched — it does not flood zeros when idle. This means Nav2 commands pass through uninterrupted when the joystick is released, and joystick input naturally overrides Nav2 when you grab the stick.
No need to kill teleop
Unlike teleop_twist_keyboard (which floods zeros), the joystick teleop coexists with Nav2. Both auto-start with the container.
MPPI Speed Tuning¶
Out of the box, MPPI defaults produce very slow movement (~0.05 m/s) despite a 2.0 m/s max velocity setting. This section explains how MPPI works, what each tuning knob does, and the specific bottlenecks discovered through iterative testing on the Carter robot.
How MPPI works¶
MPPI (Model Predictive Path Integral) is a sampling-based controller. Instead of computing a single optimal path, it tries thousands of random possibilities and picks the best one. Every control cycle it:
- Samples a batch of random trajectories (default 2000) by adding noise to the current velocity
- Scores each trajectory using a set of "critics" — scoring functions that penalize things like straying from the path, getting too close to obstacles, or going the wrong direction
- Selects the best trajectory using a temperature-weighted average — lower scores are better, and the
temperatureparameter controls how aggressively it favors the best-scoring trajectories
flowchart LR
VEL["Current velocity"] --> NOISE["Add random noise<br/>(2000 samples)"]
NOISE --> SIM["Simulate each<br/>trajectory forward<br/>(56 steps × 0.05s)"]
SIM --> SCORE["Score with critics<br/>(8 scoring functions)"]
SCORE --> AVG["Temperature-weighted<br/>average"]
AVG --> CMD["/cmd_vel"]
CMD --> VEL
The prediction horizon — how far into the future each trajectory looks — is time_steps × model_dt. With 56 steps at 0.05s each, that's 2.8 seconds ahead. At the max speed of 2.0 m/s, that covers 5.6 meters of path.
Understanding the critics¶
Each critic assigns a penalty score to a trajectory. Some critics evaluate every step of the trajectory (56 penalties added up), while others only look at the endpoint (one penalty). This distinction matters for speed — per-step critics accumulate higher penalties on faster trajectories simply because the robot covers more ground, even if the trajectory is perfectly safe.
| Critic | What it does | Evaluation | Speed impact |
|---|---|---|---|
ConstraintCritic |
Rejects trajectories exceeding velocity/acceleration limits | Per-step | Neutral (hard constraint) |
CostCritic |
Penalizes trajectories passing through costmap obstacles | Per-step | Slows — faster = more cells traversed = higher accumulated cost |
GoalCritic |
Rewards getting closer to the goal position | Endpoint | Neutral |
GoalAngleCritic |
Rewards facing the goal at close range | Endpoint | Neutral |
PathAlignCritic |
Penalizes deviation from the planned path | Per-step (mean-normalized) | Neutral — uses mean normalization, not sum |
PathFollowCritic |
Rewards reaching further along the path | Endpoint | Speeds up — the main forward-driving force |
PathAngleCritic |
Penalizes heading away from the path direction | Per-step (early-exit) | Minimal — exits early once heading is aligned |
PreferForwardCritic |
Penalizes reverse motion | Per-step | Minimal at low weight |
PathFollowCritic is the most important critic for speed. It only looks at the furthest point reached along the path — faster trajectories reach further and score better. Its cost_weight of 50.0 is intentionally high to dominate the per-step critics that slow things down.
Key speed bottlenecks¶
These are the specific issues discovered through iterative tuning, in order of impact:
1. temperature (the #1 bottleneck)¶
temperature controls how selective the weighted average is when combining the 2000 sampled trajectories. A low temperature exponentially down-weights any trajectory that costs even slightly more than the current best.
The problem: at temperature: 0.1, once the optimizer converges on ~0.4 m/s, it can't escape. Any faster trajectory incurs slightly higher per-step critic costs, so it gets exponentially suppressed in the weighted average. The optimizer is trapped in a local speed minimum.
| Value | Behavior |
|---|---|
| 0.1 | Too greedy — trapped at ~0.4 m/s, can't explore faster solutions |
| 0.3 | Default — allows proper exploration while still favoring good trajectories |
| 1.0+ | Too permissive — erratic, includes many bad trajectories in the average |
2. Costmap, prune distance, and horizon consistency¶
Three settings must be mathematically consistent or the controller physically cannot command high speeds:
prediction_horizon = time_steps × model_dt = 56 × 0.05 = 2.8 seconds
max_projection = vx_max × horizon = 2.0 × 2.8 = 5.6 meters
prune_distancemust be >= max_projection (5.6m). If the controller only sees 3.5m of path, it can't plan trajectories that go further. Set to 8.0m for margin.- Local costmap must be >= 2 × max_projection (11.2m). Trajectories that leave the costmap boundary get rejected. Set to 16×16m.
PathFollowCritic.cost_weightmust be high enough (50.0) to make reaching further along the path the dominant objective.
3. vx_std (velocity sampling noise)¶
vx_std controls how much random noise is added to the current velocity when generating sample trajectories. It must scale with vx_max:
| Value | With vx_max=2.0 | Result |
|---|---|---|
| 0.2 | 95% of samples within ±0.4 m/s of current speed | Too narrow — takes many cycles to explore faster |
| 0.5 | 95% of samples within ±1.0 m/s | Good balance of exploration and stability |
| 0.8 | 95% of samples within ±1.6 m/s | Too wide — some samples go fast-forward, others fast-backward, and the weighted average cancels them to near-zero |
4. Compute budget (batch_size × iteration_count)¶
MPPI must finish computing within one control cycle. At 0.5× real-time factor, CPU time is already stretched:
| Setting | Control rate | Result |
|---|---|---|
| batch_size=3000, iteration_count=2 | ~12 Hz | Erratic — can't keep up, oscillating commands |
| batch_size=2000, iteration_count=1 | ~15-20 Hz | Stable — smooth trajectory tracking |
5. model_dt vs controller_frequency¶
MPPI checks at startup that model_dt >= 1/controller_frequency. If the controller runs faster than the simulation timestep, predictions are invalid.
| controller_frequency | Period | model_dt | Result |
|---|---|---|---|
| 10 Hz | 0.1s | 0.05s | FATAL — period (0.1s) > model_dt (0.05s), Nav2 refuses to start |
| 20 Hz | 0.05s | 0.05s | Passes check (equal). Actually runs at ~15 Hz due to 0.5× RTF |
model_dt startup constraint
model_dt must be >= 1/controller_frequency or MPPI crashes on startup with: "Controller period more than model dt, set it equal to model dt." This is a hard check — there's no workaround.
Obstacle avoidance tuning¶
Per-step critics like CostCritic accumulate higher penalties on faster trajectories because the robot passes through more costmap cells. Two settings reduce this effect:
| Parameter | Value | Why |
|---|---|---|
CostCritic.cost_weight |
0.5 | Low weight so obstacle cost doesn't dominate PathFollowCritic (50.0) |
CostCritic.trajectory_point_step |
4 | Evaluate every 4th step instead of every step — reduces accumulation by 75% |
inflation_radius (local) |
0.75 | Must be >= circumscribed radius (0.696m) for the planner |
cost_scaling_factor (local) |
10.0 | Steep falloff — costs drop to near-zero by ~0.5m from obstacle edges, so the robot isn't "always in cost-space" in narrow aisles |
Results¶
With these settings the robot navigates to goals at ~0.4 m/s peak commanded, ~0.3 m/s peak odom, with a cruise speed of ~0.15-0.20 m/s. The simulation runs at 0.5× real-time on an RTX 3090 with headless rendering.
Why not faster?
The main remaining bottleneck is the WallRate sim-time mismatch. Nav2's controller uses wall-clock time for its loop rate, but the sim runs at 0.5× real-time. A 20 Hz controller effectively runs at ~10 Hz in sim-time, causing prediction horizon mismatches. This is a known Nav2 issue with no fix available — the controller would need a SimRate option.
Dynamic parameter tuning
Most MPPI parameters can be changed at runtime without restarting Nav2:
Exception: time_steps and model_dt are allocated at init time. Changing them dynamically corrupts internal buffers — always change via YAML and restart.
Config Files¶
All files are in ~/workspaces/isaac_ros-dev/:
| File | Purpose |
|---|---|
nav2_carter_params.yaml |
Nav2 node parameters (costmaps, planner, controller, speeds) |
nav2_carter_launch.py |
Custom launch file (skips missing nav2_route package) |
nav2_bt_carter.xml |
Behavior tree — replan rate, recovery behaviors |
launch_nav2.sh |
Shell wrapper with FastDDS and RMW env vars |
odom_tf_bridge.py |
Converts /chassis/odom to odom -> base_link TF broadcast |
cmd_vel_relay.py |
Pass-through relay /cmd_vel -> /cmd_vel_raw (for OmniGraph remap) |
Auto-start entrypoint scripts (in /usr/local/bin/scripts/entrypoint_additions/):
| File | Purpose |
|---|---|
75-odom-tf-bridge.sh |
Odom TF bridge on container boot |
76-static-map-odom.sh |
Static map->odom TF on container boot |
77-cmd-vel-relay.sh |
cmd_vel relay on container boot |
80-nav2.sh |
Nav2 stack on container boot |
Troubleshooting¶
TF and localization¶
"Timed out waiting for transform"
: The TF chain is incomplete. Verify all three links exist: map -> odom (static publisher or cuVSLAM), odom -> base_link (odom_tf_bridge.py), base_link -> sensors (Isaac Sim). transform_tolerance is set to 0.5s.
Point cloud rotated/drifted : cuVSLAM has lost tracking accuracy (often after a collision). Restart cuVSLAM to reset:
```bash
docker exec isaac_ros_dev_container bash -c "pkill -f visual_slam"
```
Then relaunch cuVSLAM.
Planning and goals¶
Planner fails to find path
: The goal may be outside the costmap (60m rolling window), or in unknown space. With track_unknown_space: false, the costmap defaults to free, so this should be rare. Check the goal is within range with ros2 run tf2_ros tf2_echo map base_link.
Robot doesn't stop at goal
: The config uses SimpleGoalChecker with 0.5m/0.5rad tolerance. If using StoppedGoalChecker instead, the robot may never satisfy the "fully stopped" condition and loop forever.
Movement issues¶
"Sensor origin is out of map bounds"
: Harmless warning. The lidar mount (~2m) is above the costmap's max_obstacle_height (1.9m). The costmap still works.
Process management¶
Duplicate node errors on relaunch
: Old Nav2 processes linger after pkill. Wait ~30 seconds for DDS discovery to clean up stale entries before relaunching. Check with ros2 node list.
Stopping Nav2
Killing a tmux session running docker exec does not kill the processes inside the container. Always stop from inside:
docker exec isaac_ros_dev_container bash -c \
"pkill -9 -f controller_server; pkill -9 -f smoother_server; \
pkill -9 -f planner_server; pkill -9 -f behavior_server; \
pkill -9 -f bt_navigator; pkill -9 -f velocity_smoother; \
pkill -9 -f waypoint_follower; pkill -9 -f lifecycle_manager; \
pkill -9 -f nav2_carter_launch; pkill -9 -f launch_nav2"
Then wait ~30 seconds before relaunching so DDS clears the stale node registrations.
Plugin naming in Jazzy
Nav2 Jazzy requires :: separator for plugin names (e.g., nav2_smac_planner::SmacPlannerHybrid). The older / format (e.g., nav2_smac_planner/SmacPlannerHybrid) causes "class does not exist" errors. Also, plugin_lib_names should NOT be listed in bt_navigator config — built-in plugins are registered automatically.