Skip to content

NAT Troubleshooting

Two independent behaviors per NAT router:

Mapping (port assignment):

  • Endpoint-Independent Mapping (EIM): Same external port regardless of destination. MappingVariesByDestIP: false. Good for hole-punching.
  • Endpoint-Dependent Mapping (EDM): Different external port per destination. MappingVariesByDestIP: true. Hard to hole-punch — the other side can’t predict your port.

Filtering (who can send inbound):

  • Endpoint-Independent Filtering (EIF): Once a port opens via outbound traffic, ANY external host can send to it. Consumer routers (Bell Home Hub) typically do this.
  • Endpoint-Dependent Filtering (EDF): Only the original destination can send back. UniFi default. Blocks hole-punching because the remote peer differs from the STUN server that discovered the port.

tailscale netcheck only tests mapping, NOT filtering. Two routers can show identical netcheck results but behave completely differently for hole-punching.

  • ISP adds an extra NAT layer between router and internet
  • Router gets a private IP (e.g., 10.x.x.x) instead of a real public IP
  • Symptom: WAN IP on router differs from curl ifconfig.me
  • Port forwarding on router is useless — can’t control ISP’s NAT
  • Kingston workstation behind Bell CGNAT: router gets 10.130.37.119, internet sees

STUN (Session Traversal Utilities for NAT)

Section titled “STUN (Session Traversal Utilities for NAT)”
  • Server that reports your public IP:port as seen from outside
  • Tailscale runs STUN alongside every DERP server
  • tailscale netcheck queries STUN servers in all regions to discover NAT behavior
  • Discovered public IP:port shared with peers via Tailscale’s coordination server
  • Protocols that let LAN devices request port forwards from the router
  • Tailscale requests: “open my port and let anyone send to it”
  • Bypasses endpoint-dependent filtering by creating an explicit allow-any DNAT rule
  • UPnP is most common, NAT-PMP is Apple’s version, PCP is the newer standard
  • Mapping: EIM (MappingVariesByDestIP: false)
  • Filtering: EDF (endpoint-dependent) — strict by default
  • UPnP: Enabled (fixes the filtering issue)
  • PortMapping: UPnP, NAT-PMP, PCP (all three working)
  • Result: Direct connections work
  • Mapping: EIM
  • Filtering: EIF (endpoint-independent) — permissive
  • UPnP: Not needed (filtering already permissive)
  • Result: Direct connections work naturally
  • ISP: Virgin Plus (Bell flanker brand) — behind CGNAT (MappingVariesByDestIP: true)
  • UPnP on Kingston UniFi is useless — CGNAT sits above it
  • Cannot accept inbound connections — relies on hole-punching from the other side
  • 10G NIC (enp3s0) has a PPPoE config that previously got (possibly non-CGNAT)
  • CGNAT outbound throughput cap: ~260 Mbps to public internet. Hard packet drops above this rate (52% loss at 500 Mbps). Inbound has deeper buffers — asymmetry causes upload (434 Mbps) >> download (119 Mbps) through Tailscale.
  • Ookla: 896 down / 342 up to Bell servers (stays inside Bell’s network, bypasses CGNAT). Cross-network to Rogers: 99 down / 212 up — confirms CGNAT bottleneck.
Terminal window
tailscale ping <hostname>
# Look for: "via <IP>" = direct, "via DERP(xxx)" = relayed, "via peer-relay(...)" = peer relay
Terminal window
tailscale netcheck
# Key fields: MappingVariesByDestIP, PortMapping
Terminal window
# macOS:
/Applications/Tailscale.app/Contents/MacOS/Tailscale debug portmap
# Restart Tailscale to re-probe:
# Mac: tailscale down && tailscale up
# Linux (safe over SSH): sudo systemctl restart tailscaled
Terminal window
# macOS:
/Applications/Tailscale.app/Contents/MacOS/Tailscale debug netmap # Full netmap with all peer endpoints
/Applications/Tailscale.app/Contents/MacOS/Tailscale debug peer-endpoint-changes <hostname> # Endpoint change history
/Applications/Tailscale.app/Contents/MacOS/Tailscale debug restun # Force STUN re-discovery
Terminal window
tailscale status
tailscale status --json # Full details with endpoints
Terminal window
ssh workstation "tailscale netcheck"
ssh workstation "curl -s ifconfig.me" # Should differ from UniFi WAN IP if CGNAT

No direct connection from Toronto UniFi to Kingston

Section titled “No direct connection from Toronto UniFi to Kingston”
  • Root cause: UniFi’s endpoint-dependent NAT filtering drops hole-punch packets
  • Fix: Enable UPnP + NAT-PMP on UniFi Default network (Settings -> Networks -> Default -> UPnP)
  • Must restart Tailscale after enabling UPnP so it re-probes port mappings
    • Mac: tailscale down && tailscale up
    • Linux (over SSH): sudo systemctl restart tailscaled (safe — doesn’t break SSH)
  • Tailscale error: “tagged nodes cannot be untagged without reauth”
  • Workaround: Modify the ACL grants section instead of changing device tags
  • Full fix: tailscale up --force-reauth (will re-register the device)
  • Normal with CGNAT — Bell doesn’t forward ICMP through their NAT
  • Not a firewall issue on the workstation

Workstation PPPoE on 10G NIC causes stale endpoints

Section titled “Workstation PPPoE on 10G NIC causes stale endpoints”
  • Config exists at /etc/ppp/peers/dsl-provider for enp3s0.35 (VLAN 35)
  • PPPoE user: <REDACTED>
  • Keep PPPoE off — when active, Tailscale advertises extra endpoints from the 10G NIC (different CGNAT IP, IPv6, PPPoE internal IP). Stale endpoints persist in the coordination server after PPPoE shuts down, breaking hole-punching.
  • When tested: got CGNAT IPv4 (10.130.54.202) and real IPv6 ()
  • To clear stale endpoints: sudo systemctl restart tailscaled on workstation
  • Bell Canada does NOT support IPv6 on residential Fibe Internet (DSL or FTTH)
  • Only wireless/mobility (cellular) has IPv6
  • DHCPv6 prefix delegation (/60 or /56) won’t work — no prefix delegated
  • Kingston 10G PPPoE (Virgin Mobile credentials) DID get IPv6 — possibly different tier
  • Leave IPv6 disabled on Toronto UniFi

Two common causes:

1. Grants missing from ACL

  • Symptom: tailscale status --json shows tsrelay with Capabilities: [] and CapMap: null
  • Root cause: MCP manage_acl tool silently drops the grants field — use the Tailscale API directly.
  • Fix: Update ACL via API with both grants:
    1. "app": {"tailscale.com/cap/relay": []} — enables relay capability
    2. "ip": ["*"] — allows IP access to the relay node

2. Stale static endpoint on tsrelay

  • Symptom: tailscale ping shows DERP instead of peer-relay. tsrelay shows Active: False, InEngine: False, LastHandshake: 0001-01-01 from client perspective.
  • Root cause: --relay-server-static-endpoints pointed to a stale Bell PPPoE IP. Clients try the dead address. Bell PPPoE IPs are dynamic.
  • How it happened: Systemd override (/etc/systemd/system/tailscaled.service.d/relay.conf) re-applied the stale IP on every tailscaled restart via ExecStartPost.
  • Fix: Clear the static endpoint and fix the systemd override:
    Terminal window
    sudo tailscale set --relay-server-static-endpoints=''
    # Then fix /etc/systemd/system/tailscaled.service.d/relay.conf to only set --relay-server-port=40000
    sudo systemctl daemon-reload
  • Never use static endpoints — Bell PPPoE IP is dynamic. STUN + UPnP on Toronto UniFi discovers the correct public IP automatically.
  • After clearing, restart tailscaled to force re-advertisement: sudo systemctl restart tailscaled

Verification: tailscale ping <host> should show via peer-relay(IP:PORT:vni:N) instead of via DERP(tor)

Mobile hotspot has symmetric NAT (no direct connections)

Section titled “Mobile hotspot has symmetric NAT (no direct connections)”
  • Symptom: tailscale netcheck shows MappingVariesByDestIP: true and PortMapping: (none) on phone hotspot
  • Root cause: Carrier CGNAT on mobile hotspots uses Endpoint-Dependent Mapping (symmetric NAT). No UPnP, so direct hole-punching is impossible when the remote side also has restrictive NAT.
  • Result: Direct connections fail to all peers. Peer relay still works (zmac to tsrelay via DERP or direct, then to workstation). DERP works as fallback.
  • Tip: Coffee shop Wi-Fi often has more permissive NAT than carrier hotspots — test with tailscale netcheck and look for MappingVariesByDestIP: false

Never run tailscale down over a Tailscale SSH session

Section titled “Never run tailscale down over a Tailscale SSH session”
  • tailscale down kills the connection immediately
  • Over a Tailscale SSH session, the shell dies before tailscale up can execute
  • --accept-risk=lose-ssh warns but doesn’t prevent it
  • Use sudo systemctl restart tailscaled instead — it restarts the daemon and auto-reconnects
  • Backup access: Teleport VPN to 192.168.2.239 if Tailscale is stuck down