
Cloud VPN — Operations & Recreation Runbook

Classification: CONFIDENTIAL (Internal Use Only)
Document: infrastructure/runbooks/vpn-operations.md · v1.0 · 2026-03-17 · GPUS-IT


VPN overview

| Parameter | Value |
|---|---|
| Type | Classic VPN (IKEv2) |
| GCP gateway | gpus-vpn-gateway (130.211.194.72) |
| WDC peer | Meraki MX100 (38.140.146.68) |
| GCP VPC | 172.16.0.0/24 |
| VPC connector subnet | 10.8.0.0/28 (Cloud Run → WDC SSH) |
| WDC production | 192.168.120.0/23 |
| WDC management | 192.168.124.0/24 |
| PSK | Stored in Secret Manager: vpn-psk-wdc |

⚠️ Critical — traffic selectors

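The VPN tunnel's localTrafficSelector MUST include both:

- 172.16.0.0/24: GCP VPC subnet
- 10.8.0.0/28: Serverless VPC Access connector subnet

Cloud Run traffic to WDC leaves through the Serverless VPC Access connector, so its packets carry source addresses from 10.8.0.0/28. If that prefix is missing from localTrafficSelector, those packets fall outside the negotiated IPsec selectors and are not carried by the tunnel: Cloud Run services cannot SSH to WDC servers even though the tunnel shows ESTABLISHED. This was the root cause of IR-001 (2026-03-17).

The Meraki MX100 VPN policy already lists 10.8.0.0/28 among the IPsec subnets for the GCP peer, so no Meraki changes are needed when recreating the tunnel.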
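To make the selector rule concrete, the following minimal sketch tests whether a source address is covered by a selector list. It assumes 10.8.0.5 as an example connector address (any address in 10.8.0.0/28 behaves the same). The function names are hypothetical helpers, not part of gcloud.

```shell
#!/usr/bin/env bash
# Sketch: is a source IP covered by a tunnel's traffic selectors?
# Assumption: 10.8.0.5 stands in for a connector address; GCP assigns
# the real connector IPs from 10.8.0.0/28.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# ip_in_cidr IP CIDR: exit 0 if IP falls inside CIDR.
ip_in_cidr() {
  local ip=$1 net=${2%/*} bits=${2#*/}
  local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ( $(ip_to_int "$ip") & mask ) == ( $(ip_to_int "$net") & mask ) ))
}

# in_any_selector IP "CIDR;CIDR;...": exit 0 if any entry covers IP.
in_any_selector() {
  local ip=$1 cidr
  local IFS=';'
  for cidr in $2; do
    ip_in_cidr "$ip" "$cidr" && return 0
  done
  return 1
}
```

With the IR-001 selectors, in_any_selector 10.8.0.5 "172.16.0.0/24" fails, while in_any_selector 10.8.0.5 "172.16.0.0/24;10.8.0.0/28" succeeds.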


Check VPN status

gcloud compute vpn-tunnels describe gpus-vpn-tunnel-wdc \
  --region=us-central1 --project=gpus-infra \
  --format="value(status, detailedStatus, localTrafficSelector, remoteTrafficSelector)"

Expected output (value() prints one tab-separated line and joins list fields with ;):

ESTABLISHED	Tunnel is up and running.	172.16.0.0/24;10.8.0.0/28	192.168.120.0/23;192.168.124.0/24
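The status line can also be checked mechanically. The following is a minimal sketch that assumes gcloud's value() format (tab-separated fields, list values joined with ;). check_tunnel_line is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: validate the one-line output of the "Check VPN status" command.
# Fields: status, detailedStatus, localTrafficSelector, remoteTrafficSelector.
# check_tunnel_line is a hypothetical helper, not a gcloud feature.

REQUIRED_LOCAL=("172.16.0.0/24" "10.8.0.0/28")

check_tunnel_line() {
  local status detail local_ts remote_ts cidr rc=0
  IFS=$'\t' read -r status detail local_ts remote_ts <<< "$1"
  if [[ "${status}" != "ESTABLISHED" ]]; then
    echo "tunnel status is ${status:-empty}, expected ESTABLISHED" >&2
    rc=1
  fi
  for cidr in "${REQUIRED_LOCAL[@]}"; do
    # Wrap both sides in ';' so a CIDR only matches a whole list entry.
    if [[ ";${local_ts};" != *";${cidr};"* ]]; then
      echo "localTrafficSelector is missing ${cidr} (see IR-001)" >&2
      rc=1
    fi
  done
  return "${rc}"
}
```

For example, pass it the status command's output: check_tunnel_line "$(gcloud compute vpn-tunnels describe gpus-vpn-tunnel-wdc --region=us-central1 --project=gpus-infra --format="value(status,detailedStatus,localTrafficSelector,remoteTrafficSelector)")". A non-zero exit means the tunnel is down or a required selector is missing.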


Recreate VPN tunnel

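Use this procedure when the tunnel is stuck in FIRST_HANDSHAKE or its traffic selectors need to change. Traffic selectors cannot be edited on an existing tunnel, so changing them means deleting and recreating it.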
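Step 2 reads the PSK from Secret Manager after Step 1 has already deleted the tunnel. A pre-flight check like the following catches a missing or unreadable secret while the old tunnel is still up. This is a minimal sketch; psk_preflight is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: confirm the PSK is readable BEFORE deleting the tunnel.
# psk_preflight is a hypothetical helper; it never prints the PSK itself.

psk_preflight() {
  local psk
  if ! psk=$(gcloud secrets versions access latest \
      --secret=vpn-psk-wdc --project=gpus-infra); then
    echo "cannot read secret vpn-psk-wdc; do not delete the tunnel" >&2
    return 1
  fi
  if [[ -z "${psk}" ]]; then
    echo "secret vpn-psk-wdc is empty; do not delete the tunnel" >&2
    return 1
  fi
  # Report only the length, so the secret stays off the screen.
  echo "PSK readable (${#psk} characters); safe to continue to Step 1"
}
```

Run psk_preflight and continue to Step 1 only if it exits 0.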

# Step 1: Delete existing tunnel (VPN outage starts here, typically ~30 sec, and lasts until Step 3 reports ESTABLISHED)
gcloud compute vpn-tunnels delete gpus-vpn-tunnel-wdc \
  --region=us-central1 --project=gpus-infra --quiet

# Step 2 — Recreate with correct traffic selectors
gcloud compute vpn-tunnels create gpus-vpn-tunnel-wdc \
  --peer-address=38.140.146.68 \
  --ike-version=2 \
  --local-traffic-selector=172.16.0.0/24,10.8.0.0/28 \
  --remote-traffic-selector=192.168.120.0/23,192.168.124.0/24 \
  --target-vpn-gateway=gpus-vpn-gateway \
  --region=us-central1 \
  --project=gpus-infra \
  --shared-secret="$(gcloud secrets versions access latest \
      --secret=vpn-psk-wdc --project=gpus-infra)"

# Step 3 — Poll until ESTABLISHED
for i in $(seq 1 12); do
  STATUS=$(gcloud compute vpn-tunnels describe gpus-vpn-tunnel-wdc \
    --region=us-central1 --project=gpus-infra \
    --format="value(status)" 2>/dev/null)
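  echo "$(date +%H:%M:%S) ${STATUS}"
  [[ "${STATUS}" == "ESTABLISHED" ]] && break
  sleep 15
done
[[ "${STATUS}" == "ESTABLISHED" ]] || echo "Tunnel not ESTABLISHED after ~3 min (last status: ${STATUS:-unknown})" >&2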
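If the Step 3 polling needs to run inside a larger script, a reusable variant that returns a non-zero exit status on timeout lets the script stop instead of moving on. This is a minimal sketch; wait_for_status is a hypothetical helper, not part of gcloud.

```shell
#!/usr/bin/env bash
# Sketch: generic "poll until a command prints EXPECTED" helper.
# Usage: wait_for_status ATTEMPTS INTERVAL_SECONDS EXPECTED COMMAND [ARGS...]
# wait_for_status is a hypothetical helper name.

wait_for_status() {
  local attempts=$1 interval=$2 expected=$3
  shift 3
  local i status=""
  for (( i = 1; i <= attempts; i++ )); do
    status=$("$@" 2>/dev/null)
    echo "$(date +%H:%M:%S) ${status}"
    [[ "${status}" == "${expected}" ]] && return 0
    # Skip the sleep after the final attempt.
    (( i < attempts )) && sleep "${interval}"
  done
  echo "timed out after ${attempts} attempts; last status: ${status:-<none>}" >&2
  return 1
}
```

Matching Step 3 (12 attempts, 15 seconds apart): wait_for_status 12 15 ESTABLISHED gcloud compute vpn-tunnels describe gpus-vpn-tunnel-wdc --region=us-central1 --project=gpus-infra --format="value(status)".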

Verify connectivity after recreation

# 1. Tunnel status
gcloud compute vpn-tunnels describe gpus-vpn-tunnel-wdc \
  --region=us-central1 --project=gpus-infra \
  --format="value(status)"

# 2. Routes still intact
gcloud compute routes list \
  --project=gpus-infra \
  --filter="network=gpus-vpc" \
  --format="table(name,destRange,nextHopVpnTunnel)"

# 3. Backend logs: look for SSH errors to WDC servers
gcloud run services logs read gpus-status-backend \
  --region=us-central1 --project=gpus-infra --limit=10

# 4. From Mac — confirm WDC reachable
ping -c 2 192.168.120.1
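A ping from the Mac does not use the connector's 10.8.0.0/28 source range, and a successful ping does not show that TCP 22 (SSH) is permitted. A TCP connect to port 22 from a host on the path you want to test is closer to what failed in IR-001. The following is a minimal sketch that uses bash's /dev/tcp and the coreutils timeout command. tcp_port_open is a hypothetical helper, and the target address is a placeholder for whichever WDC server the backend connects to.

```shell
#!/usr/bin/env bash
# Sketch: check that a TCP port (e.g. 22 for SSH) accepts connections.
# Usage: tcp_port_open HOST PORT [TIMEOUT_SECONDS]
# tcp_port_open is a hypothetical helper; it relies on bash /dev/tcp.

tcp_port_open() {
  local host=$1 port=$2 secs=${3:-5}
  # Open a connection on fd 3 in a child bash; timeout bounds the wait.
  timeout "${secs}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "${host}" "${port}" 2>/dev/null
}
```

For example: tcp_port_open "$WDC_SERVER_IP" 22 && echo "SSH port reachable", where WDC_SERVER_IP is a placeholder you set yourself.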

Store PSK in Secret Manager

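# Store PSK (one-time setup); read -s keeps the PSK off screen and out of shell history
read -rs -p "WDC VPN PSK: " PSK && echo
printf '%s' "${PSK}" | gcloud secrets create vpn-psk-wdc \
  --data-file=- \
  --project=gpus-infra \
  --replication-policy=automatic
unset PSK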

# Retrieve PSK
gcloud secrets versions access latest \
  --secret=vpn-psk-wdc --project=gpus-infra

Meraki MX100 VPN policy

The Meraki peer is configured under Security & SD-WAN → Site-to-site VPN as GCP-GPUS-Infra:

| Parameter | Value |
|---|---|
| IKE version | IKEv2 |
| IPsec policies | Custom |
| Public IP | 130.211.194.72 |
| IPsec subnets | 172.16.0.0/24, 10.8.0.0/28 |

No Meraki changes are needed when recreating the GCP tunnel — the Meraki policy is already correct and persistent.
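The Meraki peer's IPsec subnets and the GCP tunnel's localTrafficSelector should list the same prefixes. The following minimal sketch compares the two lists as sets, ignoring order and separators. It assumes you copy the Meraki value by hand from the dashboard and take the GCP value from gcloud's value() output. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare two prefix lists as sets.
# Accepts commas, semicolons and/or spaces as separators.
# normalize_prefixes and same_prefix_set are hypothetical helpers.

# Turn a prefix list into sorted, de-duplicated lines.
normalize_prefixes() {
  tr ',; ' '\n\n\n' <<< "$1" | sed '/^$/d' | sort -u
}

# same_prefix_set MERAKI_LIST GCP_LIST: exit 0 if both contain the same prefixes.
same_prefix_set() {
  local meraki gcp
  meraki=$(normalize_prefixes "$1")
  gcp=$(normalize_prefixes "$2")
  [[ "${meraki}" == "${gcp}" ]] && return 0
  # Report the differences on stderr using comm on the sorted lists.
  echo "only on Meraki: $(comm -23 <(echo "${meraki}") <(echo "${gcp}") | paste -sd, -)" >&2
  echo "only on GCP:    $(comm -13 <(echo "${meraki}") <(echo "${gcp}") | paste -sd, -)" >&2
  return 1
}
```

For example, same_prefix_set "172.16.0.0/24, 10.8.0.0/28" "10.8.0.0/28;172.16.0.0/24" succeeds, while dropping 10.8.0.0/28 from the second argument reproduces the IR-001 mismatch and reports it.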


Incident history

| Date | Incident | Root cause | Fix |
|---|---|---|---|
| 2026-03-17 | IR-001: Cloud Run cannot SSH to WDC | localTrafficSelector missing 10.8.0.0/28 | Recreated tunnel with correct selectors |