Cloud VPN: Operations & Recreation Runbook
Classification: CONFIDENTIAL, Internal Use Only
Document: infrastructure/runbooks/vpn-operations.md · v1.0 · 2026-03-17 · GPUS-IT
VPN overview
| Parameter | Value |
|---|---|
| Type | Classic VPN (IKEv2) |
| GCP gateway | gpus-vpn-gateway (130.211.194.72) |
| WDC peer | Meraki MX100 (38.140.146.68) |
| GCP VPC | 172.16.0.0/24 |
| VPC connector subnet | 10.8.0.0/28 (Cloud Run → WDC SSH) |
| WDC production | 192.168.120.0/23 |
| WDC management | 192.168.124.0/24 |
| PSK | Stored in Secret Manager: vpn-psk-wdc |
⚠️ Critical: traffic selectors
The VPN tunnel localTrafficSelector MUST include both:
- 172.16.0.0/24 — GCP VPC subnet
- 10.8.0.0/28 — VPC Serverless Connector subnet
Without 10.8.0.0/28, Cloud Run services cannot SSH to WDC servers even though the VPN tunnel shows ESTABLISHED. This was the root cause of IR-001 (2026-03-17).
The Meraki MX100 VPN policy already includes 10.8.0.0/28 on the WDC side — no Meraki changes needed when recreating the tunnel.
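One way to catch this misconfiguration early is to compare a tunnel's selector list against the two required ranges. The sketch below is a hypothetical helper (the name `has_required_selectors` is not part of any existing tooling). It assumes the selector list arrives as a single string separated by commas, semicolons, or spaces.

```shell
# Hypothetical helper: succeed only if every required CIDR appears in the
# selector list. $1 is the list (comma-, semicolon-, or space-separated);
# the remaining arguments are the CIDRs that must be present.
has_required_selectors() {
  local selectors cidr missing
  # Normalise separators to commas and wrap in commas for exact matching.
  selectors=",$(printf '%s' "$1" | tr '; ' ',,'),"
  shift
  missing=0
  for cidr in "$@"; do
    case "$selectors" in
      *",${cidr},"*) ;;   # exact match between separators
      *) echo "missing selector: ${cidr}" >&2; missing=1 ;;
    esac
  done
  return "$missing"
}
```

It could be fed the `localTrafficSelector` value from `gcloud compute vpn-tunnels describe`, followed by `172.16.0.0/24 10.8.0.0/28`; a non-zero exit would then indicate the IR-001 condition.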
Check VPN status
gcloud compute vpn-tunnels describe gpus-vpn-tunnel-wdc \
  --region=us-central1 --project=gpus-infra \
  --format="value(status, detailedStatus, localTrafficSelector, remoteTrafficSelector)"
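Expected output: status ESTABLISHED, localTrafficSelector listing both 172.16.0.0/24 and 10.8.0.0/28, and remoteTrafficSelector listing 192.168.120.0/23 and 192.168.124.0/24.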
Recreate VPN tunnel
Use this when the tunnel is stuck in FIRST_HANDSHAKE or needs traffic selector updates.
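The two triggers above can be expressed as a small decision helper. This is a sketch only: `needs_recreation` is a hypothetical name, and a single FIRST_HANDSHAKE reading does not prove the tunnel is stuck, so it is worth re-checking the status a few times before acting on the result.

```shell
# Hypothetical helper: decide whether the tunnel should be recreated.
# $1 = tunnel status, $2 = localTrafficSelector list as one string.
# Prints the reason and returns 0 when recreation is indicated, 1 otherwise.
needs_recreation() {
  local status selectors cidr
  status="$1"
  selectors=",$(printf '%s' "$2" | tr '; ' ',,'),"
  if [ "$status" = "FIRST_HANDSHAKE" ]; then
    echo "status is FIRST_HANDSHAKE"
    return 0
  fi
  # Both local ranges from the traffic selector section must be present.
  for cidr in 172.16.0.0/24 10.8.0.0/28; do
    case "$selectors" in
      *",${cidr},"*) ;;
      *) echo "local selector ${cidr} is missing"; return 0 ;;
    esac
  done
  return 1
}
```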
# Step 1: Delete existing tunnel (~30 sec VPN outage)
gcloud compute vpn-tunnels delete gpus-vpn-tunnel-wdc \
  --region=us-central1 --project=gpus-infra --quiet
# Step 2: Recreate with correct traffic selectors
# (the PSK substitution is quoted so spaces or special characters survive)
gcloud compute vpn-tunnels create gpus-vpn-tunnel-wdc \
  --peer-address=38.140.146.68 \
  --ike-version=2 \
  --local-traffic-selector=172.16.0.0/24,10.8.0.0/28 \
  --remote-traffic-selector=192.168.120.0/23,192.168.124.0/24 \
  --target-vpn-gateway=gpus-vpn-gateway \
  --region=us-central1 \
  --project=gpus-infra \
  --shared-secret="$(gcloud secrets versions access latest \
    --secret=vpn-psk-wdc --project=gpus-infra)"
# Step 3: Poll until ESTABLISHED (up to 12 attempts, 15 seconds apart)
for i in $(seq 1 12); do
  STATUS=$(gcloud compute vpn-tunnels describe gpus-vpn-tunnel-wdc \
    --region=us-central1 --project=gpus-infra \
    --format="value(status)" 2>/dev/null)
  echo "$(date +%H:%M:%S) - ${STATUS}"
  [[ "${STATUS}" == "ESTABLISHED" ]] && break
  sleep 15
done
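The polling loop above exits silently if the tunnel never comes up. A variant that returns a non-zero status on timeout may be easier to use in scripts. The sketch below is hypothetical (`wait_for_established` is not an existing tool) and takes the status command as arguments so it can be exercised without touching GCP.

```shell
# Hypothetical helper: poll a status command until it prints ESTABLISHED.
# $1 = maximum attempts, $2 = seconds between attempts,
# remaining arguments = the command whose output is the tunnel status.
wait_for_established() {
  local attempts delay i status
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    status="$("$@" 2>/dev/null)"
    echo "$(date +%H:%M:%S) ${status:-UNKNOWN}"
    if [ "$status" = "ESTABLISHED" ]; then
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "tunnel not ESTABLISHED after ${attempts} attempts" >&2
  return 1
}
```

To mirror the loop above, it would be called as `wait_for_established 12 15` followed by the same `gcloud compute vpn-tunnels describe ... --format="value(status)"` command.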
Verify connectivity after recreation
# 1. Tunnel status
gcloud compute vpn-tunnels describe gpus-vpn-tunnel-wdc \
  --region=us-central1 --project=gpus-infra \
  --format="value(status)"
# 2. Routes still intact
gcloud compute routes list \
  --project=gpus-infra \
  --filter="network=gpus-vpc" \
  --format="table(name,destRange,nextHopVpnTunnel)"
# 3. Backend can reach WDC servers
gcloud run services logs read gpus-status-backend \
  --region=us-central1 --project=gpus-infra --limit=10
# 4. From Mac: confirm WDC reachable
ping -c 2 192.168.120.1
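The route check in step 2 can be automated by scanning the listing for the WDC ranges. The sketch below uses a hypothetical helper, `routes_cover`. It assumes there is one static route per WDC range (192.168.120.0/23 and 192.168.124.0/24), which this runbook does not state explicitly, so adjust the arguments to match the actual route table.

```shell
# Hypothetical helper: read a route listing on stdin and confirm that each
# expected destination range appears as a whole whitespace-separated field.
# Arguments = destination ranges that must be present.
routes_cover() {
  local listing range missing
  listing="$(cat)"
  missing=0
  for range in "$@"; do
    if ! printf '%s\n' "$listing" |
        awk -v want="$range" '{ for (i = 1; i <= NF; i++) if ($i == want) found = 1 } END { exit !found }'; then
      echo "no route for ${range}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Piping the `gcloud compute routes list` output from step 2 into `routes_cover 192.168.120.0/23 192.168.124.0/24` would then exit non-zero if either range has no route.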
Store PSK in Secret Manager
# Store PSK (one-time setup; printf avoids a trailing newline in the secret)
printf '%s' "YOUR_PSK_HERE" | gcloud secrets create vpn-psk-wdc \
  --data-file=- \
  --project=gpus-infra \
  --replication-policy=automatic
# Retrieve PSK
gcloud secrets versions access latest \
  --secret=vpn-psk-wdc --project=gpus-infra
Meraki MX100 VPN policy
The Meraki peer is configured under Security & SD-WAN → Site-to-site VPN as GCP-GPUS-Infra:
| Parameter | Value |
|---|---|
| IKE version | IKEv2 |
| IPsec policies | Custom |
| Public IP | 130.211.194.72 |
| IPsec subnets | 172.16.0.0/24, 10.8.0.0/28 |
No Meraki changes are needed when recreating the GCP tunnel — the Meraki policy is already correct and persistent.
Incident history
| Date | Incident | Root cause | Fix |
|---|---|---|---|
| 2026-03-17 | IR-001: Cloud Run cannot SSH to WDC | localTrafficSelector missing 10.8.0.0/28 | Recreated tunnel with correct selectors |