Skip to content

DNS Failover Procedure

Classification: CONFIDENTIAL — Internal Use Only


Scenario: SKY is down

RAIN automatically assumes all DNS queries when SKY is unreachable. No manual intervention is required for the failover itself.

Recovery steps (restore SKY):

  1. Diagnose SKY — check VM console in ESXi
  2. If OS is recoverable, SSH in and restart BIND: sudo systemctl restart named
  3. If OS is unrecoverable, restore from ESXi snapshot (< 5 min) or rebuild from config backup (< 30 min)
  4. After SKY is restored, verify zone transfer from SKY to RAIN: rndc notify wdc.us.gl3
  5. Confirm both servers are authoritative: dig @192.168.120.1 sky.wdc.us.gl3 && dig @192.168.120.2 sky.wdc.us.gl3
  6. Run Post-Change Checklist

Scenario: Both SKY and RAIN are down

  1. Add critical hosts to /etc/hosts on affected workstations as emergency fallback
  2. Restore SKY from snapshot first (primary); RAIN will re-sync automatically on startup
  3. Verify zone replication before removing /etc/hosts overrides

Verify DNS Health

# Forward resolution (both servers)
for s in 192.168.120.1 192.168.120.2; do
    echo "=== $s ==="
    dig @$s sky.wdc.us.gl3 +short
    dig @$s rain.wdc.us.gl3 +short
done

# DNSSEC validation
dig @192.168.120.1 wdc.us.gl3 DNSKEY +dnssec

Dns Failover · v1.1 · 2026-03-14 · GPUS-IT · Classification: CONFIDENTIAL — Internal Use Only