DNS & DHCP — SKY / RAIN

Classification: CONFIDENTIAL — Internal Use Only

SKY (primary) and RAIN (secondary) form a synchronized high-availability pair providing DNS and DHCP services for the entire WDC site. Every change must be applied to both servers in the same change window; never leave the pair running different configurations.


DNS Architecture

SKY is the authoritative nameserver for wdc.us.gl3. RAIN maintains a live slave copy via AXFR/IXFR zone transfers and automatically handles all queries if SKY is offline.

Zone                        Type            Destination
wdc.us.gl3                  Authoritative   SKY (slave on RAIN)
cloud.us.gl3                Forward         10.1.96.2
us.gl3                      Forward         10.1.96.2
. (root)                    Forward         1.1.1.1 / 8.8.8.8
192.168.120/23 (reverse)    Authoritative   SKY (slave on RAIN)
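Because RAIN serves its copy of wdc.us.gl3 from zone transfers, one quick health check is to compare the SOA serial each server reports; matching serials mean RAIN has caught up with SKY. The sketch below assumes dig is installed and that RAIN answers as rain.wdc.us.gl3 (a hypothetical name; substitute its address if that record does not exist).

```shell
#!/bin/sh
# Sketch: compare the SOA serial for a zone on SKY and RAIN.
# Assumption: rain.wdc.us.gl3 is a hypothetical name for RAIN.

# Read `dig +short ... SOA` output on stdin and print the serial.
# +short prints: MNAME RNAME SERIAL REFRESH RETRY EXPIRE MINIMUM
soa_serial() {
    awk 'NF >= 7 { print $3; exit }'
}

# Print the SOA serial that server $1 reports for zone $2.
server_serial() {
    dig +short "@$1" "$2" SOA | soa_serial
}

# Return 0 only if both servers answer with the same serial.
check_zone_sync() {
    local zone sky rain
    zone=${1:-wdc.us.gl3}
    sky=$(server_serial sky.wdc.us.gl3 "$zone")
    rain=$(server_serial rain.wdc.us.gl3 "$zone")   # hypothetical RAIN name
    echo "$zone  SKY=$sky  RAIN=$rain"
    [ -n "$sky" ] && [ "$sky" = "$rain" ]
}

# Usage: check_zone_sync wdc.us.gl3 || echo "RAIN has not caught up"
```

A serial mismatch that persists past the zone's refresh interval usually points at a failed transfer rather than normal lag.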

DNSSEC

Zones are signed with a ZSK and KSK. After any zone file change, re-sign and reload:

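# Re-sign zone (on SKY). Keys are read from /var/named/keys/; the signed
# output is written to /var/named/wdc.us.gl3.db.signed (dnssec-signzone default)
dnssec-signzone -A -3 $(head -c 6 /dev/random | od -An -tx1 | tr -d ' \n') \
    -N INCREMENT -K /var/named/keys -o wdc.us.gl3 -t /var/named/wdc.us.gl3.db

# Reload BIND (both servers)
rndc reload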
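The re-sign step can be wrapped in a small function that refuses to sign a zone file that fails named-checkzone. This is a sketch, not the established procedure: it assumes it runs on SKY, that keys live in /var/named/keys/, and that named.conf loads dnssec-signzone's default output (the input file name with .signed appended). RAIN still needs its own reload as described above. The random salt mirrors the current command; note that RFC 9276 recommends an empty NSEC3 salt (-3 -).

```shell
#!/bin/sh
# Sketch: syntax-check, re-sign, and reload one zone on SKY.
# Assumptions: keys in /var/named/keys; named.conf loads <zonefile>.signed.

# 6 random bytes rendered as 12 lowercase hex characters (NSEC3 salt),
# the same pipeline used inline in the procedure above.
nsec3_salt() {
    head -c 6 /dev/random | od -An -tx1 | tr -d ' \n'
}

resign_zone() {
    local zone file
    zone=${1:-wdc.us.gl3}
    file=${2:-/var/named/wdc.us.gl3.db}
    # Stop before signing if the unsigned zone file does not parse.
    named-checkzone "$zone" "$file" || return 1
    dnssec-signzone -A -3 "$(nsec3_salt)" -N INCREMENT \
        -K /var/named/keys -o "$zone" -t "$file" || return 1
    # Reload only this zone on the local server.
    rndc reload "$zone"
}

# Usage (on SKY): resign_zone wdc.us.gl3 /var/named/wdc.us.gl3.db
```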

Query ACLs — WDC + GCP (2026-04-11)

The allow-query and allow-recursion ACLs on both SKY and RAIN include the GCP cloud VPC subnet 172.16.0.0/24 alongside the on-prem subnets. This is required so that OAK, MAPLE, and CEDAR can resolve wdc.us.gl3 and cloud.us.gl3 records over the site-to-site VPN tunnel.

Before this change, BIND refused recursive lookups from the cloud side and logged query (cache) denied errors for 172.16.0.x clients (visible in /var/log/messages and via journalctl -u named). As a result, cloudadmin SSH short-hostname resolution and the SOC backend's per-server SSH collectors fell back to IP-based targeting. The fix defines a trusted ACL at the top level of /etc/named.conf and references it from the options { } block, on both SKY and RAIN:

// /etc/named.conf (top-level acl statement, referenced from the options block)
acl "trusted" {
    localhost;
    192.168.120.0/23;    // WDC on-prem (both /24s: .120 and .121)
    192.168.121.0/24;    // WDC dynamic workstations (already inside the /23; listed for clarity)
    192.168.122.0/24;    // WDC dynamic devices (NOT covered by the /23)
    172.16.0.0/24;       // GCP cloud VPC (added 2026-04-11)
    10.8.0.0/28;         // GCP Cloud Run VPC connector
};

options {
    directory       "/var/named";
    allow-query     { trusted; };
    allow-recursion { trusted; };
    allow-transfer  { key "rndc-key"; 192.168.120.2; };   // peer address: RAIN's on SKY (shown), SKY's on RAIN
    dnssec-validation auto;
    // ...
};
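Prefix arithmetic is where ACL mistakes usually hide: 192.168.120.0/23 covers 192.168.120.0 through 192.168.121.255, so it already contains the workstation /24 but not 192.168.122.0/24, and 10.8.0.0/28 covers only 10.8.0.0 through 10.8.0.15. The POSIX shell sketch below is a hypothetical helper (not a BIND tool) that checks an IPv4 address against the prefixes in acl "trusted"; localhost is left out.

```shell
#!/bin/sh
# Sketch: check an IPv4 address against the acl "trusted" prefixes.
# Hypothetical helper; IPv4 only, localhost omitted.

# Convert a dotted quad (A.B.C.D) to a 32-bit integer.
ip_to_int() {
    local a b c d rest
    a=${1%%.*}; rest=${1#*.}
    b=${rest%%.*}; rest=${rest#*.}
    c=${rest%%.*}; d=${rest#*.}
    echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# Return 0 if address $1 lies inside prefix $2 (e.g. 192.168.120.0/23):
# both must agree on their leading (prefix length) bits.
in_cidr() {
    local addr net bits
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=$(( 32 - ${2#*/} ))
    [ $(( addr >> bits )) -eq $(( net >> bits )) ]
}

# Prefixes copied from acl "trusted" above.
TRUSTED="192.168.120.0/23 192.168.121.0/24 192.168.122.0/24 172.16.0.0/24 10.8.0.0/28"

in_trusted() {
    local p
    for p in $TRUSTED; do
        in_cidr "$1" "$p" && return 0
    done
    return 1
}

# Usage: in_trusted 172.16.0.10 && echo "allowed by acl trusted"
```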

After applying the change, reload BIND on both servers and verify with a query sourced from the GCP VPC:

# On SKY and RAIN: validate the config first, then reload
sudo named-checkconf /etc/named.conf && sudo rndc reload

# From OAK (172.16.0.10), over the VPN tunnel
dig @192.168.120.1 sky.wdc.us.gl3 A +short
dig @192.168.120.1 oak.cloud.us.gl3 A +short    # cloud-side name, resolved via the cloud.us.gl3 forward zone

# Confirm no new 'denied' entries since the reload
sudo journalctl -u named --since "10 min ago" | grep -i denied | tail -5
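If the check above still shows hits, summarizing the denied entries per client address makes it easy to see whether they all come from the GCP range. A sketch, assuming BIND's usual "client @0x... A.B.C.D#port (...)" log prefix; it handles IPv4 only and ignores lines without that pattern.

```shell
#!/bin/sh
# Sketch: count BIND 'denied' log lines per client IPv4 address.
# Assumption: clients are logged as "A.B.C.D#port" in the line prefix.

denied_clients() {
    grep -i 'denied' \
        | grep -oE '[0-9]+(\.[0-9]+){3}#' \
        | tr -d '#' \
        | sort | uniq -c | sort -rn
}

# Usage: sudo journalctl -u named --since "1 hour ago" | denied_clients
```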

Apply to BOTH servers

This is a paired change. Apply the ACL update to SKY first, verify, then RAIN. If RAIN is left with the old ACL, a SKY failover will immediately break cloud-side name resolution and the SOC backend will start logging SSH errors for OAK/MAPLE/CEDAR within seconds.
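Before calling a paired change done, it is worth confirming that SKY and RAIN carry an identical trusted ACL. The sketch below extracts the acl "trusted" { ... }; block from each named.conf and diffs the two. It assumes the block opens with acl "NAME" at the start of a line and closes with a line beginning "};", that it is run on SKY, and that RAIN is reachable over SSH as the hypothetical alias rain with passwordless sudo.

```shell
#!/bin/sh
# Sketch: extract an ACL block from named.conf and diff SKY against RAIN.
# Assumptions: 'acl "NAME" {' starts a line, '};' closes it on its own line.

# $1 = ACL name, $2 = named.conf path (or - for stdin)
extract_acl() {
    awk -v name="$1" '
        $0 ~ ("^acl \"" name "\"") { inside = 1 }
        inside { print }
        inside && /^};/ { inside = 0 }
    ' "$2"
}

# Run on SKY. "rain" is a hypothetical SSH alias for RAIN.
compare_acl() {
    local sky_tmp rain_tmp rc
    sky_tmp=$(mktemp); rain_tmp=$(mktemp)
    sudo cat /etc/named.conf | extract_acl trusted - > "$sky_tmp"
    ssh rain 'sudo cat /etc/named.conf' | extract_acl trusted - > "$rain_tmp"
    diff -u "$sky_tmp" "$rain_tmp"; rc=$?
    rm -f "$sky_tmp" "$rain_tmp"
    return "$rc"
}

# Usage: compare_acl && echo "acl trusted matches on SKY and RAIN"
```

Comment or whitespace differences also show up in the diff, which is usually what you want for a config that should be byte-identical on both servers.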


DHCP Architecture

DHCP failover runs over TCP port 647. SKY is PRIMARY, RAIN is SECONDARY. Lease state synchronizes automatically; failover detection takes under 30 seconds.
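To see whether the pair is actually in normal failover state, and whether anything is listening on the failover port, something like the following can be run on either server. It is a sketch that assumes ISC dhcpd records its failover state in /var/lib/dhcpd/dhcpd.leases as failover peer "..." state { my state ...; partner state ...; } blocks (some versions write "peer state" instead of "partner state"), and it treats the last recorded block as current.

```shell
#!/bin/sh
# Sketch: report the most recent failover state written by ISC dhcpd.
# Assumed leases-file layout:
#   failover peer "NAME" state {
#     my state normal at ...;
#     partner state normal at ...;
#   }

failover_state() {
    awk '
        /^[ \t]*my state /              { mine = $3 }
        /^[ \t]*(partner|peer) state /  { theirs = $3 }
        END { printf "my=%s partner=%s\n", mine, theirs }
    ' "${1:-/var/lib/dhcpd/dhcpd.leases}"
}

# Return 0 if some socket is listening on TCP 647 (the port on this page).
failover_listening() {
    ss -ltn | awk '{ print $4 }' | grep -q ':647$'
}

# Usage: failover_state; failover_listening || echo "nothing on TCP 647"
```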

Pool                Range          Purpose
192.168.121.0/24    .0 – .200      Dynamic workstations
192.168.122.0/24    .101 – .240    Dynamic devices
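Pool capacity is simple arithmetic on the inclusive range: .101 – .240 holds 240 - 101 + 1 = 140 addresses. To compare that with current usage, the sketch below counts addresses whose most recent entry in dhcpd.leases has binding state active. It assumes the standard ISC dhcpd leases format, in which a later entry for the same address supersedes earlier ones.

```shell
#!/bin/sh
# Sketch: pool size vs. active leases for one pool.
# Assumption: ISC dhcpd leases format; the last entry for an IP wins.

# Addresses in an inclusive range of last octets ($1 = first, $2 = last).
pool_size() {
    echo $(( $2 - $1 + 1 ))
}

# Count IPs starting with prefix $1 whose latest binding state is active.
# "next binding state" / "rewind binding state" lines are not matched.
active_leases() {
    awk -v prefix="$1" '
        /^lease /                { ip = $2 }
        /^[ \t]*binding state /  { state[ip] = $3 }
        END {
            n = 0
            for (i in state)
                if (index(i, prefix) == 1 && state[i] == "active;") n++
            print n
        }
    ' "${2:-/var/lib/dhcpd/dhcpd.leases}"
}

# Usage:
#   echo "$(active_leases 192.168.122.) of $(pool_size 101 240) in use"
```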

Static reservations are managed in dhcpd.conf on both SKY and RAIN and tracked in wdc-hostregistry.csv.
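Because reservations live in two places, a periodic cross-check helps catch drift. This hypothetical helper lists every fixed-address in dhcpd.conf that does not appear in wdc-hostregistry.csv. It assumes one IPv4 literal per fixed-address statement and only that each reserved IP appears verbatim somewhere in the CSV, since the CSV's column layout is not documented on this page.

```shell
#!/bin/sh
# Sketch: report fixed-address reservations missing from the host registry.
# Assumptions: "fixed-address A.B.C.D;" statements with IPv4 literals;
# the registry CSV contains each reserved IP verbatim (layout unknown).

# Print the IP from each fixed-address line of a dhcpd.conf.
reserved_ips() {
    sed -n 's/^[[:space:]]*fixed-address[[:space:]]\{1,\}\([0-9.]\{7,15\}\);.*/\1/p' "$1"
}

# $1 = dhcpd.conf, $2 = registry CSV. -w stops 192.168.122.5 from
# matching inside 192.168.122.50.
missing_from_registry() {
    reserved_ips "$1" | while read -r ip; do
        grep -Fqw -- "$ip" "$2" || echo "$ip"
    done
}

# Usage: missing_from_registry /etc/dhcp/dhcpd.conf wdc-hostregistry.csv
```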


Post-Change Checklist

Mandatory after every config change

Never skip this checklist on SKY or RAIN.

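  1. Re-sign DNSSEC if zone files changed (see above), so any re-signed zone files are included in the new AIDE baseline
  2. Update AIDE baseline. aide --update exits non-zero whenever it detects changes (1-7 are change flags; 14 and above are errors), so test the status instead of chaining with &&:
    sudo aide --update; [ $? -lt 8 ] && sudo mv /var/lib/aide/aide.db.new.gz /var/lib/aide/aide.db.gz

  3. Log change to /var/log/asset-inventory.log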
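The AIDE and logging steps can be scripted so they always happen together. This is a sketch: the log line format (UTC timestamp, host, user, free-text note) is a hypothetical choice, since this page does not define one, and it relies on AIDE's documented exit status for --update, where 0 means no differences, 1, 2 and 4 are bit flags for added, removed and changed files, and 14 and above are errors.

```shell
#!/bin/sh
# Sketch: refresh the AIDE baseline, then record the change.
# Hypothetical log format: "<UTC timestamp> <host> <user> <note>".

INVENTORY_LOG=/var/log/asset-inventory.log

# 0-7 from `aide --update` mean the new database was written.
aide_update_ok() {
    [ "$1" -ge 0 ] && [ "$1" -le 7 ]
}

post_change() {
    # $1 = free-text description of the change
    local rc
    sudo aide --update; rc=$?
    if ! aide_update_ok "$rc"; then
        echo "aide --update failed (exit $rc); baseline not replaced" >&2
        return 1
    fi
    sudo mv /var/lib/aide/aide.db.new.gz /var/lib/aide/aide.db.gz
    printf '%s %s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(hostname)" \
        "${SUDO_USER:-$(id -un)}" "$1" | sudo tee -a "$INVENTORY_LOG" >/dev/null
}

# Usage (on SKY, then RAIN):
#   post_change "named.conf: added 172.16.0.0/24 to acl trusted"
```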

Key File Locations

File                   Path
BIND main config       /etc/named.conf
Zone file (forward)    /var/named/wdc.us.gl3.db
Zone file (reverse)    /var/named/192.168.120.0.rev
DHCP config            /etc/dhcp/dhcpd.conf
DHCP leases            /var/lib/dhcpd/dhcpd.leases
DNSSEC keys            /var/named/keys/
Host registry          wdc-hostregistry.csv

See the full operational guide: sky-rain-dns-dhcp-infrastructure.md


Dns Dhcp · v1.1 · 2026-03-14 · GPUS-IT · Classification: CONFIDENTIAL — Internal Use Only