DNS & DHCP — SKY / RAIN
Classification: CONFIDENTIAL — Internal Use Only
SKY (primary) and RAIN (secondary) form a synchronized high-availability pair providing DNS and DHCP services for the entire WDC site. Every change must be applied to both servers: a change left on only one of them breaks as soon as the pair fails over.
DNS Architecture
SKY is the primary authoritative nameserver for wdc.us.gl3. RAIN keeps a live secondary (slave) copy of the zone through AXFR/IXFR zone transfers and continues to answer queries when SKY is offline.
| Zone | Type | Destination |
|---|---|---|
| wdc.us.gl3 | Authoritative | SKY (slave on RAIN) |
| cloud.us.gl3 | Forward | 10.1.96.2 |
| us.gl3 | Forward | 10.1.96.2 |
| . (root) | Forward | 1.1.1.1 / 8.8.8.8 |
| Reverse 192.168.120/23 | Authoritative | SKY (slave on RAIN) |
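One way to confirm that RAIN's secondary copy is current is to compare the SOA serial each server returns for the zone. The sketch below is a pair of POSIX sh helpers; it assumes dig is installed on the host running it, and that SKY answers on 192.168.120.1 and RAIN on 192.168.120.2 (the addresses used elsewhere on this page). The helper names are hypothetical.

```shell
# soa_serial ANSWER: print the serial (third field) of a `dig +short ... SOA`
# answer, whose fields are: mname rname serial refresh retry expire minimum.
soa_serial() {
    printf '%s\n' "$1" | awk '{ print $3 }'
}

# serials_match SKY_ANSWER RAIN_ANSWER: succeed only if both answers carry a
# serial and the two serials are identical (RAIN has the latest transfer).
serials_match() {
    local sky_serial rain_serial
    sky_serial=$(soa_serial "$1")
    rain_serial=$(soa_serial "$2")
    [ -n "$sky_serial" ] && [ "$sky_serial" = "$rain_serial" ]
}

# Usage (assumed addresses: SKY 192.168.120.1, RAIN 192.168.120.2):
#   sky=$(dig +short @192.168.120.1 wdc.us.gl3 SOA)
#   rain=$(dig +short @192.168.120.2 wdc.us.gl3 SOA)
#   serials_match "$sky" "$rain" && echo "RAIN in sync" || echo "serials differ"
```

An empty answer (for example, a server that is down) is treated as a mismatch rather than a match.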
DNSSEC
Zones are signed with a zone-signing key (ZSK) and a key-signing key (KSK). After any zone file change, re-sign the zone and reload BIND:
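# Re-sign zone (keys are read from /var/named/keys/; signed output goes to wdc.us.gl3.db.signed)
dnssec-signzone -K /var/named/keys -A -3 $(head -c 6 /dev/random | od -An -tx1 | tr -d ' \n') \
    -N INCREMENT -o wdc.us.gl3 -t /var/named/wdc.us.gl3.db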
# Reload BIND (both servers)
rndc reload
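The same steps can be wrapped so the zone is syntax-checked before and after signing, and nothing is reloaded if any step fails. This is a sketch, not established team tooling: resign_zone, run and gen_salt are hypothetical helper names, -K /var/named/keys assumes the key directory listed under Key File Locations, and the reload only takes effect if named.conf serves the .signed file that dnssec-signzone writes by default. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# run CMD...: execute a command, or only print it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# gen_salt: 6 random bytes as 12 lowercase hex characters (the same pipeline
# as the command above, reading /dev/urandom so it never blocks).
gen_salt() {
    head -c 6 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# resign_zone ZONE ZONEFILE: check, sign, re-check, then reload a single zone.
# Stops at the first failing step so a broken zone is never reloaded.
resign_zone() {
    local zone="$1" zonefile="$2"
    run named-checkzone "$zone" "$zonefile" || return 1
    # -K is an assumption: keys live in /var/named/keys (see Key File Locations).
    run dnssec-signzone -K /var/named/keys -A -3 "$(gen_salt)" \
        -N INCREMENT -o "$zone" -t "$zonefile" || return 1
    # dnssec-signzone writes ZONEFILE.signed unless -f is given.
    run named-checkzone "$zone" "$zonefile.signed" || return 1
    run rndc reload "$zone"
}

# Usage on SKY (preview first, then for real):
#   DRY_RUN=1 resign_zone wdc.us.gl3 /var/named/wdc.us.gl3.db
#   resign_zone wdc.us.gl3 /var/named/wdc.us.gl3.db
```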
Query ACLs — WDC + GCP (2026-04-11)
The allow-query and allow-recursion ACLs on both SKY and RAIN include the GCP cloud VPC subnet 172.16.0.0/24 (and the Cloud Run VPC connector range 10.8.0.0/28) alongside the on-prem subnets. This is required so that OAK, MAPLE, and CEDAR can resolve wdc.us.gl3 and cloud.us.gl3 records over the site-to-site VPN tunnel.
Before this change, BIND refused recursive lookups from the cloud side and logged each one as a 'query (cache) ... denied' error in /var/log/messages; the same entries, for 172.16.0.x clients, appear in journalctl -u named. The failed lookups forced both cloudadmin SSH (short-hostname resolution) and the SOC backend's per-server SSH collectors to fall back to IP-based targeting. The fix adds 172.16.0.0/24 to the trusted ACL, which the options { } block uses for allow-query and allow-recursion, in /etc/named.conf on both SKY and RAIN:
// /etc/named.conf: the acl is defined at top level and referenced from options { }
acl "trusted" {
    localhost;
    192.168.120.0/23;   // WDC on-prem (192.168.120.0 to 192.168.121.255)
    192.168.121.0/24;   // WDC dynamic workstations (already covered by the /23)
    192.168.122.0/24;   // WDC dynamic devices
    172.16.0.0/24;      // GCP cloud VPC (added 2026-04-11)
    10.8.0.0/28;        // GCP Cloud Run VPC connector
};

options {
    directory "/var/named";
    allow-query { trusted; };
    allow-recursion { trusted; };
    // Transfer peer: RAIN (192.168.120.2) in SKY's copy; use SKY's address in RAIN's copy
    allow-transfer { key "rndc-key"; 192.168.120.2; };
    dnssec-validation auto;
    // ...
};
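To check whether a given client address falls inside the trusted ACL before or after editing it, the CIDR prefix match can be reproduced with shell arithmetic. A minimal sketch: TRUSTED_CIDRS is a hypothetical hand-kept copy of the ACL's address entries (localhost omitted), the helper names are hypothetical, and the mask arithmetic assumes a shell with 64-bit integers, as bash and dash provide.

```shell
# ip_to_int A.B.C.D: print an IPv4 address as an unsigned 32-bit integer.
ip_to_int() {
    local IFS=.
    set -- $1          # split the dotted quad into $1..$4
    echo "$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))"
}

# in_cidr IP NET/BITS: succeed if IP lies inside the CIDR block.
in_cidr() {
    local ip net bits mask
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Hypothetical hand-kept copy of the address entries in acl "trusted"
# (localhost omitted); update it whenever named.conf changes.
TRUSTED_CIDRS="192.168.120.0/23 192.168.121.0/24 192.168.122.0/24 172.16.0.0/24 10.8.0.0/28"

# is_trusted IP: succeed if any trusted CIDR contains IP.
is_trusted() {
    local cidr
    for cidr in $TRUSTED_CIDRS; do
        in_cidr "$1" "$cidr" && return 0
    done
    return 1
}
```

For example, is_trusted 172.16.0.10 succeeds for OAK, while an address outside every listed range, such as 172.16.1.10, fails. Addresses must be written without leading zeros, since the shell reads 010 as octal.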
After applying the change, reload BIND on both servers and verify with a query sourced from the GCP VPC:
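# On SKY, then RAIN
sudo rndc reload
# From OAK (172.16.0.10), over the VPN tunnel
dig @192.168.120.1 sky.wdc.us.gl3 A +short      # WDC record
dig @192.168.120.2 sky.wdc.us.gl3 A +short      # same query against RAIN (transfer peer address)
dig @192.168.120.1 oak.cloud.us.gl3 A +short    # cloud record, resolved via the cloud.us.gl3 forward zone
# Confirm no new 'denied' entries since the reload (run on both servers)
sudo journalctl -u named --since "10min ago" | grep -i denied | tail -5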
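When the journal still shows denials, grouping them by client address shows which hosts are affected. A sketch, assuming BIND's log lines contain the word query, the word denied, and the client as ADDRESS#PORT (the usual format, but worth checking against the actual logs); denied_clients is a hypothetical helper name.

```shell
# denied_clients: read named log lines on stdin and print how many denied
# queries each client address produced, highest count first.
denied_clients() {
    grep 'query.*denied' \
        | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}#' \
        | tr -d '#' \
        | sort | uniq -c | sort -rn
}

# Usage, on SKY and on RAIN:
#   sudo journalctl -u named --since "10min ago" | denied_clients
```

An empty result means no denied queries were logged in the selected window.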
Apply to BOTH servers
This is a paired change. Apply the ACL update to SKY first, verify, then RAIN. If RAIN is left with the old ACL, a SKY failover will immediately break cloud-side name resolution and the SOC backend will start logging SSH errors for OAK/MAPLE/CEDAR within seconds.
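Because a mismatch between the two servers only shows up on failover, it can help to compare the trusted ACL on both servers directly. This sketch assumes a copy of RAIN's /etc/named.conf has been fetched to the local host (for example with scp; the path /tmp/rain-named.conf below is hypothetical), that the block opens with acl "trusted" { at the start of a line and closes with }; on its own line, and that comments use // style. Comments and indentation are ignored, so only the entries themselves are compared.

```shell
# extract_acl NAME FILE: print the acl "NAME" { ... }; block of a named.conf,
# with // comments and surrounding whitespace removed.
extract_acl() {
    sed -n "/^acl \"$1\" {/,/^};/p" "$2" \
        | sed -e 's://.*$::' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# acls_match SKY_CONF RAIN_CONF: succeed only if both files define the
# "trusted" ACL and its entries are identical.
acls_match() {
    local a b
    a=$(extract_acl trusted "$1")
    b=$(extract_acl trusted "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage on SKY (hypothetical copy of RAIN's config):
#   acls_match /etc/named.conf /tmp/rain-named.conf || echo "ACLs differ: fix RAIN"
```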
DHCP Architecture
DHCP failover runs over TCP port 647. SKY is PRIMARY, RAIN is SECONDARY. Lease state synchronizes automatically; failover detection takes under 30 seconds.
| Pool | Range | Purpose |
|---|---|---|
| 192.168.121.0/24 | .0 – .200 | Dynamic workstations |
| 192.168.122.0/24 | .101 – .240 | Dynamic devices |
Static reservations are managed in dhcpd.conf on both SKY and RAIN and tracked in wdc-hostregistry.csv.
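To see how full each dynamic pool is, the lease database can be reduced to the latest binding state per address. A sketch, assuming the standard ISC dhcpd leases format (lease ADDRESS { ... binding state STATE; ... }) at the path listed under Key File Locations. Because dhcpd appends new entries rather than rewriting old ones, the last entry for an address wins. Grouping by /24 matches the pools in the table above; the helper name is hypothetical.

```shell
# active_leases_by_subnet [LEASES_FILE]: count addresses whose most recent
# entry says "binding state active", grouped by /24.
active_leases_by_subnet() {
    awk '
        /^lease / { ip = $2; next }
        /^[ \t]*binding state / { if (ip != "") state[ip] = $3 }
        /^}/ { ip = "" }
        END {
            for (addr in state) {
                if (state[addr] == "active;") {
                    split(addr, o, "[.]")
                    count[o[1] "." o[2] "." o[3] ".0/24"]++
                }
            }
            for (net in count) print net, count[net]
        }
    ' "${1:-/var/lib/dhcpd/dhcpd.leases}" | sort
}
```

The anchored binding state pattern deliberately skips the "next binding state" and "rewind binding state" lines, which describe future transitions rather than the current state.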
Post-Change Checklist
Mandatory after every config change
Never skip this checklist on SKY or RAIN.
- Update AIDE baseline:
- Log change to /var/log/asset-inventory.log
- Re-sign DNSSEC if zone files changed (see above)
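For the logging step, a small helper keeps entries uniform across SKY and RAIN. The line layout here (UTC timestamp, host, user, message) is a hypothetical convention, not a documented format for asset-inventory.log; adapt it to whatever the existing log already contains. ASSET_LOG can be overridden, for example when testing.

```shell
# Target log; defaults to the path named in the checklist.
ASSET_LOG=${ASSET_LOG:-/var/log/asset-inventory.log}

# log_change MESSAGE...: append one line: <UTC time> <host> <user> <message>.
# The field layout is a hypothetical convention, not an established format.
log_change() {
    printf '%s %s %s %s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(uname -n)" "$(id -un)" "$*" \
        >> "$ASSET_LOG"
}

# Usage (run on each server after the change):
#   sudo sh -c '. ./log_change.sh; log_change "SKY: added 172.16.0.0/24 to acl trusted"'
```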
Key File Locations
| File | Path |
|---|---|
| BIND main config | /etc/named.conf |
| Zone file (forward) | /var/named/wdc.us.gl3.db |
| Zone file (reverse) | /var/named/192.168.120.0.rev |
| DHCP config | /etc/dhcp/dhcpd.conf |
| DHCP leases | /var/lib/dhcpd/dhcpd.leases |
| DNSSEC keys | /var/named/keys/ |
| Host registry | wdc-hostregistry.csv |
See the full operational guide: sky-rain-dns-dhcp-infrastructure.md
Dns Dhcp · v1.1 · 2026-03-14 · GPUS-IT · Classification: CONFIDENTIAL — Internal Use Only