WDC On-Premises Cluster

Classification: CONFIDENTIAL — Internal Use Only

The WDC on-premises cluster consists of four Rocky Linux 8 virtual machines running on VMware ESXi 6.7. Each server has a dedicated NIC on the production network (192.168.120.0/23) and a separate NIC on the management network (192.168.124.0/24).


Server Inventory

| Server | Hostname | Production IP | Management IP | Primary Role |
|--------|----------|---------------|---------------|--------------|
| SKY | sky.wdc.us.gl3 | 192.168.120.1 | 192.168.124.1 | Primary DNS + DHCP |
| RAIN | rain.wdc.us.gl3 | 192.168.120.2 | 192.168.124.2 | Secondary DNS + DHCP |
| SUN | sun.wdc.us.gl3 | 192.168.120.3 | 192.168.124.3 | Prometheus + Grafana |
| WIND | wind.wdc.us.gl3 | 192.168.120.4 | 192.168.124.4 | Elasticsearch + Kibana |
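
The addressing plan in the inventory can be sanity-checked with Python's standard ipaddress module. The following is only an illustrative sketch: the INVENTORY dictionary and the check_inventory helper are hypothetical names, and the values are copied by hand from the inventory above rather than read from any real source of truth.

```python
import ipaddress

# Networks as stated in this document.
PRODUCTION_NET = ipaddress.ip_network("192.168.120.0/23")
MANAGEMENT_NET = ipaddress.ip_network("192.168.124.0/24")

# Hypothetical in-code copy of the server inventory: host -> (production IP, management IP).
INVENTORY = {
    "sky.wdc.us.gl3": ("192.168.120.1", "192.168.124.1"),
    "rain.wdc.us.gl3": ("192.168.120.2", "192.168.124.2"),
    "sun.wdc.us.gl3": ("192.168.120.3", "192.168.124.3"),
    "wind.wdc.us.gl3": ("192.168.120.4", "192.168.124.4"),
}


def check_inventory(inventory):
    """Return a list of readable problems; an empty list means the plan is consistent."""
    problems = []
    for host, (prod_ip, mgmt_ip) in inventory.items():
        if ipaddress.ip_address(prod_ip) not in PRODUCTION_NET:
            problems.append(f"{host}: {prod_ip} is outside {PRODUCTION_NET}")
        if ipaddress.ip_address(mgmt_ip) not in MANAGEMENT_NET:
            problems.append(f"{host}: {mgmt_ip} is outside {MANAGEMENT_NET}")
    return problems


if __name__ == "__main__":
    for line in check_inventory(INVENTORY) or ["address plan is consistent"]:
        print(line)
```

A check like this is cheap to rerun whenever a server is added or renumbered.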

Hardware Specification (all servers identical)

| Parameter | Value |
|-----------|-------|
| Hypervisor | VMware ESXi 6.7 |
| OS | Rocky Linux 8.10 |
| vCPU | 4 |
| RAM | 8 GB |
| OS Disk (sda) | 200 GB thin-provisioned VMDK |
| Data Disk (sdb) | 300 GB thin-provisioned VMDK |
| NICs | 2× VMXNET3 (production + management) |

Service Overview

SKY and RAIN — DNS / DHCP Pair

SKY and RAIN operate as a synchronized high-availability pair. SKY is the authoritative primary for the wdc.us.gl3 zone; RAIN holds a live secondary (slave) copy and answers all queries automatically if SKY goes offline. DHCP failover runs over TCP port 647, and leases synchronize between the two servers in under 30 seconds.

  • DNS zones: wdc.us.gl3, plus reverse zones for 192.168.120.0/23
  • Forward zones: cloud.us.gl3 → 10.1.96.2; all other names (.) → 1.1.1.1 / 8.8.8.8
  • DNSSEC: zone signed with ZSK + KSK; RAIN validates signatures
  • DHCP pools: 192.168.121.0–192.168.121.200, 192.168.122.101–192.168.122.240
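
To make the pool arithmetic concrete, here is a minimal sketch that counts the addresses in each DHCP pool and checks whether the pool sits inside the production prefix. The helper names pool_size and pool_in_network are hypothetical. Since 192.168.120.0/23 spans 192.168.120.0 through 192.168.121.255, the check reports the second pool as lying outside that prefix; which subnet that pool serves is not stated here, so treat the result as a prompt to confirm against the DHCP configuration rather than as a finding.

```python
import ipaddress

PRODUCTION_NET = ipaddress.ip_network("192.168.120.0/23")

# Inclusive pool boundaries as listed in the DHCP pools bullet.
POOLS = [
    ("192.168.121.0", "192.168.121.200"),
    ("192.168.122.101", "192.168.122.240"),
]


def pool_size(first, last):
    """Number of addresses in an inclusive first..last range."""
    return int(ipaddress.ip_address(last)) - int(ipaddress.ip_address(first)) + 1


def pool_in_network(first, last, network):
    """True when both ends of the range lie inside the given network."""
    return (ipaddress.ip_address(first) in network
            and ipaddress.ip_address(last) in network)


if __name__ == "__main__":
    for first, last in POOLS:
        inside = pool_in_network(first, last, PRODUCTION_NET)
        print(f"{first}-{last}: {pool_size(first, last)} addresses, "
              f"inside {PRODUCTION_NET}: {inside}")
```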

See DNS & DHCP for full configuration reference.

SUN — Monitoring

SUN runs Prometheus (port 9090) and Grafana (port 3000), scraping node_exporter and bind_exporter from all four servers every 15 seconds. Grafana dashboards cover OS metrics and DNS performance with P1–P4 alert thresholds.
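
One way to confirm that all scrape targets are healthy is to ask Prometheus itself through its HTTP API endpoint /api/v1/targets. The sketch below is an assumption-laden illustration, not the team's tooling: the function names are hypothetical, and it only relies on the documented response fields data.activeTargets[].labels and data.activeTargets[].health.

```python
import json
import urllib.request

# Prometheus on SUN, reachable from the management network only.
PROMETHEUS_URL = "http://192.168.124.3:9090"


def unhealthy_targets(payload):
    """Return (job, instance, health) for every active target whose health is not 'up'."""
    targets = payload.get("data", {}).get("activeTargets", [])
    result = []
    for target in targets:
        health = target.get("health", "unknown")
        if health != "up":
            labels = target.get("labels", {})
            result.append((labels.get("job", "?"), labels.get("instance", "?"), health))
    return result


def fetch_targets(base_url=PROMETHEUS_URL, timeout=5):
    """Fetch and decode the /api/v1/targets response from Prometheus."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)
```

From a host on the management network, unhealthy_targets(fetch_targets()) should return an empty list when every exporter on all four servers is being scraped successfully. Keeping the parsing separate from the HTTP call lets the logic be exercised with a saved response.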

See Monitoring.

WIND — Logging

WIND runs the full ELK stack. Logstash (port 5140) ingests rsyslog streams from SKY and RAIN and routes the parsed events into Elasticsearch (port 9200); Kibana (port 5601) surfaces them for search and dashboards. Indices rotate daily and are retained for 90 days.
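
The retention rule can be illustrated with a small date calculation. This is a sketch under stated assumptions: the index prefix "syslog-" and the YYYY.MM.DD suffix are hypothetical (the actual index naming is not given here), and the mechanism that really deletes old indices on WIND is not described in this document.

```python
from datetime import date, datetime, timedelta

RETENTION_DAYS = 90        # retention period stated in this document
INDEX_PREFIX = "syslog-"   # hypothetical prefix; the real index name is not given here


def expired_indices(index_names, today, retention_days=RETENTION_DAYS):
    """Return the daily indices whose date suffix is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(INDEX_PREFIX):
            continue  # not a daily log index; leave it alone
        try:
            day = datetime.strptime(name[len(INDEX_PREFIX):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date; leave it alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    names = ["syslog-2026.03.14", "syslog-2025.12.14", "syslog-2025.12.13"]
    print(expired_indices(names, today=date(2026, 3, 14)))
```

In this sketch an index that is exactly 90 days old is still kept and only strictly older ones are listed; whether the boundary day is included is a policy choice to confirm against the real lifecycle settings.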

See Logging.


Recovery Objectives

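| Server | RTO (snapshot restore) | RTO (config rebuild) | RPO |
|--------|------------------------|----------------------|-----|
| SKY | < 5 min (RAIN takes over automatically) | < 30 min | 24 hours |
| RAIN | < 30 min | < 1 hour | 24 hours |
| SUN | < 30 min | < 1 hour | 24 hours |
| WIND | < 30 min | < 1 hour | 24 hours |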

Critical dependency

DNS and DHCP services (SKY/RAIN) are fully independent of SUN and WIND. SUN or WIND failure results in loss of visibility, not loss of core network services.


Service Access (management network only)

| Service | URL | Credentials |
|---------|-----|-------------|
| SKY Webmin | https://192.168.124.1:10000 | dnsadmin |
| RAIN Webmin | https://192.168.124.2:10000 | dnsadmin |
| SUN Webmin | https://192.168.124.3:10000 | monitadmin |
| WIND Webmin | https://192.168.124.4:10000 | monitadmin |
| Prometheus | http://192.168.124.3:9090 | network-restricted |
| Grafana | http://192.168.124.3:3000 | grafana_admin |
| Kibana | http://192.168.124.4:5601 | network-restricted |
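
A quick reachability check of these endpoints can be sketched with plain TCP connects from a host on the management network. The names ENDPOINTS, port_open, and probe are hypothetical; the hosts and ports are taken from the service list above. A successful connect only shows that the port is listening, not that the service behind it is healthy or that a login would succeed.

```python
import socket

# (service, host, port) taken from the service access list.
ENDPOINTS = [
    ("SKY Webmin", "192.168.124.1", 10000),
    ("RAIN Webmin", "192.168.124.2", 10000),
    ("SUN Webmin", "192.168.124.3", 10000),
    ("WIND Webmin", "192.168.124.4", 10000),
    ("Prometheus", "192.168.124.3", 9090),
    ("Grafana", "192.168.124.3", 3000),
    ("Kibana", "192.168.124.4", 5601),
]


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe(endpoints, timeout=2.0):
    """Map each service name to True (port reachable) or False (not reachable)."""
    return {name: port_open(host, port, timeout) for name, host, port in endpoints}
```

Calling probe(ENDPOINTS) returns a dictionary such as {"SKY Webmin": True, ...} that can be printed or fed into a larger health report.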

Wdc On Prem · v1.1 · 2026-03-14 · GPUS-IT · Classification: CONFIDENTIAL — Internal Use Only