Server Rebuild
Classification: CONFIDENTIAL — Internal Use Only
This runbook covers restoring any of the four WDC servers (SKY, RAIN, SUN, WIND). Prefer an ESXi snapshot revert (Step 1); when a snapshot restore is not possible, rebuild from a config backup (Step 2).
Step 1 — Restore from ESXi Snapshot (preferred)
- Open the VMware ESXi console
- Right-click the VM → Snapshots → Revert to the most recent daily snapshot
- Power on the VM and verify that its services are running
- Estimated time: < 5 minutes
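The "verify services" step can be scripted. The following is a minimal sketch, assuming the units named in the restart commands in Step 2 are the ones that matter on each host; `check_services` is a hypothetical helper, not part of any existing tooling.

```shell
# check_services UNIT...: print every unit that is not active and
# return non-zero if at least one unit failed the check.
# Hypothetical helper for illustration only.
check_services() {
    local failed=0 unit
    for unit in "$@"; do
        # is-active --quiet exits 0 only when the unit is active
        if ! systemctl is-active --quiet "$unit"; then
            echo "NOT ACTIVE: $unit"
            failed=1
        fi
    done
    return "$failed"
}

# Example invocations (unit names taken from the restart commands in Step 2):
#   check_services named dhcpd                     # SKY or RAIN
#   check_services prometheus grafana-server       # SUN
#   check_services logstash elasticsearch kibana   # WIND
```

A non-zero exit status means at least one unit needs attention before the restore is considered complete.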
Step 2 — Rebuild from Config Backup
If a snapshot restore is not possible, retrieve the latest backup archive from /backup/ on the affected server (or from gs://gpus-infra-backups-wdc).
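Picking the latest archive can be sketched as follows. This assumes each backup directory holds a single naming pattern with a YYYY-MM-DD date in the file name, so that sorting by name is also sorting by date; `latest_archive` is a hypothetical helper. The object layout inside the bucket is not documented here, so the `gsutil` lines only list and copy, and the `<path-to-archive>` placeholder must be filled in by hand.

```shell
# latest_archive DIR: print the newest *.tar.gz in DIR.
# Relies on glob expansion being sorted by name, so the last match
# carries the most recent YYYY-MM-DD date. Returns non-zero if DIR
# contains no archive. Hypothetical helper for illustration only.
latest_archive() {
    local dir=$1 latest= f
    for f in "$dir"/*.tar.gz; do
        [ -e "$f" ] || continue   # skip the unexpanded pattern in an empty dir
        latest=$f
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Example (SKY or RAIN):
#   archive="$(latest_archive /backup/dns-dhcp)"
#   tar -tzf "$archive" | head     # list contents before extracting
#
# If /backup/ is unavailable, inspect the bucket and copy the archive:
#   gsutil ls -r gs://gpus-infra-backups-wdc/
#   gsutil cp gs://gpus-infra-backups-wdc/<path-to-archive> /tmp/
```

Listing the archive with `tar -tzf` before extracting to `/` is a cheap way to confirm that it contains the files expected for that server.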
SKY or RAIN
Backup archive contains: named.conf, zone files, DNSSEC keys, dhcpd.conf, lease database, AIDE database.
# Restore config files (replace YYYY-MM-DD with the date of the chosen archive)
tar -xzf /backup/dns-dhcp/dns-dhcp-backup-YYYY-MM-DD.tar.gz -C /
# Re-apply ownership and permissions
chown -R named:named /var/named/
chmod 640 /var/named/keys/*.private
# Restart services
systemctl restart named dhcpd
# Reload the configuration and all zones
rndc reload
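Before restarting `named` and `dhcpd`, the restored configs can be syntax-checked. This is a sketch under assumptions: the config paths `/etc/named.conf` and `/etc/dhcp/dhcpd.conf` are RHEL-style defaults that this runbook does not state, and `validate_dns_dhcp` is a hypothetical helper.

```shell
# validate_dns_dhcp [NAMED_CONF] [DHCPD_CONF]: syntax-check the restored
# BIND and DHCP configs; returns non-zero if either check fails.
# Default paths are assumptions; adjust them to the actual layout.
validate_dns_dhcp() {
    local named_conf=${1:-/etc/named.conf}
    local dhcpd_conf=${2:-/etc/dhcp/dhcpd.conf}
    local rc=0
    # -z additionally test-loads the primary zones listed in the config
    named-checkconf -z "$named_conf" || rc=1
    # -t parses the config and exits without starting the server
    dhcpd -t -cf "$dhcpd_conf" || rc=1
    return "$rc"
}

# Example:
#   validate_dns_dhcp && systemctl restart named dhcpd
```

Running the check first means a bad archive is caught while the previous service state is still in place.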
SUN
Backup archive contains: prometheus.yml, grafana.ini, Webmin config, AIDE database.
# Restore config files (replace YYYY-MM-DD with the date of the chosen archive)
tar -xzf /backup/monitoring/mon-backup-YYYY-MM-DD.tar.gz -C /
# Restart services
systemctl restart prometheus grafana-server
WIND
Backup archive contains: Logstash pipeline, elasticsearch.yml, kibana.yml, Kibana saved objects, AIDE database.
# Restore config files (replace YYYY-MM-DD with the date of the chosen archive)
tar -xzf /backup/logging/log-backup-YYYY-MM-DD.tar.gz -C /
# Restart services
systemctl restart logstash elasticsearch kibana
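Logstash and Kibana both depend on Elasticsearch, so it can help to confirm that Elasticsearch is answering before treating the restart as complete. A minimal sketch, assuming Elasticsearch listens on `http://localhost:9200` without TLS or authentication (neither is stated in this runbook); `wait_for_es` is a hypothetical helper.

```shell
# wait_for_es [URL] [TRIES] [DELAY]: poll the cluster health API until
# the cluster reports at least yellow status, or give up after TRIES
# attempts. URL, retry count and delay defaults are assumptions.
wait_for_es() {
    local url=${1:-http://localhost:9200}
    local tries=${2:-30}
    local delay=${3:-5}
    local i=0
    while [ "$i" -lt "$tries" ]; do
        # -f makes curl fail on HTTP errors, including the timeout
        # response returned when the status is not reached in time
        if curl -fsS "$url/_cluster/health?wait_for_status=yellow&timeout=10s" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example:
#   wait_for_es || echo "Elasticsearch did not become healthy"
```

If the cluster has security enabled, the `curl` call would need the matching scheme and credentials.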
Step 3 — Post-Rebuild
Run the Post-Change Checklist immediately after any rebuild.
Server Rebuild · v1.1 · 2026-03-14 · GPUS-IT · Classification: CONFIDENTIAL — Internal Use Only