
Server Rebuild

Classification: CONFIDENTIAL — Internal Use Only

This runbook covers restoring any of the four WDC servers (SKY, RAIN, SUN, WIND): first by reverting to an ESXi snapshot, and, when a snapshot restore is not possible, by rebuilding from a config backup.


Step 1 — Restore from ESXi Snapshot (preferred)

  1. Open the VMware ESXi console
  2. Right-click the VM → Snapshots → Revert to the most recent daily snapshot
  3. Power on the VM and verify services

  Estimated time: < 5 minutes
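
The "verify services" step could be scripted along the following lines. This is a minimal sketch, not part of the official procedure: `check_units` is a hypothetical helper name, and the unit names passed to it are the ones restarted elsewhere in this runbook (`named dhcpd` on SKY or RAIN, `prometheus grafana-server` on SUN, `logstash elasticsearch kibana` on WIND).

```shell
# Hypothetical helper: report each systemd unit's state and
# return non-zero if at least one unit is not active.
check_units() {
    failed=0
    for unit in "$@"; do
        # is-active --quiet prints nothing and signals the state via exit code
        if systemctl is-active --quiet "$unit"; then
            echo "ok      $unit"
        else
            echo "FAILED  $unit"
            failed=1
        fi
    done
    return "$failed"
}
```

On SKY or RAIN, for example, this would be called as `check_units named dhcpd`; a non-zero exit status means at least one service needs attention.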

Step 2 — Rebuild from Config Backup

If a snapshot restore is not possible, retrieve the latest backup archive from /backup/ on the affected server (or from gs://gpus-infra-backups-wdc), then follow the section below for that server.
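
Picking the "latest" archive can be done by name, since the archives carry a YYYY-MM-DD stamp. The sketch below assumes every archive in a given directory shares the same prefix and ends in `-YYYY-MM-DD.tar.gz`, so that lexical order equals chronological order; `latest_archive` is a hypothetical helper name.

```shell
# Hypothetical helper: print the newest date-stamped archive in a directory.
# Relies on the shell expanding globs in sorted order, so the last match
# is the one with the most recent YYYY-MM-DD stamp.
latest_archive() {
    dir=$1
    latest=
    for f in "$dir"/*-[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].tar.gz; do
        [ -e "$f" ] || continue   # the glob matched nothing
        latest=$f
    done
    if [ -z "$latest" ]; then
        echo "no archive found in $dir" >&2
        return 1
    fi
    printf '%s\n' "$latest"
}
```

For example, `archive=$(latest_archive /backup/dns-dhcp)` would set `archive` to the newest DNS/DHCP backup. If the local copy is gone, the archive would first have to be copied down from the bucket (for instance with `gsutil cp`); the object layout inside the bucket is not described in this runbook, so list it first rather than guessing a path.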

SKY or RAIN

Backup archive contains: named.conf, zone files, DNSSEC keys, dhcpd.conf, lease database, AIDE database.
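
Because the archive is extracted directly over /, it may be worth confirming that it actually holds the expected files before touching the system. The following is a sketch under assumptions: `archive_has` is a hypothetical helper, and it matches on base names only because the directory layout inside the archive is not documented here.

```shell
# Hypothetical helper: succeed only if every given file name appears
# as the base name of some member of the archive.
archive_has() {
    archive=$1; shift
    # List members without extracting; fail early on a corrupt archive
    listing=$(tar -tzf "$archive") || return 1
    # Strip directories, e.g. etc/named.conf -> named.conf
    names=$(printf '%s\n' "$listing" | sed 's|.*/||')
    for name in "$@"; do
        if ! printf '%s\n' "$names" | grep -Fqx -- "$name"; then
            echo "missing from $archive: $name" >&2
            return 1
        fi
    done
}
```

A call such as `archive_has "$archive" named.conf dhcpd.conf` returns non-zero if either file is absent, which is a cheap signal that the wrong or a truncated archive was picked up.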

# Restore config files (replace YYYY-MM-DD with the date of the chosen archive)
tar -xzf /backup/dns-dhcp/dns-dhcp-backup-YYYY-MM-DD.tar.gz -C /

# Re-apply ownership and permissions
chown -R named:named /var/named/
chmod 640 /var/named/keys/*.private

# Restart services
systemctl restart named dhcpd

# Reload configuration and zones
rndc reload
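
A restart with a malformed restored file leaves DNS or DHCP down, so a syntax check before `systemctl restart` may be worthwhile. The sketch below uses `named-checkconf -z` (parses the config and test-loads its zones) and `dhcpd -t` (parses the config without starting the daemon). The function name `validate_dns_dhcp` and the default config paths are assumptions, not taken from this runbook; adjust them to where the archive actually restores the files.

```shell
# Hypothetical helper: validate BIND and DHCP configuration.
# Default paths are assumed (common RHEL-style locations).
validate_dns_dhcp() {
    named_conf=${1:-/etc/named.conf}
    dhcpd_conf=${2:-/etc/dhcp/dhcpd.conf}
    # -z also performs a test load of the zones listed in the config
    named-checkconf -z "$named_conf" || return 1
    # -t tests the config syntax only; -cf selects the config file
    dhcpd -t -cf "$dhcpd_conf" || return 1
}
```

One way to use it is `validate_dns_dhcp && systemctl restart named dhcpd`, so the services are only restarted when both checks pass.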

SUN

Backup archive contains: prometheus.yml, grafana.ini, Webmin config, AIDE database.

# Restore config files (replace YYYY-MM-DD with the date of the chosen archive)
tar -xzf /backup/monitoring/mon-backup-YYYY-MM-DD.tar.gz -C /
# Restart services
systemctl restart prometheus grafana-server

WIND

Backup archive contains: Logstash pipeline, elasticsearch.yml, kibana.yml, Kibana saved objects, AIDE database.

# Restore config files (replace YYYY-MM-DD with the date of the chosen archive)
tar -xzf /backup/logging/log-backup-YYYY-MM-DD.tar.gz -C /
# Restart services
systemctl restart logstash elasticsearch kibana
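
Elasticsearch can take a while to accept connections after a restart, and Logstash and Kibana both depend on it, so it may help to wait for it before declaring the rebuild done. The following is a generic polling sketch; `wait_until` is a hypothetical helper, and the health URL in the usage note assumes the default port 9200 on localhost without TLS or authentication, which may not match this deployment.

```shell
# Hypothetical helper: run a command once per second until it succeeds
# or the given number of attempts is used up.
wait_until() {
    attempts=$1; shift
    while [ "$attempts" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        # Sleep only if another attempt is coming
        if [ "$attempts" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}
```

For example, `wait_until 60 curl -fsS http://localhost:9200/_cluster/health` polls for up to about a minute and returns non-zero if Elasticsearch never answers.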

Step 3 — Post-Rebuild

Run the Post-Change Checklist immediately after any rebuild.


Server Rebuild · v1.1 · 2026-03-14 · GPUS-IT · Classification: CONFIDENTIAL — Internal Use Only