Snapshot & Backup Schedule — WDC On-Prem

Classification: CONFIDENTIAL (Internal Use Only)

Document: architecture/wdc/backup-snapshots/schedule.md · v1.0 · 2026-05-12 · GPUS-IT


3-2-1 Design

  • 3 copies of every protected workload: production VM, NAS snapshot, GCS offsite.
  • 2 different media: file storage (NAS) and object storage (GCS).
  • 1 copy offsite: Google Cloud Storage bucket in a separate region from the WDC office.

1. Protected Assets

| Asset | Host | Tier | RPO | RTO |
| --- | --- | --- | --- | --- |
| ocean.wdc.us.gl3 (KACE SMA) | water | Tier 1 (Critical) | 4 h | 4 h |
| Future: sky.wdc.us.gl3 | fire | Tier 1 | 4 h | 4 h |
| Future: rain.wdc.us.gl3 | fire | Tier 2 | 12 h | 8 h |
| Future: wind.wdc.us.gl3 | fire | Tier 2 | 12 h | 8 h |
| Future: sun.wdc.us.gl3 | fire | Tier 3 | 24 h | 24 h |
| ESXi host config (water, future fire/flower) | n/a | Tier 1 | 24 h | 2 h (reapply config) |
| NAS volumes hosting VMDKs | NAS | Tier 1 | 4 h | 4 h |
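
The RPO column can be turned into a simple check. The sketch below is illustrative only: it copies the RPO/RTO values for the VM rows of the table into a dictionary and flags an asset whose newest recovery point is older than its RPO. The names `TARGETS` and `rpo_breached` are hypothetical, and where the recovery-point timestamp comes from (Veeam, vCenter) is left out.

```python
from datetime import datetime, timedelta

# RPO/RTO targets in hours, copied from the VM rows of the Protected Assets table.
# Assets marked "Future" in the table are included for illustration.
TARGETS = {
    "ocean.wdc.us.gl3": {"tier": 1, "rpo_h": 4, "rto_h": 4},
    "sky.wdc.us.gl3": {"tier": 1, "rpo_h": 4, "rto_h": 4},
    "rain.wdc.us.gl3": {"tier": 2, "rpo_h": 12, "rto_h": 8},
    "wind.wdc.us.gl3": {"tier": 2, "rpo_h": 12, "rto_h": 8},
    "sun.wdc.us.gl3": {"tier": 3, "rpo_h": 24, "rto_h": 24},
}


def rpo_breached(asset: str, last_recovery_point: datetime, now: datetime) -> bool:
    """Return True when the newest recovery point is older than the asset's RPO."""
    age = now - last_recovery_point
    return age > timedelta(hours=TARGETS[asset]["rpo_h"])
```

For example, a Tier 1 asset whose last recovery point is five hours old is in breach, while a Tier 2 asset with the same age is not.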

Tiering rules

  • Tier 1: affects revenue or security tooling, or blocks staff productivity when down.
  • Tier 2: important but tolerates a workday of data loss.
  • Tier 3: dev / utility / easy to rebuild from IaC.

2. Snapshot Schedule (VM-level, vSphere)

Snapshots are short-lived recovery points, not backups. A maximum age is enforced for every snapshot.

| Workload | Trigger | Frequency | Retention | Quiesced? | Memory? |
| --- | --- | --- | --- | --- | --- |
| Tier 1 VMs | Scheduled | Every 4 h | 24 h (6 snaps max) | Yes | No |
| Tier 1 VMs | Pre-change | Manual, before any patch / config change | 24 h | Yes | Yes |
| Tier 2 VMs | Scheduled | Every 12 h | 48 h | Yes | No |
| Tier 3 VMs | On demand only | | 72 h max | Yes | No |
| Any VM | Snapshot age guard | | Auto-delete at 72 h | | |

Note: in the vSphere Client, guest file system quiescing is only available when the memory option is deselected. The Tier 1 pre-change entry (Quiesced = Yes, Memory = Yes) therefore cannot be taken as a single snapshot with both options and should be treated as a memory snapshot without quiescing.

Snapshot hygiene

vSphere snapshots degrade performance the longer they live and the larger their delta grows. A monitor job alerts the SOC when any snapshot is older than 72 h or its delta exceeds 50 GB.
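
A minimal sketch of the hygiene check described above, assuming snapshot metadata has already been collected into simple records. The `Snapshot` record and `hygiene_alerts` function are hypothetical names; how the data is pulled from vCenter and how the SOC is notified are not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_AGE = timedelta(hours=72)   # age threshold from this section
MAX_DELTA_GB = 50.0             # delta size threshold from this section


@dataclass
class Snapshot:
    vm: str
    name: str
    created: datetime
    delta_gb: float


def hygiene_alerts(snapshots, now):
    """Return (vm, snapshot name, reasons) for every snapshot over a threshold."""
    alerts = []
    for snap in snapshots:
        reasons = []
        if now - snap.created > MAX_AGE:
            reasons.append("age > 72 h")
        if snap.delta_gb > MAX_DELTA_GB:
            reasons.append("delta > 50 GB")
        if reasons:
            alerts.append((snap.vm, snap.name, reasons))
    return alerts
```

A snapshot can trip both thresholds at once, in which case both reasons are reported in a single alert entry.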

3. Backup Schedule (Image-level, Veeam)

Backups are taken with Veeam Backup & Replication writing to the on-prem NAS repository, then copied offsite to GCS.

| Job | Source | Frequency | Local Retention (NAS) | Offsite Retention (GCS) | Encrypted |
| --- | --- | --- | --- | --- | --- |
| WDC-Tier1-Daily | All Tier 1 VMs | Daily 22:00 ET | 14 daily | 30 daily | AES-256 |
| WDC-Tier1-Weekly | All Tier 1 VMs | Sun 23:00 ET | 4 weekly | 12 weekly | AES-256 |
| WDC-Tier1-Monthly | All Tier 1 VMs | 1st of month | 3 monthly | 12 monthly | AES-256 |
| WDC-Tier1-Annual | All Tier 1 VMs | Jan 1 | n/a | 7 annual | AES-256 |
| WDC-Tier2-Daily | All Tier 2 VMs | Daily 23:00 ET | 7 daily | 14 daily | AES-256 |
| WDC-Tier2-Weekly | All Tier 2 VMs | Sun 23:30 ET | 4 weekly | 8 weekly | AES-256 |
| WDC-Tier3-Weekly | All Tier 3 VMs | Sun 00:30 ET | 4 weekly | 4 weekly | AES-256 |
| WDC-ESXi-Config | Host profiles + IaC export | Daily 21:00 ET | 14 daily | 30 daily | AES-256 |
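
To make the retention columns concrete, the sketch below shows which restore points would survive on the NAS under the Tier 1 local policy (14 daily, 4 weekly, 3 monthly). It is a simplified illustration and not Veeam's retention algorithm: it assumes one restore point per calendar day, treats Sunday points as weeklies and 1st-of-month points as monthlies, and the function name `gfs_keep` is hypothetical.

```python
from datetime import date, timedelta


def gfs_keep(points, daily, weekly, monthly):
    """Return the set of restore-point dates retained under a simple GFS policy."""
    ordered = sorted(set(points), reverse=True)          # newest first
    keep = set(ordered[:daily])                          # most recent N dailies
    sundays = [d for d in ordered if d.weekday() == 6]   # weekday() == 6 is Sunday
    keep.update(sundays[:weekly])                        # most recent N weeklies
    firsts = [d for d in ordered if d.day == 1]          # 1st-of-month points
    keep.update(firsts[:monthly])                        # most recent N monthlies
    return keep


# 120 consecutive daily restore points ending on the document date.
points = [date(2026, 5, 12) - timedelta(days=i) for i in range(120)]
kept = gfs_keep(points, daily=14, weekly=4, monthly=3)
```

Points that qualify under more than one rule (for example a Sunday inside the 14-day daily window) are only stored once, so the retained set is smaller than the sum of the three counts.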

3.1 Storage Targets

| Target | Type | Path | Capacity | Notes |
| --- | --- | --- | --- | --- |
| Primary (on-prem) | NAS NFS share | nas-wdc-01:/backups/wdc | 10 TB usable | Immutable / object-lock enabled |
| Offsite | GCS bucket | gs://gpus-wdc-backups (Coldline → Archive lifecycle) | unlimited | Bucket lock + retention policy (30 d minimum) |
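
The Coldline → Archive lifecycle on the offsite bucket could be expressed as a GCS lifecycle configuration along these lines. This is a sketch only: the source does not state the age at which objects transition, so the 90-day value is a placeholder assumption, and the helper name `coldline_to_archive_lifecycle` is hypothetical.

```python
import json


def coldline_to_archive_lifecycle(age_days: int) -> dict:
    """Build a lifecycle config that moves Coldline objects to Archive after age_days."""
    return {
        "rule": [
            {
                "action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
                "condition": {
                    "age": age_days,                      # days since object creation
                    "matchesStorageClass": ["COLDLINE"],  # only touch Coldline objects
                },
            }
        ]
    }


# 90 is an assumed placeholder, not a value taken from this document.
config_json = json.dumps(coldline_to_archive_lifecycle(90), indent=2)
```

The resulting JSON can be saved to a file and applied to the bucket with the usual GCS tooling. The bucket lock and 30-day retention policy are configured separately and are not part of the lifecycle rules.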

3.2 Encryption & Key Management

  • All backup jobs use AES-256 with a Veeam-managed encryption key.
  • Master key escrowed in the IT password vault; two-person rule for retrieval.
  • GCS bucket uses CMEK (customer-managed encryption keys) from Google Cloud KMS, with keys rotated every 90 d.

4. Restore Testing

Untested backups are not backups. Restore tests are mandatory and tracked.

| Test | Frequency | Method | Owner | Pass criteria |
| --- | --- | --- | --- | --- |
| File-level restore (random VM) | Weekly | Veeam Instant Restore to sandbox | IT Ops | File matches, hash verified |
| Full-VM restore (Tier 1) | Monthly | Restore to isolated dr-sandbox network | IT Ops + SOC | VM boots, services up, no malware indicators |
| Bare-metal ESXi rebuild + config restore | Quarterly | Rebuild lab host from runbook | Cyber Sec | Host profile compliant, telemetry flowing |
| Full DR drill (cross-region) | Annual | Restore Tier 1 from GCS into GCP cold-standby project | All | RTO/RPO met for every Tier 1 asset |

Each test is logged in the DR test register.
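
The "hash verified" pass criterion for the weekly file-level restore can be checked with a short script. The sketch below assumes SHA-256 as the hash (the document does not name one) and that both the original and the restored file are reachable as local paths. `sha256_of` and `restore_matches` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large restores do not need to fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original, restored):
    """Return True when the restored file is byte-identical to the original."""
    return sha256_of(original) == sha256_of(restored)
```

Recording both digests alongside the result in the DR test register gives an auditable trail for each test.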

5. Monitoring

  • Veeam → Wazuh integration via syslog: every job emits start, success, failure, and warning events.
  • Failure of any Tier 1 job → immediate page to on-call.
  • Two consecutive failures of any tier → high-severity ticket auto-created in KACE.
  • Daily summary email to it-ops@greenpeace.org.
  • Dashboard tile on soc.greenpeace.us shows last-good-backup age for every protected VM.
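
The escalation rules above can be sketched as a small routing function. This is an illustration under assumptions: job outcomes are taken to arrive as an ordered list of strings per job, and the action labels and the function name `route_alerts` are hypothetical. The actual paging and KACE ticket integrations are not shown.

```python
def route_alerts(tier, recent_results):
    """Decide escalation actions for one job.

    recent_results lists job outcomes from oldest to newest,
    for example ["success", "failure"].
    """
    actions = []
    if not recent_results or recent_results[-1] != "failure":
        return actions                               # latest run did not fail
    if tier == 1:
        actions.append("page-on-call")               # any Tier 1 failure pages immediately
    if len(recent_results) >= 2 and recent_results[-2] == "failure":
        actions.append("kace-high-severity-ticket")  # two consecutive failures, any tier
    return actions
```

A second consecutive Tier 1 failure triggers both actions, since the paging rule and the ticket rule are independent of each other.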

6. Compliance Mapping

| Control / Framework | Section | How addressed |
| --- | --- | --- |
| CIS Controls v8 | Control 11 (Data Recovery) | Tiered schedule, offsite copy, tested restores |
| NIST CSF 2.0 | PR.DS-11 (data backup), RC.RP (recovery planning) | Documented schedule + DR plan |
| PCI-DSS v4.0 | Req. 9.4.1, 12.10.1 | Backup media security, IR/DR alignment |
| Greenpeace IRP | §8 (Recovery) | Backups feed the IRP recovery phase |

7. Change Log

| Date | Change | By |
| --- | --- | --- |
| 2026-05-12 | Initial schedule authored for WDC cluster | R. Chhetry |