Snapshot & Backup Schedule: WDC On-Prem

Classification: CONFIDENTIAL (Internal Use Only)
Document: architecture/wdc/backup-snapshots/schedule.md · v1.0 · 2026-05-12 · GPUS-IT

3-2-1 Design

3 copies of every protected workload: production VM, NAS snapshot, GCS offsite.
2 different media: block storage (NAS) and object storage (GCS).
1 copy offsite: a Google Cloud Storage bucket in a separate region from the WDC office.

1. Protected Assets

Asset	Host	Tier	RPO	RTO
`ocean.wdc.us.gl3` (KACE SMA)	water	Tier 1 — Critical	4 h	4 h
Future: `sky.wdc.us.gl3`	fire	Tier 1	4 h	4 h
Future: `rain.wdc.us.gl3`	fire	Tier 2	12 h	8 h
Future: `wind.wdc.us.gl3`	fire	Tier 2	12 h	8 h
Future: `sun.wdc.us.gl3`	fire	Tier 3	24 h	24 h
ESXi host config (`water`, future `fire`/`flower`)	n/a	Tier 1	24 h	2 h (reapply config)
NAS volumes hosting VMDKs	NAS	Tier 1	4 h	4 h

Tiering rules

Tier 1: an outage affects revenue or security tooling, or blocks staff productivity.
Tier 2: important but tolerates a workday of data loss.
Tier 3: dev / utility / easy to rebuild from IaC.
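
The per-tier targets in the Protected Assets table can be captured as a small lookup, which is handy for checking whether the newest recovery point for an asset is still inside its RPO. This is a minimal sketch, not an existing tool: the names `TierTarget`, `TIER_TARGETS`, and `rpo_met` are hypothetical, and the values are copied from the VM rows of the table (the ESXi host config row deviates from the Tier 1 defaults and would need an override).

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class TierTarget:
    rpo: timedelta  # maximum tolerable data loss
    rto: timedelta  # maximum tolerable time to restore


# Default targets per tier, taken from the VM rows of the Protected Assets table.
# Individual assets may override these (the ESXi host config row does).
TIER_TARGETS = {
    1: TierTarget(rpo=timedelta(hours=4), rto=timedelta(hours=4)),
    2: TierTarget(rpo=timedelta(hours=12), rto=timedelta(hours=8)),
    3: TierTarget(rpo=timedelta(hours=24), rto=timedelta(hours=24)),
}


def rpo_met(tier: int, recovery_point_age: timedelta) -> bool:
    """Return True if the newest recovery point is within the tier's RPO."""
    return recovery_point_age <= TIER_TARGETS[tier].rpo
```

For example, `rpo_met(1, timedelta(hours=5))` is `False`, because a Tier 1 asset may lose at most 4 h of data.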

2. Snapshot Schedule (VM-level, vSphere)

Snapshots are short-lived recovery points, not backups. A maximum age is enforced for every snapshot (see the age guard row below).

Workload	Trigger	Frequency	Retention	Quiesced?	Memory?
Tier 1 VMs	Scheduled	Every 4 h	24 h (6 snaps max)	Yes	No
Tier 1 VMs	Pre-change	Manual, before any patch / config change	24 h	Yes	Yes
Tier 2 VMs	Scheduled	Every 12 h	48 h	Yes	No
Tier 3 VMs	On demand only	—	72 h max	Yes	No
Any VM	Snapshot age guard	—	Auto-delete at 72 h	—	—
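
The "6 snaps max" figure for Tier 1 follows directly from dividing the retention window by the snapshot interval. A quick sketch of that arithmetic (the helper name `max_snapshots` is made up for illustration):

```python
def max_snapshots(frequency_h: int, retention_h: int) -> int:
    """Scheduled snapshots that can be alive at once within a retention window."""
    return retention_h // frequency_h


print(max_snapshots(4, 24))   # Tier 1: 6
print(max_snapshots(12, 48))  # Tier 2: 4
```

The same calculation gives 4 concurrent scheduled snapshots for Tier 2, a figure the table does not state explicitly.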

Snapshot hygiene

vSphere snapshots degrade performance the longer they live and the larger their delta grows. A monitor job alerts the SOC when any snapshot is older than 72 h or its delta exceeds 50 GB.
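
The two hygiene thresholds can be expressed as a simple filter. The sketch below assumes the snapshot inventory has already been collected into plain records; `SnapshotInfo` and `violations` are hypothetical names, and how the records are fetched from vSphere is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=72)  # age threshold from this section
MAX_DELTA_GB = 50.0            # delta-size threshold from this section


@dataclass
class SnapshotInfo:
    # Hypothetical record; a real job would fill this from the vSphere inventory.
    vm: str
    created: datetime
    delta_gb: float


def violations(snapshots, now):
    """Return (vm, reasons) pairs for snapshots that breach either threshold."""
    flagged = []
    for snap in snapshots:
        reasons = []
        if now - snap.created > MAX_AGE:
            reasons.append("age")
        if snap.delta_gb > MAX_DELTA_GB:
            reasons.append("delta")
        if reasons:
            flagged.append((snap.vm, reasons))
    return flagged
```

Reporting the reason alongside the VM name lets the alert say whether the snapshot is too old, too large, or both.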

3. Backup Schedule (Image-level, Veeam)

Backups are taken with Veeam Backup & Replication writing to the on-prem NAS repository, then copied offsite to GCS.

Job	Source	Frequency	Local Retention (NAS)	Offsite Retention (GCS)	Encrypted
WDC-Tier1-Daily	All Tier 1 VMs	Daily 22:00 ET	14 daily	30 daily	AES-256
WDC-Tier1-Weekly	All Tier 1 VMs	Sun 23:00 ET	4 weekly	12 weekly	AES-256
WDC-Tier1-Monthly	All Tier 1 VMs	1st of month	3 monthly	12 monthly	AES-256
WDC-Tier1-Annual	All Tier 1 VMs	Jan 1	n/a	7 annual	AES-256
WDC-Tier2-Daily	All Tier 2 VMs	Daily 23:00 ET	7 daily	14 daily	AES-256
WDC-Tier2-Weekly	All Tier 2 VMs	Sun 23:30 ET	4 weekly	8 weekly	AES-256
WDC-Tier3-Weekly	All Tier 3 VMs	Sun 00:30 ET	4 weekly	4 weekly	AES-256
WDC-ESXi-Config	Host profiles + IaC export	Daily 21:00 ET	14 daily	30 daily	AES-256
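
One property visible in the table is that every job keeps at least as many restore points offsite as on the NAS. If the schedule is ever edited, a small lint check along these lines could catch a row that breaks that pattern. This is an illustrative sketch only: `Job` and `offsite_shorter_than_local` are hypothetical names, and the retention counts are transcribed from the table above.

```python
from typing import NamedTuple, Optional


class Job(NamedTuple):
    name: str
    local: Optional[int]  # restore points kept on the NAS (None = not kept locally)
    offsite: int          # restore points kept in GCS


JOBS = [
    Job("WDC-Tier1-Daily", 14, 30),
    Job("WDC-Tier1-Weekly", 4, 12),
    Job("WDC-Tier1-Monthly", 3, 12),
    Job("WDC-Tier1-Annual", None, 7),
    Job("WDC-Tier2-Daily", 7, 14),
    Job("WDC-Tier2-Weekly", 4, 8),
    Job("WDC-Tier3-Weekly", 4, 4),
    Job("WDC-ESXi-Config", 14, 30),
]


def offsite_shorter_than_local(jobs):
    """Names of jobs whose offsite retention is shorter than their local retention."""
    return [j.name for j in jobs if j.local is not None and j.offsite < j.local]
```

With the current schedule the function returns an empty list.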

3.1 Storage Targets

Target	Type	Path	Capacity	Notes
Primary (on-prem)	NAS NFS share	`nas-wdc-01:/backups/wdc`	10 TB usable	Immutable / object-lock enabled
Offsite	GCS bucket	`gs://gpus-wdc-backups` (Coldline → Archive lifecycle)	unlimited	Bucket lock + retention policy (30 d minimum)
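
The Coldline → Archive lifecycle on the offsite bucket could be described with a GCS lifecycle rule like the one generated below. The transition age is an assumption: this document does not state when objects move to Archive, so `ARCHIVE_AFTER_DAYS` is a placeholder to be replaced with the real value.

```python
import json

# Assumption: objects move to Archive after 90 days in Coldline. The schedule
# does not state the transition age, so treat this number as a placeholder.
ARCHIVE_AFTER_DAYS = 90

lifecycle = {
    "rule": [
        {
            "action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
            "condition": {
                "age": ARCHIVE_AFTER_DAYS,
                "matchesStorageClass": ["COLDLINE"],
            },
        }
    ]
}

print(json.dumps(lifecycle, indent=2))
```

The printed JSON could be saved to a file and applied with `gsutil lifecycle set <file> gs://gpus-wdc-backups`, after confirming that the transition age does not conflict with the bucket's retention policy.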

3.2 Encryption & Key Management

All backup jobs use AES-256 with a Veeam-managed encryption key.
The master key is escrowed in the IT password vault; retrieval requires two people (two-person rule).
The GCS bucket uses customer-managed encryption keys (CMEK) from Google Cloud KMS, rotated every 90 days.

4. Restore Testing

Untested backups are not backups. Restore tests are mandatory and tracked.

Test	Frequency	Method	Owner	Pass criteria
File-level restore (random VM)	Weekly	Veeam Instant Restore to sandbox	IT Ops	File matches, hash verified
Full-VM restore (Tier 1)	Monthly	Restore to isolated `dr-sandbox` network	IT Ops + SOC	VM boots, services up, no malware indicators
Bare-metal ESXi rebuild + config restore	Quarterly	Rebuild lab host from runbook	Cyber Sec	Host profile compliant, telemetry flowing
Full DR drill (cross-region)	Annual	Restore Tier 1 from GCS into GCP cold-standby project	All	RTO/RPO met for every Tier 1 asset

Each test is logged in the DR test register.
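
The "hash verified" pass criterion for the weekly file-level restore can be checked by comparing digests of the original and restored files. A minimal sketch, assuming SHA-256 is an acceptable hash (the document does not name one) and that both files are reachable from the test host; `sha256_of` and `restore_matches` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large restores do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original: Path, restored: Path) -> bool:
    """True when the restored file is byte-identical to the original."""
    return sha256_of(original) == sha256_of(restored)
```

Recording both digests in the DR test register, rather than only a pass/fail flag, makes a later audit of the test easier.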

5. Monitoring

Veeam → Wazuh integration via syslog: every job emits start, success, failure, and warning events.
Failure of any Tier 1 job → immediate page to on-call.
Two consecutive failures of a job in any tier → high-severity ticket auto-created in KACE.
Daily summary email to it-ops@greenpeace.org.
Dashboard tile on soc.greenpeace.us shows last-good-backup age for every protected VM.
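
The two escalation rules above can be sketched as a small routing function. This is an illustration of the logic only, not the actual Wazuh or KACE integration: the function name `route_alert` and the action strings are made up, and the caller is assumed to track the number of consecutive failures per job.

```python
from typing import List


def route_alert(tier: int, consecutive_failures: int) -> List[str]:
    """Map a job's failure streak to the escalation actions listed above."""
    actions = []
    if tier == 1 and consecutive_failures >= 1:
        actions.append("page_on_call")  # any Tier 1 failure pages immediately
    if consecutive_failures >= 2:
        actions.append("kace_high_severity_ticket")  # applies to every tier
    return actions
```

A second consecutive Tier 1 failure therefore triggers both a page and a ticket, while a single Tier 2 or Tier 3 failure triggers neither and only shows up in the syslog events and the daily summary.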

6. Compliance Mapping

Control / Framework	Section	How addressed
CIS Controls v8	Control 11 — Data Recovery	Tiered schedule, offsite copy, tested restores
NIST CSF 2.0	PR.DS-11 (data backup), RC.RP (recovery planning)	Documented schedule + DR plan
PCI-DSS v4.0	Req. 9.4.1, 12.10.1	Backup media security, IR/DR alignment
Greenpeace IRP	§8 (Recovery)	Backups feed the IRP recovery phase

7. Change Log

Date	Change	By
2026-05-12	Initial schedule authored for WDC cluster	R. Chhetry