Incident Response — WDC On-Prem
Classification: CONFIDENTIAL — Internal Use Only
Document: response-plans/wdc-on-prem-irp.md · v1.0 · 2026-05-12 · GPUS-IT
Scope
This section extends the Greenpeace USA Incident Response Plan with procedures specific to the WDC on-prem cluster (water, future fire, flower) and tenant VMs (ocean.wdc.us.gl3 today). It does not replace the org-wide IRP — read both.
1. In-Scope Incidents
| Category | Examples |
| --- | --- |
| Hypervisor compromise | Suspicious ESXi shell login, unsigned binary execution, root-level changes outside change windows |
| Tenant VM compromise | ocean KACE SMA exploitation, unauthorized admin user creation |
| Storage tampering | Unexpected NAS snapshot deletion, NFS export modification |
| Power / environmental | UPS failure with no graceful shutdown, HVAC failure causing thermal shutdown |
| Network compromise | Meraki admin compromise, core switch config change outside change window |
| Backup tampering | Veeam job disabled, retention shortened, GCS bucket policy weakened |
2. Severity & Initial Response
| Severity | Definition | Initial response time | Notification |
| --- | --- | --- | --- |
| SEV-1 | Active compromise of hypervisor or KACE, or confirmed data exfiltration | 15 min | Director + Exec + Legal |
| SEV-2 | Suspected compromise; integrity controls firing | 30 min | Director + IT Ops Lead |
| SEV-3 | Single-host anomaly, no confirmed compromise | 1 h | On-call engineer |
| SEV-4 | Informational / hardening finding | Next business day | Ticket only |
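
The severity table above can be encoded as a lookup so that alerting or ticketing tooling applies the same response times and notification lists. This is a minimal sketch, not an existing GPUS-IT tool: `SeverityPolicy`, `SEVERITY_POLICY`, and `policy_for` are hypothetical names, and representing "next business day" as `None` is an assumption of this example.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Tuple


@dataclass(frozen=True)
class SeverityPolicy:
    # None stands for "next business day" (assumption of this sketch).
    initial_response: Optional[timedelta]
    # Empty tuple stands for "ticket only".
    notify: Tuple[str, ...]


# Values copied from the Severity & Initial Response table.
SEVERITY_POLICY = {
    "SEV-1": SeverityPolicy(timedelta(minutes=15), ("Director", "Exec", "Legal")),
    "SEV-2": SeverityPolicy(timedelta(minutes=30), ("Director", "IT Ops Lead")),
    "SEV-3": SeverityPolicy(timedelta(hours=1), ("On-call engineer",)),
    "SEV-4": SeverityPolicy(None, ()),
}


def policy_for(severity: str) -> SeverityPolicy:
    """Return the response policy for a severity label such as 'SEV-2'."""
    try:
        return SEVERITY_POLICY[severity.upper()]
    except KeyError:
        raise ValueError(f"unknown severity: {severity!r}") from None
```

For example, `policy_for("sev-2").notify` yields the Director and the IT Ops Lead, and an unknown label raises `ValueError` instead of silently defaulting to a lower severity.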
3. Detection Sources
| Source | What it detects | Routed to |
| --- | --- | --- |
| Wazuh agents (ocean and future VMs) | File integrity, syscall anomalies, SCA drift | SOC dashboard + Slack #soc-alerts |
| Wazuh vSphere integration | vCenter audit events | SOC dashboard |
| Splunk index gpus_wdc | Aggregated syslog (ESXi, NAS, Meraki) | SOC dashboard |
| Veeam syslog | Backup tampering | SOC dashboard + on-call |
| APC NMC SNMP | Power events | #wdc-ops + on-call |
| KACE SMA itself | Patch state, missing agents | KACE → ticket |
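
The routing in the table above could be kept as data, which makes it easy to check which sources page the on-call engineer directly. The sketch below is illustrative only: the source keys (`wazuh-agent`, `veeam-syslog`, and so on) are hypothetical identifiers invented for this example, and only the destinations come from the table.

```python
# Hypothetical source keys; destinations are copied from the table above.
DETECTION_ROUTES = {
    "wazuh-agent": ("SOC dashboard", "Slack #soc-alerts"),
    "wazuh-vsphere": ("SOC dashboard",),
    "splunk-gpus_wdc": ("SOC dashboard",),
    "veeam-syslog": ("SOC dashboard", "on-call"),
    "apc-nmc-snmp": ("#wdc-ops", "on-call"),
    "kace-sma": ("KACE ticket",),
}


def destinations(source: str) -> tuple:
    """Return where alerts from a detection source are routed."""
    if source not in DETECTION_ROUTES:
        # Fail loudly: an unrouted source means the table needs updating.
        raise KeyError(f"no route defined for source: {source!r}")
    return DETECTION_ROUTES[source]


def pages_on_call(source: str) -> bool:
    """True if the source notifies the on-call engineer directly."""
    return "on-call" in destinations(source)
```

Under these assumptions, only the Veeam syslog and APC NMC SNMP sources reach on-call directly; everything else depends on a SOC analyst watching the dashboard.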
4. IR Process (NIST SP 800-61r2 aligned)
4.1 Preparation
4.2 Detection & Analysis
- SOC analyst triages the alert.
- Confirm scope:
  - Which host? Which VM?
  - Which user account is involved?
  - Is the event reproducible or ongoing?
- Determine severity (see the table in §2).
- Open an IR ticket; tag it with `wdc`, `sev-N`, and the asset names.
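
The tagging convention in the last step can be sketched as a small helper that produces a consistent tag list for the IR ticket. `ir_ticket_tags` is a hypothetical function, not part of KACE or any existing tooling, and lower-casing and de-duplicating asset names are assumptions of this example.

```python
def ir_ticket_tags(severity: int, assets: list[str]) -> list[str]:
    """Build the tag list for a WDC IR ticket: wdc, sev-N, then asset names."""
    if severity not in (1, 2, 3, 4):
        raise ValueError(f"severity must be 1-4, got {severity!r}")
    tags = ["wdc", f"sev-{severity}"]
    for asset in assets:
        # Normalise asset names so the same host always yields the same tag.
        name = asset.strip().lower()
        if name and name not in tags:
            tags.append(name)
    return tags
```

For a suspected compromise of the ocean VM on the water host, `ir_ticket_tags(2, ["ocean.wdc.us.gl3", "water"])` returns `["wdc", "sev-2", "ocean.wdc.us.gl3", "water"]`.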
4.3 Containment
| Scenario | Action |
| --- | --- |
| Hypervisor compromise | Enable Strict Lockdown if not already; disable ESXi shell + SSH; isolate mgmt port to admin VLAN only |
| Tenant VM compromise | Move VM to WDC-Quarantine port group (no egress); snapshot the VM with memory for forensics; disable the VM's user accounts in upstream IdP |
| Network compromise | Roll Meraki admin credentials; revoke API keys; force re-auth; review last 30 days of admin events |
| Backup tampering | Lock down Veeam console; verify GCS bucket lock + retention policy intact; do not delete suspicious restore points |
Do not power off
For SEV-1/SEV-2, do not power off a suspect VM until a memory-inclusive snapshot is captured. Powering off destroys volatile evidence.
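
The power-off rule above can be expressed as a guard that any containment automation calls before shutting down a VM. This is a sketch under stated assumptions: `may_power_off` is a hypothetical helper, and because the rule only covers SEV-1/SEV-2, leaving lower severities unrestricted is an assumption of this example rather than stated policy.

```python
def may_power_off(severity: str, memory_snapshot_captured: bool) -> bool:
    """Return True if powering off a suspect VM is allowed.

    For SEV-1/SEV-2, power-off is blocked until a memory-inclusive
    snapshot exists, because shutdown destroys volatile evidence.
    """
    if severity.upper() in {"SEV-1", "SEV-2"}:
        return memory_snapshot_captured
    # Assumption: the runbook places no restriction on SEV-3/SEV-4.
    return True
```

A caller would check `may_power_off("SEV-1", memory_snapshot_captured=False)` and, on `False`, capture the snapshot first instead of proceeding.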
4.4 Eradication
- Identify root cause (patch level, credential, misconfiguration).
- Apply patches via KACE.
- Rotate any credentials that may have been exposed.
- Update hardening scripts in the IaC repo if a config gap is found.
4.5 Recovery
- Restore from a pre-incident backup (see restore testing).
- Bring the VM back on the `VM-Prod` port group only after Wazuh confirms a clean baseline.
- Monitor closely for 7 days (heightened alert thresholds in SOC).
4.6 Lessons Learned
- Post-incident review within 5 business days of SEV-1/SEV-2 closure.
- Output: action items, owners, due dates — tracked in KACE until closed.
- Update this runbook with any new detection rules or containment shortcuts.
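
The five-business-day deadline for the post-incident review can be computed from the closure date. The sketch below counts Monday to Friday only; `review_due` is a hypothetical helper, and ignoring public holidays is a simplifying assumption.

```python
from datetime import date, timedelta


def review_due(closed: date, business_days: int = 5) -> date:
    """Return the date that falls `business_days` working days after closure."""
    day = closed
    remaining = business_days
    while remaining > 0:
        day += timedelta(days=1)
        # weekday(): Monday is 0, Saturday is 5, Sunday is 6.
        if day.weekday() < 5:
            remaining -= 1
    return day
```

An incident closed on Tuesday 2026-05-12 gives a review deadline of Tuesday 2026-05-19; one closed on Friday 2026-05-15 gives Friday 2026-05-22.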
5. Forensic Artifacts to Preserve
| Artifact | Where | How long |
| --- | --- | --- |
| Memory-inclusive VM snapshot | datastore-wdc-nas-01/forensics/ | 1 year |
| ESXi /var/log/* | Wazuh + cold copy to GCS | 1 year |
| vCenter event log | Wazuh + cold copy to GCS | 1 year |
| Veeam job logs around the incident window | Veeam DB export | 1 year |
| Network captures (if available) | nas-wdc-01:/forensics/pcap/ | 1 year |
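
To avoid purging evidence early, the one-year retention in the table above can be turned into an explicit earliest-purge date per artifact. This is a minimal sketch: `retention_expiry` is a hypothetical helper, and counting the year from the capture date (rather than from incident closure) is an assumption the table does not settle.

```python
from datetime import date


def retention_expiry(captured: date, years: int = 1) -> date:
    """Return the earliest date an artifact may be purged."""
    try:
        return captured.replace(year=captured.year + years)
    except ValueError:
        # Captured on Feb 29 and the target year is not a leap year:
        # roll forward to Mar 1 so the artifact is never kept too briefly.
        return captured.replace(year=captured.year + years, month=3, day=1)
```

An artifact captured on 2026-05-12 becomes eligible for purge on 2027-05-12.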
6. Roles & Contacts
| Role | Primary | Backup | Reach |
| --- | --- | --- | --- |
| Incident Commander | Director, Cyber Sec | IT Ops Lead | PagerDuty wdc-ic |
| SOC Analyst on-call | Rotation | Rotation | Slack #soc-alerts |
| Hypervisor lead | Senior SysAdmin | IT Ops Engineer | KACE on-call |
| Network lead | Network Engineer | Director | KACE on-call |
| Legal / Privacy | General Counsel | Deputy | Email |
7. Compliance Mapping
| Framework | Requirement | Addressed |
| --- | --- | --- |
| NIST CSF 2.0 | DE.CM, RS.RP, RS.AN, RS.MI, RS.CO | §3, §4 |
| NIST SP 800-61r2 | Preparation → Lessons Learned | §4 |
| CIS Controls v8 | 8, 13, 17 | Logging, monitoring, IR |
| PCI-DSS v4.0 | Req. 10, 11, 12.10 | Logging, detection, IR plan |
8. Change Log
| Date | Change | By |
| --- | --- | --- |
| 2026-05-12 | Initial WDC on-prem IRP section | R. Chhetry |