Incident Response — WDC On-Prem

Classification: CONFIDENTIAL (Internal Use Only)

Document: response-plans/wdc-on-prem-irp.md · v1.0 · 2026-05-12 · GPUS-IT


Scope

This section extends the Greenpeace USA Incident Response Plan with procedures specific to the WDC on-prem cluster (water, future fire, flower) and tenant VMs (ocean.wdc.us.gl3 today). It does not replace the org-wide IRP — read both.

1. In-Scope Incidents

| Category | Examples |
|---|---|
| Hypervisor compromise | Suspicious ESXi shell login, unsigned binary execution, root-level changes outside change windows |
| Tenant VM compromise | ocean KACE SMA exploitation, unauthorized admin user creation |
| Storage tampering | Unexpected NAS snapshot deletion, NFS export modification |
| Power / environmental | UPS failure with no graceful shutdown, HVAC failure causing thermal shutdown |
| Network compromise | Meraki admin compromise, core switch config change outside change window |
| Backup tampering | Veeam job disabled, retention shortened, GCS bucket policy weakened |

2. Severity & Initial Response

| Severity | Definition | Initial response time | Notification |
|---|---|---|---|
| SEV-1 | Active compromise of hypervisor or KACE, or confirmed data exfiltration | 15 min | Director + Exec + Legal |
| SEV-2 | Suspected compromise; integrity controls firing | 30 min | Director + IT Ops Lead |
| SEV-3 | Single-host anomaly, no confirmed compromise | 1 h | On-call engineer |
| SEV-4 | Informational / hardening finding | Next business day | Ticket only |
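
The severity table can be encoded so tooling computes the first-response deadline instead of an analyst doing it by hand. The following is a minimal sketch only: `SeverityPolicy`, `SEVERITY_POLICY`, and `response_deadline` are hypothetical names, not part of any existing GPUS tooling, and SEV-4 ("next business day") is represented as "no fixed deadline".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple


@dataclass(frozen=True)
class SeverityPolicy:
    # None stands for "next business day" (no fixed clock deadline).
    response_window: Optional[timedelta]
    notify: Tuple[str, ...]


# Values transcribed from the severity table; the structure is illustrative.
SEVERITY_POLICY = {
    "SEV-1": SeverityPolicy(timedelta(minutes=15), ("Director", "Exec", "Legal")),
    "SEV-2": SeverityPolicy(timedelta(minutes=30), ("Director", "IT Ops Lead")),
    "SEV-3": SeverityPolicy(timedelta(hours=1), ("On-call engineer",)),
    "SEV-4": SeverityPolicy(None, ()),  # ticket only
}


def response_deadline(severity: str, detected_at: datetime) -> Optional[datetime]:
    """Return the latest acceptable first-response time, or None for SEV-4."""
    policy = SEVERITY_POLICY[severity]
    if policy.response_window is None:
        return None
    return detected_at + policy.response_window
```

An unknown severity string raises `KeyError`, which is deliberate in this sketch: a mistyped severity should fail loudly rather than default to a relaxed deadline.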

3. Detection Sources

| Source | What it detects | Routed to |
|---|---|---|
| Wazuh agents (ocean and future VMs) | File integrity, syscall anomalies, SCA drift | SOC dashboard + Slack #soc-alerts |
| Wazuh vSphere integration | vCenter audit events | SOC dashboard |
| Splunk index gpus_wdc | Aggregated syslog (ESXi, NAS, Meraki) | SOC dashboard |
| Veeam syslog | Backup tampering | SOC dashboard + on-call |
| APC NMC SNMP | Power events | #wdc-ops + on-call |
| KACE SMA itself | Patch state, missing agents | KACE → ticket |

4. IR Process (NIST SP 800-61r2 aligned)

4.1 Preparation

4.2 Detection & Analysis

  1. SOC analyst triages the alert.
  2. Confirm scope:
    • Which host? Which VM?
    • User account involved?
    • Is the event reproducible / ongoing?
  3. Determine severity using the table in section 2 (Severity & Initial Response).
  4. Open IR ticket; tag wdc, sev-N, asset names.
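
As a rough illustration of the tagging convention in step 4, the helper below builds the tag list for a new IR ticket. It is a sketch under assumptions: `ticket_tags` is a hypothetical function, and lowercasing and de-duplicating asset names is a choice made here, not a documented ticketing rule.

```python
from typing import List


def ticket_tags(severity: int, assets: List[str]) -> List[str]:
    """Build IR ticket tags: 'wdc', 'sev-N', then each affected asset name."""
    if severity not in (1, 2, 3, 4):
        raise ValueError(f"severity must be 1-4, got {severity!r}")
    tags = ["wdc", f"sev-{severity}"]
    for asset in assets:
        tag = asset.strip().lower()
        # Skip blanks and duplicates so the tag list stays searchable.
        if tag and tag not in tags:
            tags.append(tag)
    return tags
```

For example, `ticket_tags(2, ["ocean.wdc.us.gl3", "water"])` yields the tags `wdc`, `sev-2`, `ocean.wdc.us.gl3`, and `water`, in that order.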

4.3 Containment

| Scenario | Action |
|---|---|
| Hypervisor compromise | Enable Strict Lockdown if not already; disable ESXi shell + SSH; isolate mgmt port to admin VLAN only |
| Tenant VM compromise | Move VM to WDC-Quarantine port group (no egress); snapshot the VM with memory for forensics; disable the VM's user accounts in upstream IdP |
| Network compromise | Roll Meraki admin credentials; revoke API keys; force re-auth; review last 30 days of admin events |
| Backup tampering | Lock down Veeam console; verify GCS bucket lock + retention policy intact; do not delete suspicious restore points |

Do not power off

For SEV-1/SEV-2, do not power off a suspect VM until a memory-inclusive snapshot is captured. Powering off destroys volatile evidence.
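
The containment table and the power-off rule lend themselves to a small guard that automation could consult before acting on a VM. This is a sketch, not an existing tool: the names `CONTAINMENT_STEPS`, `containment_plan`, and `may_power_off` are hypothetical, and allowing power-off for SEV-3/SEV-4 without a snapshot is an assumption, since the rule above only covers SEV-1/SEV-2.

```python
from typing import Dict, List

# Steps transcribed from the containment table; the keys are illustrative.
CONTAINMENT_STEPS: Dict[str, List[str]] = {
    "hypervisor": [
        "Enable Strict Lockdown if not already",
        "Disable ESXi shell + SSH",
        "Isolate mgmt port to admin VLAN only",
    ],
    "tenant_vm": [
        "Move VM to WDC-Quarantine port group (no egress)",
        "Snapshot the VM with memory for forensics",
        "Disable the VM's user accounts in upstream IdP",
    ],
    "network": [
        "Roll Meraki admin credentials",
        "Revoke API keys",
        "Force re-auth",
        "Review last 30 days of admin events",
    ],
    "backup": [
        "Lock down Veeam console",
        "Verify GCS bucket lock + retention policy intact",
        "Do not delete suspicious restore points",
    ],
}


def containment_plan(scenario: str) -> List[str]:
    """Return a copy of the ordered containment steps for a scenario."""
    return list(CONTAINMENT_STEPS[scenario])


def may_power_off(severity: str, memory_snapshot_captured: bool) -> bool:
    """SEV-1/SEV-2: power-off is blocked until a memory-inclusive snapshot exists."""
    if severity in ("SEV-1", "SEV-2"):
        return memory_snapshot_captured
    # Assumption: lower severities are not covered by the rule, so allow it.
    return True
```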

4.4 Eradication

  1. Identify root cause (patch level, credential, misconfiguration).
  2. Apply patches via KACE.
  3. Rotate any credentials that may have been exposed.
  4. Update hardening scripts in the IaC repo if a config gap is found.

4.5 Recovery

  1. Restore from a pre-incident backup (see restore testing).
  2. Bring VM back on the VM-Prod port group only after Wazuh confirms a clean baseline.
  3. Monitor closely for 7 days (heightened alert thresholds in SOC).
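
The recovery gate and the 7-day watch period can be expressed as two small checks. This is a hedged sketch: `ready_for_prod` and `in_heightened_monitoring` are hypothetical helpers, and how "Wazuh confirms a clean baseline" is actually determined is left as a boolean input supplied by the operator.

```python
from datetime import datetime, timedelta

MONITORING_WINDOW = timedelta(days=7)  # from recovery step 3


def ready_for_prod(restored_from_pre_incident_backup: bool,
                   wazuh_baseline_clean: bool) -> bool:
    """A VM returns to VM-Prod only when both recovery conditions hold."""
    return restored_from_pre_incident_backup and wazuh_baseline_clean


def in_heightened_monitoring(recovered_at: datetime, now: datetime) -> bool:
    """True while 'now' falls inside the 7-day post-recovery window."""
    return recovered_at <= now < recovered_at + MONITORING_WINDOW
```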

4.6 Lessons Learned

  • Post-incident review within 5 business days of SEV-1/SEV-2 closure.
  • Output: action items, owners, due dates — tracked in KACE until closed.
  • Update this runbook with any new detection rules or containment shortcuts.
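
The "within 5 business days" deadline is easy to miscount across a weekend, so a small calculation may help. The sketch below assumes business days are Monday through Friday and ignores public holidays; `add_business_days` and `review_due` are hypothetical names.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Advance 'start' by the given number of Mon-Fri days (holidays ignored)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday-Friday
            remaining -= 1
    return current


def review_due(closed_on: date) -> date:
    """Latest date for the post-incident review after SEV-1/SEV-2 closure."""
    return add_business_days(closed_on, 5)
```

For an incident closed on Tuesday 2026-05-12, `review_due` returns Tuesday 2026-05-19.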

5. Forensic Artifacts to Preserve

| Artifact | Where | How long |
|---|---|---|
| Memory-inclusive VM snapshot | datastore-wdc-nas-01/forensics/ | 1 year |
| ESXi /var/log/* | Wazuh + cold copy to GCS | 1 year |
| vCenter event log | Wazuh + cold copy to GCS | 1 year |
| Veeam job logs around the incident window | Veeam DB export | 1 year |
| Network captures (if available) | nas-wdc-01:/forensics/pcap/ | 1 year |
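
To keep evidence from being purged early, the earliest permissible deletion date for an artifact can be computed at capture time. This sketch assumes "1 year" means one calendar year from the capture date, and that a 29 February capture expires on 28 February; `retention_expiry` is a hypothetical helper.

```python
from datetime import date


def retention_expiry(captured_on: date, years: int = 1) -> date:
    """Return the first date on which an artifact may be deleted."""
    try:
        return captured_on.replace(year=captured_on.year + years)
    except ValueError:
        # Captured on 29 February and the target year is not a leap year.
        return captured_on.replace(year=captured_on.year + years, day=28)
```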

6. Roles & Contacts (WDC-specific)

| Role | Primary | Backup | Reach |
|---|---|---|---|
| Incident Commander | Director, Cyber Sec | IT Ops Lead | PagerDuty wdc-ic |
| SOC Analyst on-call | Rotation | Rotation | Slack #soc-alerts |
| Hypervisor lead | Senior SysAdmin | IT Ops Engineer | KACE on-call |
| Network lead | Network Engineer | Director | KACE on-call |
| Legal / Privacy | General Counsel | Deputy | Email |

7. Compliance Mapping

| Framework | Requirement | Addressed |
|---|---|---|
| NIST CSF 2.0 | DE.CM, RS.MA, RS.AN, RS.MI, RS.CO | §3, §4 |
| NIST SP 800-61r2 | Preparation → Lessons Learned | §4 |
| CIS Controls v8 | 8, 13, 17 | Logging, monitoring, IR |
| PCI-DSS v4.0 | Req. 10, 11, 12.10 | Logging, detection, IR plan |

8. Change Log

| Date | Change | By |
|---|---|---|
| 2026-05-12 | Initial WDC on-prem IRP section | R. Chhetry |