
SOC Dashboard

Overview

The SOC (Security Operations Center) Dashboard provides a unified view of security and infrastructure posture across all GPUS infrastructure. Following the merge of the security site into the SOC dashboard, it now serves as the single pane of glass for all operational and security monitoring at soc.greenpeace.us.

The dashboard aggregates data from six security tools and six infrastructure feeds into a 15-tab interface. Security monitoring content that previously lived on the status site has been consolidated here; a dedicated Threat Hunting tab supports active investigation of Wazuh alerts; and a Reporting tab documents the standalone automated reporting system on MAPLE that generates and distributes recurring IT Ops, Executive, Staff, and Board reports.

Architecture

Component Cloud Run Service Description
Frontend gpus-soc-site Static HTML/JS dashboard served by nginx
Backend gpus-soc-backend Flask API that collects and aggregates security data

The backend runs on Cloud Run with the VPC connector (gpus-vpc-connector) attached, allowing it to reach on-premises and cloud VMs over the site-to-site VPN for SSH-based data collection.

Detection Pipeline

The end-to-end detection pipeline flows through multiple components:

Wazuh agents (all 7 VMs) → MAPLE:1514 (Wazuh Manager)
  → Filebeat (MAPLE) → CEDAR:9200 (Elasticsearch, wazuh-alerts-* index)
  → SOC Dashboard (CEDAR ES query for 24h alerts)

Filebeat is installed on MAPLE to forward Wazuh alerts from the Manager to CEDAR's Elasticsearch. The wazuh-alerts-* index pattern is used for alert storage and querying.
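The shape of the 24-hour alert query the dashboard runs against CEDAR can be sketched as a plain query body. This is an illustrative reconstruction, not the backend's actual code; the function name and the `size`/`sort` details are assumptions, while the `now-24h` window and the `wazuh-alerts-*` index come from the text above.

```python
# Illustrative sketch: ES query body for the SOC dashboard's 24h alert
# window against wazuh-alerts-* on CEDAR (172.16.0.13:9200).
# Function name and sort/size choices are assumptions.

def wazuh_24h_query(size=10):
    """Return an Elasticsearch query body for alerts in the last 24 hours."""
    return {
        "size": size,
        "sort": [{"timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                "must": [
                    {"range": {"timestamp": {"gte": "now-24h"}}}
                ]
            }
        },
    }

# POSTed to http://172.16.0.13:9200/wazuh-alerts-*/_search
```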

Dashboard Tabs (15-Tab Structure)

The SOC dashboard presents data across 15 tabs. The tab list grew from the original 12-tab layout to 13 when Security Posture was consolidated from the status site, to 14 when the dedicated Threat Hunting view was introduced, and to 15 with the addition of the automated Reporting tab.

# Tab data-v Data Source Description
1 Overview overview All feeds Aggregated SOC score, alert summary, top Wazuh rules, critical findings
2 Wazuh Alerts wazuh CEDAR (172.16.0.13) 24-hour alert feed, severity / top-rules charts, filter pill bar, drill-down target from Overview
3 Vulnerability Scan vuln OAK (172.16.0.10) OpenVAS/Greenbone results — 7-finding sortable table with CVSS, severity, host, port
4 Hardening hardening All servers via SSH Lynis system hardening audit scores per server
5 IDS / Fail2ban ids All servers via SSH Intrusion prevention ban counts, jail status, top banned IPs
6 File Integrity aide All servers via SSH AIDE file integrity monitoring status, changed/added/removed files
7 System Metrics metrics MAPLE (172.16.0.12) Prometheus metrics — CPU, memory, disk, network per server
8 Compliance compliance Static + computed CIS, PCI-DSS, NIST compliance matrix and status
9 Defense in Depth defense Static 7-layer defense model visualization with control mappings
10 Threat Vectors threats Static Nation-state, APT, hacktivist, insider, supply chain threat analysis
11 IR Runbooks runbooks Static Incident response procedures (ransomware, data breach, etc.)
12 Red / Blue Team redblue Local data Exercise calendar, BT-001/BT-002 drill results, KPIs
13 Threat Hunting threat-hunt CEDAR (172.16.0.13) Active hunt view — timeline, MITRE heatmap, auto-grouped attack chains, alert detail panel with TP/FP tagging
14 Security Posture posture All feeds Per-server security posture scoring (Fail2ban, AIDE, auditd, SELinux, firewall)
15 Reporting reporting Standalone reporter on MAPLE Automated PDF report generation — IT Ops weekly, Executive monthly, Staff newsletter monthly, Board quarterly

Consolidation from Status + Security Sites

Tabs 8–11 (Compliance, Defense in Depth, Threat Vectors, IR Runbooks) were previously on the security site. Tab 14 (Security Posture) consolidates per-server security data that was previously on the status site. The status site now focuses on infrastructure operations (Operations, Executive, Backups, Compliance, Cloud Costs, Governance, Audit Trail).

Threat Hunting Tab

The Threat Hunting tab (data-v="threat-hunt") was added to give analysts a dedicated workspace for investigating recent Wazuh alerts rather than simply viewing summary counts. It is backed by a new backend endpoint, GET /api/threat-hunt, which runs four Elasticsearch queries against the wazuh-alerts-* index on CEDAR and returns a single aggregated payload. The tab is loaded lazily the first time the user clicks on it, so its Elasticsearch cost is only incurred on demand.

A KPI strip at the top of the tab summarizes the last 24 hours: Critical (rule level ≥ 12), High (rule level 10–11), Medium (rule level 7–9), Attack Chains (count of auto-grouped sequences), and Tactics Seen (distinct MITRE tactics observed).

Below the KPIs, four sections make up the hunting workflow:

1. Alert Timeline — Last 24h. A Chart.js canvas (ch-th-timeline) rendering an hourly histogram of alerts bucketed by severity. The backend uses a date_histogram aggregation with a 1-hour fixed interval and three filtered sub-aggregations (critical / high / medium) so the chart can render stacked or overlaid severity lines without any client-side bucketing. Clicking a timeline point selects that hour and surfaces the top alerts for it in the alert detail panel.

2. MITRE ATT&CK Heatmap — Alerts by Server × Tactic. A table-based heatmap (th-heatmap) with agents on one axis and the 12 standard Enterprise tactics on the other (Initial Access through Impact). The backend runs a two-level terms aggregation (by_agent → by_tactic) filtered to alerts with rule.mitre.tactic present, and returns a matrix[agent][tactic] = count object plus a max value used to scale cell intensity. Cells are clickable — selecting one filters the alert detail panel to that agent-and-tactic slice. An empty-state card is shown when there are no MITRE-tagged alerts in the window.

3. Attack Chains — Auto-grouped by agent + 5-min window. The backend pulls up to 500 recent alerts with rule.level ≥ 7, sorts them chronologically, and walks them with a rolling grouper that starts a new chain whenever the agent changes or more than 300 seconds have passed since the last alert in the current group. Only groups with ≥ 2 alerts are kept, and chains are sorted by max rule level then alert count (top 25 returned). Each chain shows the agent, the alert count, the distinct MITRE technique IDs observed across the chain, the severity of the hottest alert, and the start/end timestamps. Each chain can be tagged TP (true positive) or FP (false positive) directly from its card — tags are held client-side in TH_CHAIN_TAGS for the session so analysts can triage a batch of chains visually without losing their place.

4. Alert Detail Panel. A collapsible side panel (th-detail) that inspects an individual alert. When the analyst clicks a timeline point, a heatmap cell, or an attack chain card, the panel populates with the alert's timestamp, severity, rule ID and description, agent name and IP, the full MITRE tactic / technique / ID arrays, source IP, and the raw full_log line. The panel exposes the same TP / FP buttons so individual alerts inside a chain can be dispositioned independently of the chain-level tag.
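The two aggregation bodies described in sections 1 and 2 above can be sketched as plain Python dicts. The field names (timestamp, rule.level, agent.name, rule.mitre.tactic) follow the standard Wazuh alert schema; the helper names and bucket sizes are assumptions, not the backend's actual code.

```python
# Hedged sketches of the threat-hunt aggregations; helper names and
# terms sizes are illustrative assumptions.

def timeline_agg():
    """Hourly date_histogram with critical/high/medium filtered sub-buckets."""
    by_severity = {
        "critical": {"filter": {"range": {"rule.level": {"gte": 12}}}},
        "high":     {"filter": {"range": {"rule.level": {"gte": 10, "lte": 11}}}},
        "medium":   {"filter": {"range": {"rule.level": {"gte": 7, "lte": 9}}}},
    }
    return {
        "timeline": {
            "date_histogram": {"field": "timestamp", "fixed_interval": "1h"},
            "aggs": by_severity,
        }
    }

def heatmap_agg():
    """Two-level terms aggregation: by_agent, then by_tactic within each agent."""
    return {
        "by_agent": {
            "terms": {"field": "agent.name", "size": 20},
            "aggs": {
                "by_tactic": {"terms": {"field": "rule.mitre.tactic", "size": 20}}
            },
        }
    }
```

Because the severity split happens server-side in the sub-aggregations, the frontend can draw the stacked timeline without any client-side bucketing, exactly as section 1 describes.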

The backend implementation lives in soc-backend/app.py under get_threat_hunt_data() and the /api/threat-hunt route. The frontend logic is in soc-site/index.html in the renderThreatHunt / renderThTimeline functions; lazy-load is triggered in the tab-switch handler that checks dataset.v === 'threat-hunt' and calls fetchThreatHunt() the first time the tab is opened.
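The rolling grouper behind the Attack Chains section can be sketched as a small pure function. This is a minimal illustration under the stated rules (new chain on agent change or a gap over 300 seconds, keep groups of 2+, sort by max rule level then count, return the top 25); the simplified alert shape and function name are assumptions.

```python
# Minimal sketch of the attack-chain grouper. Alerts are simplified to
# {"agent", "ts" (epoch seconds), "level"}; the real alerts carry full
# Wazuh documents.

def group_chains(alerts, gap=300, min_len=2, top=25):
    """Walk alerts chronologically, starting a new chain whenever the
    agent changes or more than `gap` seconds pass between alerts."""
    alerts = sorted(alerts, key=lambda a: a["ts"])
    chains, current = [], []
    for alert in alerts:
        if current and (alert["agent"] != current[-1]["agent"]
                        or alert["ts"] - current[-1]["ts"] > gap):
            chains.append(current)
            current = []
        current.append(alert)
    if current:
        chains.append(current)
    # Keep only multi-alert groups, hottest chains first.
    chains = [c for c in chains if len(c) >= min_len]
    chains.sort(key=lambda c: (max(a["level"] for a in c), len(c)), reverse=True)
    return chains[:top]
```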

Dashboard Enhancements — 2026-04-10 / 2026-04-13

A batch of usability and data-quality improvements shipped between 2026-04-10 and 2026-04-13. Each one is reflected in the live dashboard at soc.greenpeace.us and in the commits listed in the Deployment section. All enhancements were validated end-to-end during the BT-001 + BT-003 replay on 2026-04-11 and confirmed in the documentation review on 2026-04-13.

Alert Drill-Down

The Overview tab now supports click-to-filter navigation into the Wazuh Alerts tab. The three severity KPIs (Critical, High, Medium), the severity doughnut chart, and the per-agent bar chart on Overview are all clickable. A click builds a filter predicate and pushes it into a global window.SOC_FILTER object ({level, agent, rule}), switches the active tab to Wazuh Alerts, and re-fetches /api/soc with the filter serialized as query-string parameters so the server returns a pre-filtered alert feed.

The filter bar on the Wazuh Alerts tab renders a row of filter pills showing which filter predicates are currently active — for example, "High alerts only" or "Agent: WIND" or "Rule: Log file truncation". Each pill has a ✕ button that calls clearFilterKey(key) to drop just that predicate and re-fetch. A red "Clear Filters" button clears all predicates at once via clearAllFilters().

Severity levels map to the ES rule.level ranges as follows:

Label level_min level_max
Critical 12 99
High 10 11
Medium 7 9
Low / Info 0 6

When a filter is active, the backend automatically enlarges the recent-alerts slice from 10 to 50 entries so analysts can actually work through the filtered result set without paging. The filter bar is hidden when no predicate is set.
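The severity-label-to-level mapping and the query-string construction for a drill-down click can be sketched as follows. The level ranges come from the table above; the names SEVERITY_RANGES and filter_params are illustrative, not the actual frontend identifiers.

```python
# Sketch of the drill-down filter construction. The level ranges mirror
# the mapping table above; function and constant names are assumptions.

SEVERITY_RANGES = {
    "critical": (12, 99),
    "high":     (10, 11),
    "medium":   (7, 9),
    "low":      (0, 6),
}

def filter_params(label=None, agent=None, rule=None):
    """Build the /api/soc query parameters for a drill-down click."""
    params = {}
    if label:
        lo, hi = SEVERITY_RANGES[label]
        params["level_min"], params["level_max"] = lo, hi
    if agent:
        params["agent"] = agent
    if rule:
        params["rule_desc"] = rule
    return params
```

A "High alerts only" KPI click would thus serialize to ?level_min=10&level_max=11, matching the /api/soc examples later in this page.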

Vulnerability Scan Findings Table

The Vulnerability Scan tab now renders a sorted OpenVAS findings table below the summary KPIs. The table has six columns — #, Vulnerability, Host, Port, CVSS, Severity — and is sorted client-side by CVSS descending so the most severe finding is always on top. CVSS values are shown in monospace next to an inline cvss-bar that scales from 0 to 10, colored by severity. The severity label prefers the explicit severity field from the OpenVAS JSON and falls back to a CVSS-derived bucket (high ≥ 7, medium ≥ 4, else low). The current feed carries 7 findings aggregated across WDC and GCP hosts.

On the backend, get_openvas_findings() now runs every payload through _normalize_openvas() before returning it, which guarantees the response always exposes a findings: [] list — even when the GCS object is empty, stale, or the NAS fallback returns nothing. This lets the frontend render an explicit empty-state row (No findings data available) rather than crashing on undefined.map.
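The guarantees described above can be sketched as a small normalizer. This is an illustrative reconstruction of what _normalize_openvas promises (a findings list that is always present, and the CVSS-derived severity fallback of high ≥ 7, medium ≥ 4, else low), not the actual backend code.

```python
# Hedged sketch of the OpenVAS payload normalization; function names
# and field handling beyond what the docs describe are assumptions.

def cvss_bucket(cvss):
    """CVSS-derived severity fallback: high >= 7, medium >= 4, else low."""
    return "high" if cvss >= 7 else "medium" if cvss >= 4 else "low"

def normalize_openvas(payload):
    """Guarantee a findings list even for empty/stale/missing payloads."""
    if not isinstance(payload, dict):
        payload = {}
    findings = payload.get("findings") or []
    for finding in findings:
        # Prefer the explicit severity field; fall back to a CVSS bucket.
        if not finding.get("severity"):
            finding["severity"] = cvss_bucket(float(finding.get("cvss", 0)))
    payload["findings"] = findings
    return payload
```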

Wazuh Alerts Layout Fix

The Wazuh Alerts tab layout was re-balanced so that the right-hand column is a clean 3-card stack: Alert Distribution by Severity (doughnut, 160 px), Top 5 Triggered Rules (bar, 160 px), and Top Triggered Rules (scrollable table, 180 px). The previously duplicated Top-Rules table that sat below the charts has been removed entirely, and the Alert Feed on the left was tightened from 400 px to 350 px max-height so the two columns line up. The #wz-rules table is now embedded inside a chart-card so all three right-column cards share the same visual framing.

The filter bar was also moved out of the left column and promoted to sit immediately under the KPI strip, so drill-down filter pills apply visually to the whole tab (feed + charts) rather than looking like they only filter the feed.

MITRE Heatmap Tactic-Name Alignment

The MITRE ATT&CK heatmap on the Threat Hunting tab was silently missing the Command and Control column for live alerts because the hard-coded tactic string used the ampersand form (Command & Control) while Elasticsearch stores the field as the ATT&CK-canonical Command and Control. The canonical list is now defined once as CANONICAL_TACTICS inside get_threat_hunt_data() and every downstream lookup — both the column list returned to the frontend and the matrix[agent][tactic] keys — uses the raw ES bucket key verbatim.

In addition, the response now unions the canonical 12 tactics with any extra tactic strings actually observed in ES (sorted and appended at the end). This keeps the standard ATT&CK column order stable for visual scanning while still surfacing any non-canonical tactic strings the Wazuh rules may emit without silently dropping their counts.
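The column-list construction can be sketched as below: the canonical 12 tactics in ATT&CK order, using the canonical string Command and Control, unioned with any extra observed tactic strings sorted and appended at the end. The helper name tactic_columns is illustrative.

```python
# Sketch of the heatmap column construction described above; the
# canonical tactic list is the standard ATT&CK Enterprise order.

CANONICAL_TACTICS = [
    "Initial Access", "Execution", "Persistence", "Privilege Escalation",
    "Defense Evasion", "Credential Access", "Discovery", "Lateral Movement",
    "Collection", "Command and Control", "Exfiltration", "Impact",
]

def tactic_columns(observed):
    """Canonical ATT&CK order first, extra observed tactics appended sorted."""
    extras = sorted(set(observed) - set(CANONICAL_TACTICS))
    return CANONICAL_TACTICS + extras
```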

MITRE Heatmap Agent FQDN Normalization

A second heatmap fix shipped on 2026-04-11 addressed a row-level rather than a column-level blind spot. The Wazuh manager on MAPLE registers each agent under its DNS short name (wind, oak, cedar, …), but the Wazuh agent service on some hosts reports its own hostname back into the alert stream as a fully qualified name (wind.wdc.us.gl3, oak.cloud.us, etc.). The raw terms aggregation on agent.name therefore produced two sibling rows for the same physical host — one under the short name and one under the FQDN — which split detection counts across rows and made the heatmap's "which server lit up" read ambiguous during BT-001 / BT-003 replay.

The backend now defines an AGENT_MAP constant inside get_threat_hunt_data() that folds every known FQDN variant to its canonical short name:

Canonical FQDN variants folded in
sky sky.wdc.us.gl3
rain rain.wdc.us.gl3
sun sun.wdc.us.gl3
wind wind.wdc.us.gl3
oak oak.cloud.us
maple maple.cloud.us
cedar cedar.cloud.us

The normalization is applied in two places: (a) the by_agent bucket key is rewritten to AGENT_MAP.get(key, key) before being appended to the matrix, and (b) the agents list returned alongside the matrix is deduplicated after normalization so each canonical host appears exactly once in the heatmap. Any unknown agent string is left unchanged, which keeps the heatmap stable if a new VM is brought online before AGENT_MAP is updated.
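The FQDN folding and deduplication can be sketched as follows. The AGENT_MAP entries mirror the table above; the normalize_agents helper name is illustrative, and unknown agent strings pass through unchanged exactly as described.

```python
# Sketch of the agent FQDN normalization; AGENT_MAP mirrors the table
# above, the helper name is an assumption.

AGENT_MAP = {
    "sky.wdc.us.gl3": "sky",
    "rain.wdc.us.gl3": "rain",
    "sun.wdc.us.gl3": "sun",
    "wind.wdc.us.gl3": "wind",
    "oak.cloud.us": "oak",
    "maple.cloud.us": "maple",
    "cedar.cloud.us": "cedar",
}

def normalize_agents(bucket_keys):
    """Fold FQDN variants to canonical short names, deduplicated in order."""
    seen, agents = set(), []
    for key in bucket_keys:
        name = AGENT_MAP.get(key, key)  # unknown agents pass through
        if name not in seen:
            seen.add(name)
            agents.append(name)
    return agents
```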

The net effect during BT-003 was that WIND's attack chain produced a single heatmap row (wind) with Execution: 5, Credential Access: 48, Command and Control: 8, instead of being spread across both wind and wind.wdc.us.gl3 rows with half the counts each.

Reporting Tab

The Reporting tab (data-v="reporting") documents the standalone automated reporting system that generates and distributes four recurring PDF reports across IT operations, executive leadership, general staff, and the Board of Directors. Reports aggregate live SOC data — Wazuh alerts, OpenVAS findings, Lynis scores, Fail2ban bans, AIDE changes, Prometheus metrics, and compliance posture — into audience-appropriate formats with distribution handled via email through a local Postfix server.

Architecture

As of the April 2026 migration, report generation no longer runs in the SOC backend on Cloud Run. Instead, a standalone Python reporting system is deployed on MAPLE (172.16.0.12) at /opt/gpus-reports/. The system is driven by three scripts:

Script Purpose
report_generator.py CLI that calls https://gpus-soc-backend-3tmz2tp2iq-uc.a.run.app/api/soc and /api/threat-hunt for live data, renders a branded PDF with reportlab, saves it to /opt/gpus-reports/output/, and uploads it to gs://gpus-infra-backups-wdc/reports/YYYY-MM-DD/
report_mailer.py Sends the generated PDF via the local Postfix relay on localhost:25 from gpus-it-security@greenpeace.org to the configured recipient list
report_cron.sh Thin wrapper invoked from cron that runs the generator followed by the mailer and appends all output to /var/log/gpus-reports.log

This decoupling means the SOC Flask backend no longer carries report-generation code, and reports continue to be produced even when Cloud Run is idle or the backend is redeploying — MAPLE only needs the live SOC API to be reachable at dispatch time.

Report Types

Report Audience Cadence Content Focus
IT Ops Weekly IT / Security team Every Monday 08:00 UTC 6 KPIs, Top Wazuh Alerts, OpenVAS Findings, Lynis Scores, Action Items. Classification: INTERNAL — IT Team Only
Executive Monthly CTO / CISO / VP IT 1st of month 08:00 UTC 5 KPIs, Security Posture, 5 Compliance Frameworks (CIS / PCI-DSS / NIST CSF / MITRE ATT&CK / NIST 800-53), Carbon Footprint, Risk Items, Decisions Required. Classification: CONFIDENTIAL — Senior Management
Staff Newsletter Monthly All Greenpeace USA staff 15th of month 08:00 UTC Friendly tone, How We Protect Your Data, This Month by Numbers, Threats We're Watching, 5 Security Tips. No classification.
Board Quarterly Board of Directors 1st of Jan / Apr / Jul / Oct 09:00 UTC Strategic Summary, Compliance Trends (previous / current / target), Carbon Footprint, Key Achievements, Risk Register, Infrastructure Budget Table (approx. $362/mo), Board Recommendations. Classification: BOARD CONFIDENTIAL

Carbon Footprint Section

As of April 2026 (commit 85fa701), the Executive Monthly and Board Quarterly reports include a Carbon Footprint section positioned immediately after the compliance tables. It reports greenhouse-gas emissions for GPUS cloud infrastructure sourced from the GCP Carbon Footprint service and served via gs://gpus-infra-backups-wdc/carbon/latest.json. The section renders a 5-card KPI strip (yearly total, latest month with MoM change, Scope 2 market, Scope 3, top region), a scope breakdown table with plain-language explanations for Scope 1 / 2 / 3, a regional emissions table (per-GCP-region kgCO₂e with percentage share), and contextual comparisons (equivalent miles driven, smartphone charges, trees required to absorb annually, cost to offset). Greenpeace green #00be51 is used as the accent. The data is updated monthly by uploading a refreshed JSON to the GCS path documented on the Status Dashboard page; no redeploy of the report generator is required.

v1.2 (2026-04-17) layout fix — Scope breakdown table column width. The Scope 1/2/3 breakdown table in the Carbon Footprint section was re-tuned to widen the left-hand description column to 5.4 inches so the plain-language explanations for each scope no longer wrap awkwardly or push the kgCO₂e values off the right edge of the letter-sized PDF. The numeric columns (kgCO₂e, % of total) retain their prior widths; only the description column was adjusted. This is a presentation-only change in report_generator.py — no schema, data source, or email filename convention changed. Existing archived PDFs in gs://gpus-infra-backups-wdc/reports/ are unaffected; the new column widths apply to all Executive Monthly and Board Quarterly reports generated on or after 2026-04-17.

Email Attachment Filename Convention

report_mailer.py attaches PDFs to outbound email using a clean audience/cadence filename convention (no date) so recipients see a predictable filename in their inbox — gpus-report-it-ops-weekly.pdf, gpus-report-executive-monthly.pdf, gpus-report-staff-monthly.pdf, and gpus-report-board-quarterly.pdf. Dated filenames (gpus-<type>-report-YYYY-MM-DD.pdf) remain in use for the local /opt/gpus-reports/output/ archive and the dated GCS path gs://gpus-infra-backups-wdc/reports/YYYY-MM-DD/ so historical reports stay individually addressable.
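The two filename conventions can be sketched as a mapping plus a dated formatter. The attachment names come from the paragraph above; the constant and function names are illustrative, not report_mailer.py's actual identifiers.

```python
# Illustrative sketch of the two filename conventions; names of the
# constant and helper are assumptions.

ATTACHMENT_NAMES = {
    "weekly":     "gpus-report-it-ops-weekly.pdf",
    "monthly":    "gpus-report-executive-monthly.pdf",
    "newsletter": "gpus-report-staff-monthly.pdf",
    "quarterly":  "gpus-report-board-quarterly.pdf",
}

def archive_name(report_type, date_str):
    """Dated filename used for the local output/ archive and the GCS path."""
    return "gpus-%s-report-%s.pdf" % (report_type, date_str)
```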

All four PDFs use the Greenpeace visual identity — the Greenpeace USA logo rendered large at the top of the first page, the green accent color #00be51 for headings, KPI tiles, and table header rows, and a consistent footer on every page. The classification banner is rendered at the top of each classified report so recipients see the handling sensitivity immediately.

Automated Distribution Schedule

Reports are scheduled via cron on MAPLE, and each cron entry invokes /opt/gpus-reports/report_cron.sh with the report type as its only argument. Distribution is via email only — there is no Slack or HappyFox integration in the new system.

Report Cron Expression Human Schedule
IT Ops Weekly 0 8 * * 1 Every Monday 08:00 UTC
Executive Monthly 0 8 1 * * 1st of every month 08:00 UTC
Staff Newsletter Monthly 0 8 15 * * 15th of every month 08:00 UTC
Board Quarterly 0 9 1 1,4,7,10 * 1st of Jan / Apr / Jul / Oct 09:00 UTC

During the current rollout, every report type is delivered to the single test recipient rajesh.chhetry@greenpeace.us. The per-audience distribution lists (IT / Security, Executive leadership, All Staff, Board) will be populated once the system exits test mode.

The full root crontab on MAPLE, including the daily cloud backup job, is:

# GPUS Cloud Backup -- Daily 2:00 AM UTC
0 2 * * * /usr/local/bin/gpus-cloud-backup.sh >> /var/log/gpus-backup.log 2>&1
# IT Ops Weekly -- Every Monday 8:00 AM UTC
0 8 * * 1  /opt/gpus-reports/report_cron.sh weekly
# Executive Monthly -- 1st of month 8:00 AM UTC
0 8 1 * *  /opt/gpus-reports/report_cron.sh monthly
# Staff Newsletter Monthly -- 15th of month 8:00 AM UTC
0 8 15 * * /opt/gpus-reports/report_cron.sh newsletter
# Board Quarterly -- 1st of Jan/Apr/Jul/Oct 9:00 AM UTC
0 9 1 1,4,7,10 * /opt/gpus-reports/report_cron.sh quarterly

CLI Usage

The three scripts can be invoked directly on MAPLE for ad-hoc generation, re-sending a previously generated PDF, or manual testing of the cron path:

python3 /opt/gpus-reports/report_generator.py --type weekly|monthly|newsletter|quarterly
python3 /opt/gpus-reports/report_mailer.py --pdf <path> --type weekly|monthly|newsletter|quarterly
/opt/gpus-reports/report_cron.sh weekly|monthly|newsletter|quarterly

The generator writes a dated PDF under /opt/gpus-reports/output/ and uploads two copies to GCS: a dated path at gs://gpus-infra-backups-wdc/reports/YYYY-MM-DD/<filename>.pdf for historical retention, and a stable gs://gpus-infra-backups-wdc/reports/<type>/latest.pdf pointer so consumers can always fetch the most recent report of a given type without knowing the generation date.
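The two GCS targets described above can be sketched as a pure path builder (the actual upload via google-cloud-storage is omitted; the helper name is an assumption).

```python
# Sketch of the dual upload targets: a dated retention path plus a
# stable latest.pdf pointer per report type. Helper name is illustrative.

def gcs_targets(report_type, filename, date_str):
    """Return (dated retention path, stable latest pointer) for a report."""
    bucket = "gpus-infra-backups-wdc"
    dated = "gs://%s/reports/%s/%s" % (bucket, date_str, filename)
    latest = "gs://%s/reports/%s/latest.pdf" % (bucket, report_type)
    return dated, latest
```

Consumers that only ever want the freshest report of a type can fetch the latest.pdf pointer without knowing the generation date.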

Dependencies on MAPLE

The reporting host runs Python 3.6 with the following packages installed system-wide: reportlab 3.6.8 for PDF rendering, Pillow 8.4.0 for logo handling, requests for calling the SOC API, and google-cloud-storage for GCS upload. Postfix is installed and configured as a local submission relay listening on localhost:25, and gcc plus zlib-devel are installed to satisfy compile-time dependencies for the Pillow build.

Data Feeds

The SOC backend aggregates data from six primary sources:

Feed Source Server Collection Method Description
Wazuh Alerts CEDAR (172.16.0.13) Elasticsearch query (port 9200) 24-hour alert summary from wazuh-alerts-* index
OpenVAS Findings OAK (172.16.0.10) GCS bucket (fallback: SSH to NAS) Vulnerability scan results from Greenbone
Lynis Scores All servers SSH System hardening audit scores per server
AIDE All servers SSH File integrity monitoring status
Fail2ban All servers SSH Intrusion prevention ban counts and status
Prometheus MAPLE (172.16.0.12) HTTP query (port 9090) Security-related metrics from cloud Prometheus

All six data feeds are confirmed working across both WDC on-prem and GCP cloud VMs.

API Endpoints

Endpoint Method Description
/api/soc GET Returns full aggregated SOC data with computed SOC score. Accepts drill-down query parameters (see below).
/api/threat-hunt GET Returns threat-hunting payload: hourly timeline, MITRE heatmap matrix, auto-grouped attack chains, and recent high-severity alerts
/health GET Health check (returns ok)

Report endpoints removed

The previous /api/reports, /api/reports/generate, and /api/reports/download/<id> endpoints were removed when report generation was moved out of the Flask backend and into the standalone reporting system on MAPLE. See the Reporting Tab section above for the current CLI-driven workflow.

The /api/soc response includes: wazuh, openvas, lynis, fail2ban, aide, prometheus, soc_score, server_count, and collected_at fields.

/api/soc Drill-Down Query Parameters

As of 2026-04-10, /api/soc accepts four optional query parameters that filter the Wazuh slice of the response. The parameters are used by the alert drill-down flow on the Overview tab. All parameters are combined as a boolean must clause on top of the existing now-24h range query; the rest of the SOC payload (OpenVAS, Lynis, Fail2ban, AIDE, Prometheus) is unaffected.

Parameter Type Description
level_min int Lower bound (inclusive) on rule.level for alert filtering
level_max int Upper bound (inclusive) on rule.level for alert filtering
agent string Exact-match filter on agent.name (Wazuh agent hostname)
rule_desc string match_phrase query against rule.description

When any filter parameter is present, the wazuh.recent slice returned by the endpoint expands from 10 alerts to 50 alerts so the filtered view is actually useful for triage. The active filter is echoed back on the response as wazuh.filters = {level_min, level_max, agent, rule_desc} so the frontend can re-hydrate its filter pills after a page refresh without guessing. Invalid or non-integer level parameters are silently dropped rather than rejected.

Example: the "High alerts only" KPI click produces GET /api/soc?level_min=10&level_max=11, and the per-agent bar chart click on WIND produces GET /api/soc?agent=WIND. Filter parameters can be combined, e.g. ?level_min=10&agent=OAK returns only high-or-critical alerts on OAK.

/api/threat-hunt Response

The /api/threat-hunt response includes: timeline (buckets + critical/high/medium arrays), mitre_heatmap (tactics, agents, matrix, max), attack_chains (top 25, each with agent, techniques, alert_count, start, end, max_level, severity), recent_high (last 50 alerts with rule.level ≥ 7), and collected_at. All data is pulled from CEDAR Elasticsearch (wazuh-alerts-*) over the last 24 hours. The mitre_heatmap.tactics array is built from the CANONICAL_TACTICS constant (the 12 ATT&CK Enterprise tactics in canonical order, using the ATT&CK-standard string Command and Control) unioned with any additional tactic strings actually observed in ES — this ensures the heatmap column headers match the raw bucket keys used to look up matrix[agent][tactic] values.

Deployment

Both services auto-deploy via Cloud Build on git push:

  • Backend: soc-backend/cloudbuild.yaml — builds Python container, deploys with VPC connector and SSH key secret
  • Frontend: soc-site/cloudbuild.yaml — backs up source to GCS, builds nginx container, deploys

Source backups are stored at gs://gpus-infra-backups-wdc/portals/soc-site/.