
Component Coverage Standard

Classification: CONFIDENTIAL - Internal Use Only
Document: governance/component-coverage-standard.md · v1.0 · 2026-05-12 · GPUS-IT


1. Purpose & scope

This standard requires every host, hypervisor, network appliance, storage device, power device, and managed cloud service documented anywhere in the GPUS-IT infrastructure docs portal to also exist in the central inventory (inventory.yaml at the repo root) and to appear in each of the three operations portals at a level appropriate to its monitoring posture.

Coverage is verified by a pre-build script. Drift fails the Cloud Build pipeline. Exceptions require a dated entry in .coverage-exceptions.yaml.

The standard's purpose is to prevent infrastructure that exists on paper but is invisible to operations — and infrastructure that exists in operations but is undocumented.

2. The three-portal coverage requirement

A documented infrastructure entity is covered when all three are true:

  1. mkdocs-portal — referenced from at least one page under docs/architecture/, docs/infrastructure/, docs/hostregistry/, or docs/response-plans/. The documented_in field on the inventory entry lists which pages. The coverage script verifies the listed pages exist and contain the entity ID (case-insensitive).
  2. status-site — appears as a card in the appropriate category section, showing a monitoring_status dot.
  3. soc-site — appears in the appropriate tile under the wdc (or future location-scoped) tab, with at least one telemetry source named in its monitoring_intent field.

A PR that adds an infrastructure document without the matching inventory entry, status-site card, or soc-site tile is incomplete and must not merge.
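As a rough illustration of the mkdocs-portal half of the requirement, the documented_in verification might look like the sketch below. The function name, its signature, and the finding strings are hypothetical; only the rule itself (listed pages must exist and mention the entity ID, case-insensitively) comes from this standard.

```python
from pathlib import Path


def check_documented_in(entity_id: str, documented_in: list[str], repo_root: Path) -> list[str]:
    """Return findings for one inventory entry; an empty list means the entry passes."""
    findings = []
    needle = entity_id.lower()
    if not documented_in:
        findings.append(f"{entity_id}: documented_in is empty")
    for rel_path in documented_in:
        page = repo_root / rel_path
        if not page.is_file():
            findings.append(f"{entity_id}: documented_in page missing: {rel_path}")
            continue
        # Case-insensitive match on the entity ID, as the standard requires.
        if needle not in page.read_text(encoding="utf-8").lower():
            findings.append(f"{entity_id}: not mentioned in {rel_path}")
    return findings
```

The status-site and soc-site halves are not sketched here because their front-end structure is not described in this document.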

3. The four monitoring states

| State | Meaning | Required fields | UI |
|---|---|---|---|
| live | Telemetry actively wired and reporting today. | monitoring_intent (the active source) | Green dot |
| planned | Documented and committed; telemetry not yet wired. | monitoring_intent (the eventual source) | Striped-yellow dot |
| unmonitored | Exists, but no telemetry intent. | justification (why) | Grey dot |
| decommissioned | No longer in service; display-only. | decommissioned_reason | Strike-through, non-interactive |

planned is the default for newly-documented infrastructure. Promotion to live happens when the named telemetry source produces data the relevant backend can ingest.
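The state-to-required-field rule can be expressed as a small lookup. The following is a minimal sketch, not the actual validator: the `id` key and the shape of an entry as a plain dict are assumptions, while the state names and field names are taken from the table above.

```python
# Required field per monitoring state, as listed in this section.
REQUIRED_BY_STATE = {
    "live": "monitoring_intent",
    "planned": "monitoring_intent",
    "unmonitored": "justification",
    "decommissioned": "decommissioned_reason",
}


def validate_state(entry: dict) -> list[str]:
    """Return findings for one entry's monitoring_status and its conditional field."""
    entity_id = entry.get("id", "<no id>")  # "id" is an assumed key name
    status = entry.get("monitoring_status")
    if status not in REQUIRED_BY_STATE:
        return [f"{entity_id}: unknown monitoring_status {status!r}"]
    field = REQUIRED_BY_STATE[status]
    if not entry.get(field):
        return [f"{entity_id}: {status} requires a non-empty {field}"]
    return []
```

For example, an entry with `monitoring_status: unmonitored` and no `justification` yields one finding, while a `planned` entry that names a `monitoring_intent` yields none.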

4. The inventory file

inventory.yaml at the repository root is the single source of truth. See Inventory Schema for the full schema, allowed field values, and required-field matrix.

The inventory feeds:

  • mkdocs-portal indirectly via documentation references
  • status-site + soc-site via inventory.json baked at Docker build time
  • status-backend + soc-backend via the generated servers.py (linux_hosts section only — non-SSH entities don't appear)
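To make the bake step concrete, here is one way a deterministic inventory.json could be produced and later compared against the inventory. This is a sketch under assumptions: the real bake step and the layout of inventory.json are not described in this document, both function names are hypothetical, and the inventory is assumed to have already been parsed from YAML into a dict.

```python
import json


def bake_inventory_json(inventory: dict) -> str:
    """Serialize the parsed inventory deterministically so two builds are comparable."""
    # default=str covers values such as dates that a YAML parser may produce.
    return json.dumps(inventory, sort_keys=True, indent=2, default=str) + "\n"


def artifact_matches(inventory: dict, artifact_text: str) -> bool:
    """True when an existing inventory.json reflects the current inventory."""
    return json.loads(artifact_text) == json.loads(bake_inventory_json(inventory))
```

Comparing parsed JSON rather than raw text keeps the check insensitive to whitespace and key order in the artifact.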

5. The coverage check

The script at scripts/check-component-coverage.py runs as a pre-build step in every service's Cloud Build pipeline. It verifies:

  1. Inventory schema integrity (required fields, enum values, conditional fields like justification and decommissioned_reason).
  2. Referential integrity — every reference in powers:, powered_by:, fed_from:, hosted_on:, vms: resolves to a known entity or to an external_references entry.
  3. documented_in paths exist and mention the entity ID.
  4. Identifiers cited in docs/hostregistry/*.csv appear in the inventory.
  5. The committed status-backend/servers.py and soc-backend/servers.py match what would be generated from inventory.linux_hosts.
  6. Site inventory.json artifacts (if present) match inventory.yaml.

Any drift that is not covered by an unexpired exception fails the build and blocks the deploy.
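The referential-integrity step (item 2 above) can be illustrated as follows. This is a minimal sketch, not the code in scripts/check-component-coverage.py: the function name, the flat `entities` mapping keyed by entity ID, and the finding format are assumptions. The five reference field names come from this section.

```python
REFERENCE_FIELDS = ("powers", "powered_by", "fed_from", "hosted_on", "vms")


def find_dangling_references(entities: dict[str, dict], external_references: set[str]) -> list[str]:
    """List every reference that resolves to neither a known entity nor an external reference."""
    known = set(entities) | set(external_references)
    findings = []
    for entity_id, entry in entities.items():
        for field in REFERENCE_FIELDS:
            value = entry.get(field) or []
            # Accept either a single ID or a list of IDs (an assumption about the schema).
            targets = [value] if isinstance(value, str) else list(value)
            for target in targets:
                if target not in known:
                    findings.append(f"{entity_id}.{field} -> {target}")
    return findings
```

A hypervisor whose `vms:` list names a VM that has no inventory entry would produce a finding such as `hv-1.vms -> vm-2`.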

Triggers wired to inventory changes

Inventory-only edits (inventory.yaml or .coverage-exceptions.yaml at the repo root) fire builds on five Cloud Build triggers:

  • gpus-deploy-mkdocs-portal
  • gpus-deploy-status-site
  • gpus-deploy-soc-site
  • gpus-deploy-status-backend
  • gpus-deploy-soc-backend

Each of those triggers includes inventory.yaml and .coverage-exceptions.yaml in its includedFiles list, so an inventory-only commit rebuilds every consumer that bakes inventory.json or runs the coverage check.

gpus-deploy-security-backend is out of scope for the coverage standard at present — it has neither the coverage-check pre-build step nor the inventory-bake step. It is intentionally excluded from the includedFiles patch.

To extend coverage to a new service, both the service's cloudbuild.yaml pre-build step and the Cloud Build trigger's includedFiles need updating; missing either half leaves the standard documented but not enforced.
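The "both halves" rule lends itself to a quick audit. The sketch below assumes two hypothetical inputs gathered elsewhere: a mapping from service name to its trigger's includedFiles list, and the set of services whose cloudbuild.yaml runs the coverage check. It matches file names literally, so glob patterns in a real includedFiles list would need extra handling.

```python
INVENTORY_FILES = {"inventory.yaml", ".coverage-exceptions.yaml"}


def enforcement_gaps(trigger_included_files: dict[str, list[str]],
                     services_with_prebuild_step: set[str]) -> dict[str, str]:
    """Report services that have only one of the two halves wired."""
    gaps = {}
    for service in sorted(set(trigger_included_files) | services_with_prebuild_step):
        has_trigger = INVENTORY_FILES <= set(trigger_included_files.get(service, []))
        has_step = service in services_with_prebuild_step
        if has_trigger and not has_step:
            gaps[service] = "trigger fires on inventory edits but no coverage step runs"
        elif has_step and not has_trigger:
            gaps[service] = "coverage step exists but inventory edits do not trigger a build"
    return gaps
```

A service with neither half, like the currently out-of-scope security backend, is deliberately not reported.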

6. Exceptions — .coverage-exceptions.yaml

Coverage exceptions go in .coverage-exceptions.yaml at the repo root. Every exception requires an expires date in ISO format. Indefinite exceptions are not permitted — the schema forces conscious renewal.

exceptions:
  - finding_id: csv:meraki-hostregistry.csv:wdc-wap-1
    expires: 2026-06-15
    rationale: Sample exception entry.
    owner: rajesh.chhetry@greenpeace.us

Expired exceptions are ignored — the underlying finding becomes a build failure again.
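The expiry rule might be applied along these lines. This is a sketch with hypothetical function names. It assumes an exception is still honoured on its expires date itself, and that findings are matched to exceptions by exact finding_id; neither detail is stated in this standard.

```python
from datetime import date


def active_exception_ids(exceptions: list[dict], today: date) -> set[str]:
    """Return the finding IDs covered by exceptions that have not yet expired."""
    active = set()
    for exc in exceptions:
        expires = exc.get("expires")
        if isinstance(expires, str):
            expires = date.fromisoformat(expires)  # ISO format, e.g. 2026-06-15
        # An entry without a date is never honoured: indefinite exceptions are not permitted.
        if isinstance(expires, date) and expires >= today:
            active.add(exc["finding_id"])
    return active


def unresolved(findings: list[str], exceptions: list[dict], today: date) -> list[str]:
    """Findings that still fail the build after unexpired exceptions are applied."""
    active = active_exception_ids(exceptions, today)
    return [finding for finding in findings if finding not in active]
```

With the sample entry above, the finding csv:meraki-hostregistry.csv:wdc-wap-1 is suppressed through 2026-06-15 under these assumptions and fails the build again from the following day.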

7. Workflow — adding new infrastructure

  1. Add an entry to inventory.yaml under the appropriate category, with monitoring_status: planned and a monitoring_intent value.
  2. Add a documentation page (or extend an existing one) under docs/architecture/ or another of the directories listed in section 2, and list its path in the inventory entry's documented_in: field.
  3. If the entity is a Linux host, run scripts/regenerate-servers-py.py to regenerate both backends' servers.py.
  4. Add a card entry to the status-site front-end (read from inventory.json at runtime).
  5. Add a tile entry to the soc-site front-end (read from inventory.json at runtime).
  6. Run python3 scripts/check-component-coverage.py locally before pushing.
  7. Push the branch. The Cloud Build pre-build step verifies coverage.

8. Workflow — promoting from planned to live

  1. Wire the telemetry source named in monitoring_intent.
  2. Verify the relevant backend ingests the data.
  3. Change the inventory entry's monitoring_status from planned to live.
  4. Commit + push. The portals' UI flips the status dot from striped yellow to green on next build.
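The promotion steps imply a few preconditions, which can be captured as a guard. This is an illustrative sketch only: the function is hypothetical, and `backend_ingesting` stands in for whatever manual or automated verification step 2 uses.

```python
def promote_to_live(entry: dict, backend_ingesting: bool) -> dict:
    """Return a copy of the entry with monitoring_status set to live, or raise if not ready."""
    if entry.get("monitoring_status") != "planned":
        raise ValueError("only planned entries can be promoted to live")
    if not entry.get("monitoring_intent"):
        raise ValueError("monitoring_intent must name the telemetry source")
    if not backend_ingesting:
        raise ValueError("the backend has not ingested data from the named source yet")
    # Copy rather than mutate, so the caller decides when to write inventory.yaml.
    return {**entry, "monitoring_status": "live"}
```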

9. Compliance alignment

| Framework | Reference | Title |
|---|---|---|
| CIS Controls v8 | Control 1 | Inventory & Control of Enterprise Assets |
| CIS Controls v8 | Control 12 | Network Infrastructure Management |
| NIST CSF 2.0 | ID.AM-01 | Hardware assets managed |
| NIST CSF 2.0 | ID.AM-02 | Software platforms managed |
| NIST SP 800-53 | CM-8 | Information System Component Inventory |
| NIST SP 800-171 | 3.4.1 | Establish and maintain baseline configurations |
| PCI-DSS v4.0 | 9.5 | Physical security of media and systems |