Component Coverage Standard

Classification: CONFIDENTIAL - Internal Use Only
Document: governance/component-coverage-standard.md · v1.0 · 2026-05-12 · GPUS-IT

1. Purpose & scope

This standard requires that every host, hypervisor, network appliance, storage device, power device, and managed cloud service documented anywhere in the GPUS-IT infrastructure docs portal must also exist in the central inventory (inventory.yaml at the repo root) and appear in each of the three operations portals at a level appropriate to its monitoring posture.

Coverage is verified by a pre-build script. Drift fails the Cloud Build pipeline. Exceptions require a dated entry in .coverage-exceptions.yaml.

The standard's purpose is to prevent infrastructure that exists on paper but is invisible to operations — and infrastructure that exists in operations but is undocumented.

2. The three-portal coverage requirement

A documented infrastructure entity is covered when all three are true:

mkdocs-portal — referenced from at least one page under docs/architecture/, docs/infrastructure/, docs/hostregistry/, or docs/response-plans/. The documented_in field on the inventory entry lists which pages. The coverage script verifies the listed pages exist and contain the entity ID (case-insensitive).
status-site — appears as a card in the appropriate category section, showing a monitoring_status dot.
soc-site — appears in the appropriate tile under the wdc (or future location-scoped) tab, with at least one telemetry source named in its monitoring_intent field.

A PR that adds an infrastructure document without the matching inventory entry, status-site card, or soc-site tile is incomplete and must not merge.
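
The mkdocs-portal condition above can be sketched as follows. This is a minimal illustration, not the actual logic of scripts/check-component-coverage.py: the function name, the message wording, and the choice to treat an empty documented_in list as a problem are all assumptions.

```python
from pathlib import Path


def check_documented_in(entity_id: str, documented_in: list[str], repo_root: Path) -> list[str]:
    """Return a list of problems; an empty list means the linkage holds."""
    problems = []
    if not documented_in:
        # Assumption: an entity with no listed pages is not covered.
        problems.append(f"{entity_id}: documented_in is empty")
    needle = entity_id.lower()
    for rel_path in documented_in:
        page = repo_root / rel_path
        if not page.is_file():
            problems.append(f"{entity_id}: documented_in page missing: {rel_path}")
            continue
        # Case-insensitive match on the entity ID, as the standard requires.
        text = page.read_text(encoding="utf-8", errors="replace")
        if needle not in text.lower():
            problems.append(f"{entity_id}: not mentioned in {rel_path}")
    return problems
```

A plain substring match like this would also accept an ID that appears inside a longer ID; a real check may want token boundaries.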

3. The four monitoring states

| State | Meaning | Required fields | UI |
|---|---|---|---|
| live | Telemetry actively wired and reporting today. | `monitoring_intent` (the active source) | Green dot |
| planned | Documented and committed; telemetry not yet wired. | `monitoring_intent` (the eventual source) | Striped-yellow dot |
| unmonitored | Exists, but no telemetry intent. | `justification` (why) | Grey dot |
| decommissioned | No longer in service; display-only. | `decommissioned_reason` | Strike-through, non-interactive |

planned is the default for newly-documented infrastructure. Promotion to live happens when the named telemetry source produces data the relevant backend can ingest.
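
The required-field rule in the table above could be expressed roughly as below. This is a sketch under assumptions: the `id` key and the function name are hypothetical, and the authoritative rules live in the Inventory Schema.

```python
# Required field per monitoring state, taken from the table in this section.
REQUIRED_BY_STATE = {
    "live": "monitoring_intent",
    "planned": "monitoring_intent",
    "unmonitored": "justification",
    "decommissioned": "decommissioned_reason",
}


def validate_monitoring_state(entry: dict) -> list[str]:
    """Return problems for one inventory entry; an empty list means valid."""
    entity_id = entry.get("id", "<unknown>")  # "id" key is an assumption
    state = entry.get("monitoring_status")
    if state not in REQUIRED_BY_STATE:
        return [f"{entity_id}: invalid monitoring_status {state!r}"]
    field = REQUIRED_BY_STATE[state]
    # Truthiness covers a missing key, an empty string, and an empty list.
    if not entry.get(field):
        return [f"{entity_id}: {state} requires a non-empty {field}"]
    return []
```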

4. The inventory file

inventory.yaml at the repository root is the single source of truth. See Inventory Schema for the full schema, allowed field values, and required-field matrix.

The inventory feeds:

mkdocs-portal indirectly via documentation references
status-site + soc-site via inventory.json baked at Docker build time
status-backend + soc-backend via the generated servers.py (linux_hosts section only — non-SSH entities don't appear)

5. The coverage check

The script at scripts/check-component-coverage.py runs as a pre-build step in every service's Cloud Build pipeline. It verifies:

1. Inventory schema integrity (required fields, enum values, and conditional fields such as justification and decommissioned_reason).
2. Referential integrity: every reference in powers:, powered_by:, fed_from:, hosted_on:, and vms: resolves to a known entity or to an external_references entry.
3. documented_in paths exist and mention the entity ID.
4. Identifiers cited in docs/hostregistry/*.csv appear in the inventory.
5. The committed status-backend/servers.py and soc-backend/servers.py match what would be generated from inventory.linux_hosts.
6. Site inventory.json artifacts (if present) match inventory.yaml.
7. Portal presence (validate_portal_presence, Gate 5): every non-decommissioned cloud_services entity ID appears in both the status-site and soc-site render sources. "Appears" is satisfied either by a literal reference (a hardcoded card or data-svc-id attribute naming the entity ID) or by the generic cloud-services-render sentinel that the inventory.json render path leaves behind. The check is deliberately method-agnostic: it passes on today's hardcoded cards and stays green when rendering is later refactored to read inventory.json (T5). This gate is what turns the three-portal requirement of §2 into an enforced check rather than documented-only policy; previously only the documented_in linkage (item 3) was enforced.
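
The portal-presence rule (literal reference or render sentinel) might look like the sketch below. It is not the real validate_portal_presence: the exact sentinel string, the `id` key, the whole-token matching, and the `portal:` finding format are assumptions made for illustration.

```python
import re

# Assumed sentinel text; the exact marker left by the inventory.json
# render path is not specified in this standard.
RENDER_SENTINEL = "cloud-services-render"


def portal_has_entity(render_source: str, entity_id: str) -> bool:
    """True if the source carries the sentinel or names the entity literally."""
    if RENDER_SENTINEL in render_source:
        return True
    # Literal reference: match the ID as a whole token so that a longer ID
    # sharing the same prefix does not count (a design choice, not a rule).
    pattern = r"(?<![\w-])" + re.escape(entity_id) + r"(?![\w-])"
    return re.search(pattern, render_source) is not None


def portal_presence_findings(cloud_services: list[dict], sources: dict[str, str]) -> list[str]:
    """Return one hypothetical finding ID per (portal, entity) gap."""
    findings = []
    for svc in cloud_services:
        if svc.get("monitoring_status") == "decommissioned":
            continue  # decommissioned entities are exempt from this gate
        for portal, text in sources.items():
            if not portal_has_entity(text, svc["id"]):
                findings.append(f"portal:{portal}:{svc['id']}")
    return findings
```

Here `sources` maps a portal name such as status-site to the concatenated text of its render sources; how those files are located is left out of the sketch.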

Drift unresolved by an exception fails the build, blocking deploy.
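
As an illustration of how one check can produce the findings that fail a build, here is a sketch of the referential-integrity check listed above. The `id` key, the flat entity list, and the finding text are assumptions; the real inventory is organized by category.

```python
# Relationship fields named in the referential-integrity check.
REFERENCE_FIELDS = ("powers", "powered_by", "fed_from", "hosted_on", "vms")


def check_references(entities: list[dict], external_references: set[str]) -> list[str]:
    """Return one finding per reference that resolves to nothing known."""
    known = {entity["id"] for entity in entities} | set(external_references)
    findings = []
    for entity in entities:
        for field in REFERENCE_FIELDS:
            value = entity.get(field)
            if value is None:
                continue
            # Accept either a single ID or a list of IDs (an assumption).
            targets = [value] if isinstance(value, str) else list(value)
            for target in targets:
                if target not in known:
                    findings.append(f"{entity['id']}.{field} -> {target}")
    return findings
```

A caller would exit non-zero whenever findings remain after exceptions are applied, which is what blocks the deploy.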

Triggers wired to inventory changes

Inventory-only edits (inventory.yaml or .coverage-exceptions.yaml at the repo root) fire builds on five Cloud Build triggers:

gpus-deploy-mkdocs-portal
gpus-deploy-status-site
gpus-deploy-soc-site
gpus-deploy-status-backend
gpus-deploy-soc-backend

Each of those triggers includes inventory.yaml and .coverage-exceptions.yaml in its includedFiles list, so an inventory-only commit rebuilds every consumer that bakes inventory.json or runs the coverage check.

gpus-deploy-security-backend is out of scope for the coverage standard at present — it has neither the coverage-check pre-build step nor the inventory-bake step. It is intentionally excluded from the includedFiles patch.

To extend coverage to a new service, both the service's cloudbuild.yaml pre-build step and the Cloud Build trigger's includedFiles need updating; missing either half leaves the standard documented but not enforced.
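
Both halves can be checked mechanically. The sketch below assumes the trigger has been loaded as a dict with an `includedFiles` list (the shape of a Cloud Build trigger exported as JSON) and that cloudbuild.yaml has been parsed into a dict with a `steps` list. The function names and the exact-path matching are hypothetical.

```python
REQUIRED_INCLUDED_FILES = ("inventory.yaml", ".coverage-exceptions.yaml")
COVERAGE_SCRIPT = "scripts/check-component-coverage.py"


def missing_included_files(trigger: dict) -> list[str]:
    """Paths the trigger must watch but does not (exact-match assumption)."""
    included = trigger.get("includedFiles") or []
    return [path for path in REQUIRED_INCLUDED_FILES if path not in included]


def has_coverage_step(cloudbuild: dict) -> bool:
    """True if any build step's entrypoint or args mention the coverage script."""
    for step in cloudbuild.get("steps") or []:
        words = [step.get("entrypoint", "")] + list(step.get("args") or [])
        if any(COVERAGE_SCRIPT in str(word) for word in words):
            return True
    return False


def coverage_gaps(trigger: dict, cloudbuild: dict) -> list[str]:
    """List which half (or both) of the wiring is missing for one service."""
    gaps = [f"trigger does not include {path}" for path in missing_included_files(trigger)]
    if not has_coverage_step(cloudbuild):
        gaps.append("cloudbuild.yaml has no coverage-check step")
    return gaps
```

An empty result from `coverage_gaps` means the service has both the pre-build step and the trigger wiring.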

6. Exceptions: `.coverage-exceptions.yaml`

Coverage exceptions go in .coverage-exceptions.yaml at the repo root. Every exception requires an expires date in ISO format. Indefinite exceptions are not permitted — the schema forces conscious renewal.

exceptions:
  - finding_id: csv:meraki-hostregistry.csv:wdc-wap-1
    expires: 2026-06-15
    rationale: Sample exception entry.
    owner: rajesh.chhetry@greenpeace.us

Expired exceptions are ignored — the underlying finding becomes a build failure again.
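
The expiry rule can be sketched as a filter over findings. The exception entries are assumed to be already parsed from YAML into dicts, the function names are hypothetical, and treating an exception as valid through its expires date (inclusive) is an assumption the standard does not spell out.

```python
from datetime import date


def active_exception_ids(exceptions: list[dict], today: date) -> set[str]:
    """Finding IDs whose exception has not yet expired."""
    active = set()
    for exc in exceptions:
        expires = exc.get("expires")
        # A YAML loader may yield a date object or an ISO string.
        if isinstance(expires, str):
            expires = date.fromisoformat(expires)
        if not isinstance(expires, date):
            raise ValueError(f"{exc.get('finding_id')}: an expires date is required")
        if expires >= today:  # assumption: the expiry day itself still counts
            active.add(exc["finding_id"])
    return active


def unresolved_findings(findings: list[str], exceptions: list[dict], today: date) -> list[str]:
    """Findings not covered by an active exception; any result fails the build."""
    active = active_exception_ids(exceptions, today)
    return [finding for finding in findings if finding not in active]
```

With the sample entry above, the finding for wdc-wap-1 is suppressed up to 2026-06-15 and reappears as a failure afterwards.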

7. Workflow: adding new infrastructure

1. Add an entry to inventory.yaml under the appropriate category, with monitoring_status: planned and a monitoring_intent value.
2. Add a documentation page (or extend an existing one) under docs/architecture/ or another of the directories listed in §2, and list its path in the inventory entry's documented_in: field.
3. If the entity is a Linux host, run scripts/regenerate-servers-py.py to regenerate both backends' servers.py.
4. Add a card entry to the status-site front-end (read from inventory.json at runtime).
5. Add a tile entry to the soc-site front-end (read from inventory.json at runtime).
6. Run python3 scripts/check-component-coverage.py locally before pushing.
7. Push the branch. The Cloud Build pre-build step verifies coverage.

8. Workflow: promoting from `planned` to `live`

1. Wire the telemetry source named in monitoring_intent.
2. Verify the relevant backend ingests the data.
3. Change the inventory entry's monitoring_status from planned to live.
4. Commit and push. The portals' UI flips the status dot from striped yellow to green on next build.
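
The inventory edit in this workflow amounts to a guarded state change. A minimal sketch, assuming the entry is a parsed dict and using a hypothetical helper name; it does not verify that telemetry is actually flowing, which remains a manual step.

```python
def promote_to_live(entry: dict) -> dict:
    """Return a copy of the entry with monitoring_status set to live."""
    if entry.get("monitoring_status") != "planned":
        raise ValueError("this workflow only promotes entries that are planned")
    if not entry.get("monitoring_intent"):
        raise ValueError("live requires monitoring_intent naming the active source")
    # Copy rather than mutate, so the caller can diff old and new entries.
    return {**entry, "monitoring_status": "live"}
```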

9. Compliance alignment

| Framework | Reference |
|---|---|
| CIS Controls v8 | Control 1: Inventory & Control of Enterprise Assets |
| CIS Controls v8 | Control 12: Network Infrastructure Management |
| NIST CSF 2.0 | ID.AM-01: Hardware assets managed |
| NIST CSF 2.0 | ID.AM-02: Software platforms managed |
| NIST SP 800-53 | CM-8: Information System Component Inventory |
| NIST SP 800-171 | 3.4.1: Establish and maintain baseline configurations |
| PCI-DSS v4.0 | 9.5: Physical security of media and systems |