Adding New Infrastructure — Quick Start
Quick reference for the GPUS-IT team. The full rules live in
governance/component-coverage-standard.md on infras.greenpeace.us.
What inventory.yaml is
inventory.yaml at the repo root is the single source of truth for
every host, hypervisor, network appliance, storage device, and power
device we monitor. It feeds:
- the docs portal (via documented_in: page links),
- the status and SOC portals (via inventory.json baked at Docker build),
- the backends (via the generated servers.py).
If a thing isn't in inventory.yaml, it doesn't really exist for ops.
If it's in inventory.yaml but not in the portals, the Cloud Build fails.
The three monitoring states
- live: telemetry is wired and reporting today. Green dot. Requires monitoring_intent naming the active source.
- planned: documented and committed, telemetry not yet wired. Striped-yellow dot. Default for new entries. Requires monitoring_intent naming the eventual source.
- unmonitored: exists, but we have no intent to wire telemetry. Grey dot. Requires a justification: field explaining why.
(decommissioned also exists for retired gear — strike-through, requires
decommissioned_reason:.)
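The per-state field requirements above can be expressed as a small lookup. This is a minimal sketch for illustration, not the logic in scripts/check-component-coverage.py: the function name missing_state_fields, the dict-shaped entry, and the "unknown status" message are all assumptions.

```python
# Hypothetical sketch: the extra field each monitoring state requires,
# taken from the state descriptions above.
REQUIRED_BY_STATE = {
    "live": "monitoring_intent",
    "planned": "monitoring_intent",
    "unmonitored": "justification",
    "decommissioned": "decommissioned_reason",
}


def missing_state_fields(entry: dict) -> list[str]:
    """Return problems for one inventory entry (an empty list means OK)."""
    status = entry.get("monitoring_status")
    if status not in REQUIRED_BY_STATE:
        return [f"unknown monitoring_status: {status!r}"]
    field = REQUIRED_BY_STATE[status]
    # A missing key and an empty value are treated the same way here.
    if not entry.get(field):
        return [f"monitoring_status={status} requires '{field}' field"]
    return []
```

For example, an entry with monitoring_status: unmonitored and no justification: yields one problem string, while a planned entry with any non-empty monitoring_intent yields none.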
The 7-step workflow (checklist)
- Add an entry to inventory.yaml under the right category, with monitoring_status: planned and a monitoring_intent value.
- Add (or extend) a doc page under docs/architecture/, docs/infrastructure/, docs/hostregistry/, or docs/response-plans/, then list its path in the entry's documented_in: field.
- If it's a Linux host: run python3 scripts/regenerate-servers-py.py to refresh both backends' servers.py.
- Add a card to the status-site front-end.
- Add a tile to the soc-site front-end.
- Run python3 scripts/check-component-coverage.py locally.
- Push to main. Cloud Build re-runs the coverage check pre-deploy.
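The two script-driven steps in the checklist (regenerate, then coverage check) could be chained in a small pre-push helper. This is a sketch under assumptions: the helper names local_commands and run_local_checks are hypothetical, it assumes you run it from the repo root, and it assumes both scripts signal failure with a non-zero exit code.

```python
import subprocess

# Script paths come from the checklist; the wrapper itself is illustrative.
REGENERATE = ["python3", "scripts/regenerate-servers-py.py"]
COVERAGE = ["python3", "scripts/check-component-coverage.py"]


def local_commands(is_linux_host: bool) -> list[list[str]]:
    """Commands to run, in order, before pushing to main."""
    commands = []
    if is_linux_host:
        commands.append(REGENERATE)  # only needed when a Linux host was added
    commands.append(COVERAGE)  # always run the coverage check locally
    return commands


def run_local_checks(is_linux_host: bool) -> int:
    """Run each command in order; stop at the first non-zero exit code."""
    for command in local_commands(is_linux_host):
        result = subprocess.run(command)
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling run_local_checks(is_linux_host=True) regenerates servers.py first, so the coverage check sees the refreshed files.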
Common Cloud Build coverage failures
The script's error messages are intentionally workflow-teaching — they tell you the exact fix.
| Error prefix | What it means | Fix |
|---|---|---|
| [schema] … missing required universal field 'monitoring_intent' (or documented_in, location, monitoring_status) | Inventory entry is missing a required field. | Add the field to the entry in inventory.yaml. |
| [schema] … monitoring_status=unmonitored requires 'justification' field | An unmonitored entry has no explanation. | Add justification: <reason> to the entry. |
| [schema] … monitoring_status=decommissioned requires 'decommissioned_reason' field | Same idea, for retired gear. | Add decommissioned_reason: <reason>. |
| [reference] …: unresolved reference 'X' | A powers: / powered_by: / fed_from: / hosted_on: / vms: value points at something that isn't in the inventory or external_references. | Add the target to inventory.yaml, or add an entry under external_references: if it's truly outside our scope. |
| [doc] …: documented_in is empty | Inventory entry has no doc page. | Add a path in documented_in: and ensure that page mentions the entity ID. |
| [servers.py] <dir>: 'X' present in inventory.linux_hosts but not in <dir>/servers.py | You added a host but didn't regenerate. | python3 scripts/regenerate-servers-py.py from repo root, then commit. |
| [servers.py] <dir>: 'X' present in servers.py but not in inventory.yaml | Someone edited servers.py by hand. | Add the entity to inventory.yaml under linux_hosts, then run python3 scripts/regenerate-servers-py.py. |
Warnings (CSV hostname mismatches, missing doc paths pre-merge, inventory.json drift) do not fail the build but should be cleaned up.
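The two [servers.py] failures are the two directions of a set difference between the inventory's Linux hosts and the hosts listed in a backend's servers.py. The sketch below illustrates that idea only. The function name servers_py_drift and its set-based inputs are assumptions, and the real script may compute these findings differently.

```python
def servers_py_drift(
    inventory_hosts: set[str],
    servers_py_hosts: set[str],
    directory: str,
) -> list[str]:
    """Return [servers.py] findings for one backend directory."""
    findings = []
    # In the inventory but never regenerated into servers.py.
    for host in sorted(inventory_hosts - servers_py_hosts):
        findings.append(
            f"[servers.py] {directory}: '{host}' present in "
            f"inventory.linux_hosts but not in {directory}/servers.py"
        )
    # In servers.py but unknown to the inventory (likely a hand edit).
    for host in sorted(servers_py_hosts - inventory_hosts):
        findings.append(
            f"[servers.py] {directory}: '{host}' present in "
            f"servers.py but not in inventory.yaml"
        )
    return findings
```

An empty result means the generated file and the inventory agree for that backend.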
Temporary exceptions
If you genuinely need to defer a fix, add an entry to
.coverage-exceptions.yaml at the repo root. Every exception must
have an expires: date in ISO format (YYYY-MM-DD) — indefinite
exceptions are rejected by schema. Expired entries are ignored and the
finding becomes a build failure again.
exceptions:
  - finding_id: csv:meraki-hostregistry.csv:wdc-wap-1
    expires: 2026-06-15
    rationale: WAPs covered by inventory but CSV hostname slug differs.
    owner: rajesh.chhetry@greenpeace.us
The finding_id is the stable identifier printed by the coverage script —
copy it from the error or warning line.
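The expiry rule can be sketched as a filter over the parsed exception list. This is an illustration, not the coverage script's code. The function name active_exceptions is hypothetical, the entries are assumed to be already loaded as dicts, and treating the expires: date itself as still valid (inclusive) is an assumption the text above does not confirm.

```python
from datetime import date


def active_exceptions(exceptions: list[dict], today: date) -> set[str]:
    """Return the finding_ids whose exception is still in force on `today`."""
    active = set()
    for item in exceptions:
        if "expires" not in item:
            # Indefinite exceptions are not allowed.
            raise ValueError(
                f"exception {item.get('finding_id')!r} has no expires date"
            )
        expires = item["expires"]
        if isinstance(expires, str):
            expires = date.fromisoformat(expires)  # expects YYYY-MM-DD
        # Assumption: the exception still applies on the expiry date itself.
        if today <= expires:
            active.add(item["finding_id"])
    return active
```

A finding whose ID is not in the returned set is no longer suppressed and would fail the build again.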
Going deeper
Read the full
Component Coverage Standard on
infras.greenpeace.us for the schema, compliance mapping, and the
planned → live promotion workflow.