
Forms 2.5(c) — Submission Routing Pipeline (Design)

Status: Decisions locked; ready for final read, then commit + code
Intended location: mkdocs-portal/docs/architecture/forms-phase2.5c-design.md
Version: v0.2
Date: 2026-06-04
Author: R. Chhetry / Claude (chat design pass)
Grounded in: Code read-only investigation 2026-06-04 (routes_phase2.py, 001_init.sql, 002_rls.sql, gpus-reports/, live enum query). All schema/enum claims below are verified against the live DB unless marked [CODE-TIME CONFIRM].

Changelog (v0.2): all §16 open questions resolved to recommended decisions (R.C. delegated, 2026-06-04). Transport = B-iii (MAPLE-resident). Attachments = signed URL. Interim body = minimal plaintext, not gated behind 2.5d. DB role = reuse forms_app. Audit = lean (4 new values). One code-time fact check remains (§3 wire mapping).


§1 Purpose

Routing is what happens to a submission after it is finalized and (if it has attachments) all attachments pass ClamAV scanning. A clean, complete submission must be delivered to one or more destinations defined per-form. 2.5(c) builds the dispatch abstraction: it reads the form's ordered actions rows, executes each one against the right destination, drives the submission's lifecycle state, and records the outcome.

2.5(c) is deliberately not Phase 3. Phase 3 is the HappyFox API client. 2.5(c) is the orchestration layer that HappyFox plugs into as one more action_type — the dispatch loop, idempotency, state machine, audit, retry, and DLQ are all built here once. Today's in-scope destination is email via the existing Postfix→Gmail relay. happyfox_template actions are recognised by the loop but their dispatch is a deferred stub until Phase 3.

Scope boundary with 2.5(d): 2.5(c) selects the template (actions.template_id) and passes it to the send step. 2.5(c) does not render bodies — that is 2.5(d). Decision: 2.5(c) email send is not gated behind 2.5(d). Until 2.5(d) ships, routed email uses a minimal plaintext body (§7) so the full pipeline is shippable and testable end-to-end now; 2.5(d) later swaps the body without touching routing.


§2 Architecture — service shape (DECIDED)

The investigation surfaced the constraint that drove this decision: the proven, zero-auth mail path is smtplib.SMTP("localhost", 25) against Postfix on MAPLE (report_mailer.py), and Postfix on MAPLE holds the Gmail SASL relay credentials. forms-backend runs on Cloud Run and cannot reach localhost:25 on MAPLE. So the choice is two stacked decisions — both now resolved.

Decision A — orchestration: Pub/Sub pull worker (your (c) instinct, kept). Claim row → iterate actions → drive state → audit, with ack-then-process and a stuck-row sweep, exactly parallel to α.

Decision B — transport: B-iii — the routing worker runs on MAPLE. It subscribes to the routing topic as a Pub/Sub pull subscriber (or, if pull proves awkward, a DB-poll daemon mirroring report_cron.sh) and sends via localhost:25 exactly like report_mailer.py.

Rationale (locked): B-iii reuses the single most load-bearing proven pattern (the exact mail path that already works). It adds no new secret (no duplication of MAPLE's Gmail credential), keeps MAPLE's Postfix locked to localhost (no relay exposure to the VPC), and adds no new class of dependency (MAPLE is already on the critical path: Wazuh, Prometheus, report cron, DB jump). The accepted cost: the worker is a MAPLE-resident systemd service rather than serverless, so it needs its own monitoring and the standard IR/DR/drill triad on MAPLE (tracked in the implementation footprint below).

Rejected alternative (for the record): B-i — a Cloud Run worker doing authenticated SMTP submission. Cleaner α parallel and fully serverless, but it duplicates MAPLE's Gmail credential into Secret Manager (double rotation burden) and adds an egress-auth surface. The zero-new-secret reuse argument won.


§3 Trigger — when routing fires

Readiness is: submission persisted AND (no attachments OR every attachment clamav_status='clean'). Attachments scan asynchronously via α, so readiness can be reached at two moments:

  • No attachments / already-clean at finalize: readiness passes inside finalize_submission.
  • Attachments still scanning at finalize: readiness is reached later, when α marks the last attachment clean.

Design: a single evaluate_routing_readiness(submission_id) predicate invoked from both seams — from finalize_submission after it persists, and from the α worker after it flips an attachment to clean. Whichever caller observes "all clean" publishes a routing Pub/Sub event. Double-publish is safe (idempotent claim, §9). Any attachment infected → submission goes to failed, never routes (IR per rb-008).
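A minimal sketch of the pure decision behind evaluate_routing_readiness, assuming the caller has already loaded one clamav_status value per attachment. The Readiness enum and decide_readiness name are hypothetical, and any status other than 'clean' or 'infected' (for example a still-scanning value, not confirmed against 001_init.sql) is treated as "not finished yet". Keeping the decision free of DB and Pub/Sub calls lets both seams share it unchanged.

```python
from enum import Enum
from typing import Iterable


class Readiness(Enum):
    """Outcome of the readiness check (hypothetical names, for illustration)."""
    ROUTE = "route"  # publish a routing event
    WAIT = "wait"    # an attachment is still scanning; do nothing yet
    FAIL = "fail"    # an attachment is infected; submission -> failed, never routes


def decide_readiness(clamav_statuses: Iterable[str]) -> Readiness:
    """Pure decision for evaluate_routing_readiness(submission_id).

    clamav_statuses holds one value per attachment; an empty iterable means
    the submission has no attachments.
    """
    statuses = list(clamav_statuses)
    if any(s == "infected" for s in statuses):
        return Readiness.FAIL  # infection wins even if other scans are pending
    if all(s == "clean" for s in statuses):  # vacuously true with no attachments
        return Readiness.ROUTE
    return Readiness.WAIT
```

Because the §9 claim is idempotent, it does not matter if finalize_submission and the α worker both observe ROUTE and both publish.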

finalize_submission (routes_phase2.py:412–435) is currently a pure stub that persists nothing. So 2.5(c) also owns finalize's first real write: persist the submission in received state, then run evaluate_routing_readiness and publish if ready. Finalize does not perform the received → processing transition; that is the worker's atomic claim (§9) and stays in the worker. The # TODO Phase 2 wire-up block (line 418) is the hook seam; _audit_v2() (line 75) and the Session(get_engine()) pattern already exist there.

[CODE-TIME CONFIRM (§16.5): _status_to_wire (routes_phase2.py:57) — read the mapping table and make finalize's returned wire status agree with the persisted enum value. The current stub returns wire "submitted", which has no DB enum equivalent; 2.5(c) standardises the wire vocabulary to map from the real status (received/processing/routed/failed). This is a one-line read at code time, not a design fork.]


§4 State machine

The submission_status enum already exists with exactly the values needed — no status enum extension required:

received → processing → routed
                      ↘ failed
(purged is terminal/retention, out of routing scope)
  • received — created/finalized, not yet dispatched.
  • processing — claimed by the routing worker (atomic transition, §9).
  • routed — all actions dispatched successfully.
  • failed — permanent failure, retries exhausted, or attachment infected.

Verified: enum values received, processing, routed, failed, purged (001_init.sql:34), live DB identical, no drift.
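The lifecycle above, plus the §11 sweep reset, can be expressed as a transition table the worker checks before every status write. This is a sketch: ALLOWED_TRANSITIONS and can_transition are hypothetical helpers, and received → failed (an infected attachment failing the submission before any claim) is an assumption the diagram does not draw.

```python
# Allowed submission_status transitions for 2.5(c) (hypothetical helper table).
ALLOWED_TRANSITIONS: dict[str, frozenset[str]] = {
    # worker's atomic claim (§9); assumption: infection may fail a row pre-claim
    "received": frozenset({"processing", "failed"}),
    # terminal outcomes, plus the §11 stuck-row sweep resetting to received
    "processing": frozenset({"routed", "failed", "received"}),
    "routed": frozenset(),
    "failed": frozenset(),
    # purged is retention-only; routing never writes it
    "purged": frozenset(),
}


def can_transition(current: str, new: str) -> bool:
    """True if the routing worker may move a submission from current to new."""
    return new in ALLOWED_TRANSITIONS.get(current, frozenset())
```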


§5 Routing destinations

  • email (in scope) — action_type ∈ {email_template, email_raw}, sent via the §2 transport. Live data: email_template = 35 rows / 25 forms; email_raw = 0 rows (supported, not optimised for).
  • HappyFox (Phase 3) — action_type='happyfox_template'. Recognised by the loop; dispatch is a deferred stub that records happyfox_status='deferred' and does not fail the submission. Live data: 32 rows / 12 forms. happyfox_ticket_id/happyfox_status columns already exist for Phase-3 output.
  • dead-letter — persistent-failure path (§8), parallel to α's DLQ.

§6 Per-form routing config

Already modelled — no schema change to actions. actions (001_init.sql:153–166):

id                BIGSERIAL PK
form_id           TEXT  → forms(id) ON DELETE CASCADE
action_order      INT   NOT NULL          -- execution sequence
action_type       enum  (happyfox_template | email_template | email_raw)
template_id       TEXT  → templates(id)   -- body template (2.5d renders it)
destination       TEXT  NOT NULL          -- recipient(s) for email
happyfox_category TEXT
subject_template  TEXT
UNIQUE (form_id, action_order)

One row = one ordered routing step. Source of truth is YAML: yaml_loader.py does DELETE FROM actions WHERE form_id=… then bulk-inserts on form load — config edits flow through YAML reload (admin_reload_yaml audit action already exists), not direct DB writes. The routing loop reads via the Form.actions ORM relationship (models.py:43) ordered by action_order. No production code consumes actions for dispatch yet — that is what 2.5(c) adds.


§7 Email payload composition

Mirror report_mailer.py: MIMEMultipart("mixed"), sender gpus-it-security@greenpeace.org, recipients from actions.destination, subject from actions.subject_template. Body rendering is 2.5(d); interim body is minimal plaintext (submission id + form name + "details to follow") per the §1 decision.

Attachment handling (DECIDED): signed URL. Clean attachments live in GCS; the email body carries a time-limited GCS signed URL per attachment. This avoids large-MIME bounce risk and keeps the message light. Inline MIME attach is deferred as a possible per-form option later (it would reuse 2.5b.cleanup's Config-level MIME/size authority). Signed-URL TTL [CODE-TIME: pick a TTL that does not outlive retention_expires_at; GCS V4 signed URLs cannot exceed 7 days, so 7d is the ceiling rather than merely an example].
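A minimal sketch of §7 composition and the localhost:25 send, following the report_mailer.py pattern as described above (the actual report_mailer.py code was not reviewed here). The helper names and the body wording beyond "details to follow" are hypothetical; recipients is assumed to be already split from actions.destination (its separator format is not specified in §6) and subject already rendered from subject_template. In production the URLs would come from google-cloud-storage's Blob.generate_signed_url(version="v4", expiration=..., method="GET"), which is not called here so the sketch runs without GCS.

```python
import smtplib
from datetime import datetime, timedelta, timezone
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

SENDER = "gpus-it-security@greenpeace.org"
MAX_V4_SIGNED_URL_TTL = timedelta(days=7)  # GCS V4 signed-URL ceiling


def signed_url_ttl(retention_expires_at: datetime, now: datetime) -> timedelta:
    """TTL for attachment links: never beyond retention, never beyond 7 days."""
    remaining = retention_expires_at - now
    return max(timedelta(0), min(MAX_V4_SIGNED_URL_TTL, remaining))


def build_interim_message(
    submission_id: str,
    form_name: str,
    recipients: list[str],
    subject: str,
    attachment_urls: list[str],
) -> MIMEMultipart:
    """Interim plaintext body (pre-2.5(d)); 2.5(d) later swaps only the body."""
    msg = MIMEMultipart("mixed")
    msg["From"] = SENDER
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    lines = [f"Submission: {submission_id}", f"Form: {form_name}", "Details to follow."]
    if attachment_urls:
        lines += ["", "Attachments (time-limited links):", *attachment_urls]
    msg.attach(MIMEText("\n".join(lines), "plain", "utf-8"))
    return msg


def send_via_local_postfix(msg: MIMEMultipart) -> dict:
    """Hand off to Postfix on MAPLE, which relays to Gmail with its own creds."""
    with smtplib.SMTP("localhost", 25) as smtp:
        # {} on full success; otherwise the refused recipients. Raises
        # SMTPRecipientsRefused if every recipient was refused.
        return smtp.send_message(msg)
```

Clamping the TTL in one helper keeps both limits (retention and the 7-day V4 cap) in a single place.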


§8 Failure modes

  • Permanent (recipient rejected, MIME rejected, malformed action): mark the action failed in routing_result; do not retry.
  • Transient (relay temporary error (4xx), rate limit, timeout): retry via Pub/Sub redelivery.
  • Infected attachment (any attachment clamav_status='infected'): submission → failed, no send, alert (rb-008).

Retry policy mirrors α: max 5 delivery attempts → DLQ → Slack alert to #us-soc-alerts. report_mailer.py has no retry/backoff (single try/except, sys.exit(1)), so retry semantics are new and live in the routing worker, not the send call.
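A sketch of the permanent/transient split for SMTP errors, using only stdlib smtplib exception types. FailureClass and classify_smtp_error are hypothetical; treating unrecognised exceptions (such as a malformed action raising ValueError) as permanent is an assumption, chosen so a bad action cannot consume all five redelivery attempts.

```python
import smtplib
import socket
from enum import Enum


class FailureClass(Enum):
    PERMANENT = "permanent"  # record in routing_result; do not retry
    TRANSIENT = "transient"  # re-raise so Pub/Sub redelivers (max 5, then DLQ)


def _is_4xx(code: int) -> bool:
    return 400 <= code < 500


def classify_smtp_error(exc: BaseException) -> FailureClass:
    """Sort a send-time exception into the permanent/transient classes."""
    if isinstance(exc, smtplib.SMTPRecipientsRefused):
        # Raised when every recipient was refused; retry only if all were 4xx.
        codes = [code for code, _msg in exc.recipients.values()]
        if codes and all(_is_4xx(c) for c in codes):
            return FailureClass.TRANSIENT
        return FailureClass.PERMANENT
    if isinstance(exc, smtplib.SMTPResponseException):
        # Covers SMTPSenderRefused, SMTPDataError, SMTPConnectError, etc.
        return FailureClass.TRANSIENT if _is_4xx(exc.smtp_code) else FailureClass.PERMANENT
    if isinstance(exc, (smtplib.SMTPServerDisconnected, socket.timeout, TimeoutError, ConnectionError)):
        return FailureClass.TRANSIENT  # relay unreachable, dropped, or timed out
    # Assumption: anything else (e.g. a malformed action) is permanent.
    return FailureClass.PERMANENT
```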


§9 Idempotency

Atomic optimistic-lock claim, same as α:

UPDATE submissions SET status='processing', updated_at=NOW()
WHERE id = :sid AND status='received'
RETURNING id;

Zero rows returned ⇒ already claimed ⇒ ack and drop. The state gate, not the send call, is what stops two workers dispatching the same submission concurrently. Because forms_app RLS is USING(TRUE) WITH CHECK(TRUE) on submissions, this transition is NOT subject to a state-coverage RLS gate; unlike α, no per-state RLS enumeration is needed. Per-action idempotency within a submission is tracked in routing_result JSONB (which actions already succeeded), so a redelivery after partial success, including a sweep-reclaimed stale row (§11), re-sends nothing already completed. Delivery is therefore at-least-once per action rather than exactly-once: a crash after Postfix accepts a message but before routing_result records it will re-send that one action when the row is reclaimed.
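The claim can be exercised locally with sqlite3 standing in for Cloud SQL Postgres. This sketch checks cursor.rowcount instead of RETURNING and uses CURRENT_TIMESTAMP in place of NOW(); claim_submission is a hypothetical helper, and the table is assumed cut down to the columns the claim touches.

```python
import sqlite3


def claim_submission(conn: sqlite3.Connection, sid: int) -> bool:
    """Optimistic-lock claim: only one caller can move a row out of 'received'.

    Returns True for the winner; False means already claimed (ack and drop).
    """
    cur = conn.execute(
        "UPDATE submissions SET status = 'processing', updated_at = CURRENT_TIMESTAMP "
        "WHERE id = ? AND status = 'received'",
        (sid,),
    )
    conn.commit()
    return cur.rowcount == 1
```

A second claim on the same id returns False, which is the "ack and drop" branch for a double-published or redelivered event.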


§10 Audit trail

audit_log is append-only (UPDATE/DELETE revoked from all roles). Every routing attempt logs via _audit_v2(). The live audit_action enum (26 values, repo == live, no drift) has happyfox_success/happyfox_failure but no email/routing values. The one schema change 2.5(c) needs is 011_routing_audit_actions.sql adding exactly four values (audit kept lean — DECIDED):

  • submission_routed — all actions dispatched OK
  • submission_route_failed — permanent failure / retries exhausted / infected
  • email_sent — per email action success
  • email_failed — per email action failure

Dropped (lean decision): submission_route_retried and submission_route_reset_stuck — Pub/Sub redelivery counts and the eventual terminal submission_routed/submission_route_failed row carry enough signal; discrete retry/reset audit rows would be noise. ALTER TYPE … ADD VALUE is safe (readonly RLS is allow-except-purged, so new values are auto-visible — the safe direction).
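A sketch of what 011_routing_audit_actions.sql would contain, generated from one tuple so the worker's audit calls and the migration cannot drift. The helper names are hypothetical. IF NOT EXISTS makes a re-run a no-op. Before PostgreSQL 12, ALTER TYPE ... ADD VALUE cannot run inside a transaction block; from 12 on it can, but the new values are unusable until that transaction commits, so 011 should not also insert rows that use them.

```python
NEW_AUDIT_ACTIONS: tuple[str, ...] = (
    "submission_routed",        # all actions dispatched OK
    "submission_route_failed",  # permanent failure / retries exhausted / infected
    "email_sent",               # per email action success
    "email_failed",             # per email action failure
)


def render_011_migration(values: tuple[str, ...] = NEW_AUDIT_ACTIONS) -> str:
    """Return the body of 011_routing_audit_actions.sql, one statement per value."""
    return "".join(
        f"ALTER TYPE audit_action ADD VALUE IF NOT EXISTS '{v}';\n" for v in values
    )


def terminal_audit_action(final_status: str) -> str:
    """Map the worker's terminal submission status to its audit_action value."""
    return {"routed": "submission_routed", "failed": "submission_route_failed"}[final_status]


def email_audit_action(sent: bool) -> str:
    """Per email action audit value."""
    return "email_sent" if sent else "email_failed"
```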


§11 Operational signals

  • Healthy: rows move received → processing → routed within seconds of readiness; failed rate near zero; DLQ empty.
  • Stuck: rows lingering in processing (worker died mid-dispatch). Recovery: a sweep tick parallel to α's /sweep-stuck — for processing rows older than N minutes, reset processing → received and re-publish a routing event. Safe to re-run because routing_result skips already-sent actions (§9). No discrete audit value for the reset (§10 lean decision).
  • Metrics surface on the SOC / status portal later (out of 2.5c scope; noted for the SOC Ticketing tab roadmap).
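A sketch of the stuck-row sweep, again with sqlite3 standing in for Postgres (where a single UPDATE ... RETURNING id would do). sweep_stuck and the string timestamp cutoff are hypothetical; the caller re-publishes a routing event for each returned id. Re-checking the status in the UPDATE skips any row that finished between the SELECT and the reset. N should comfortably exceed worst-case dispatch time, because a slow but live worker would otherwise race the re-published event.

```python
import sqlite3


def sweep_stuck(conn: sqlite3.Connection, stale_before: str) -> list[int]:
    """Reset 'processing' rows last touched before stale_before; return their ids."""
    rows = conn.execute(
        "SELECT id FROM submissions WHERE status = 'processing' AND updated_at < ? "
        "ORDER BY id",
        (stale_before,),
    ).fetchall()
    reset: list[int] = []
    for (sid,) in rows:
        cur = conn.execute(
            "UPDATE submissions SET status = 'received', updated_at = CURRENT_TIMESTAMP "
            "WHERE id = ? AND status = 'processing' AND updated_at < ?",
            (sid, stale_before),
        )
        if cur.rowcount == 1:  # still stuck at reset time
            reset.append(sid)
    conn.commit()
    return reset  # caller re-publishes a routing event per id
```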

§12 Failure isolation

Per-submission (per Pub/Sub message) and per-action within a submission. One recipient's bounce must not block other submissions, nor other actions of the same submission — routing_result records each action's outcome independently; the submission lands routed only if all non-deferred actions succeeded, else failed with the partial result preserved.
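A sketch of the per-action loop combining §6 ordering, §9 per-action idempotency, and §12 isolation. Action, dispatch_all and the callbacks are hypothetical. routing_result is keyed by str(action_order) here, an assumption: action ids are regenerated on every YAML reload (§6), while action_order is unique per form. The result is persisted after every action so a crash or a transient re-raise cannot lose entries already sent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Action:
    """Cut-down, hypothetical stand-in for an actions row (fields from §6)."""
    action_order: int
    action_type: str  # happyfox_template | email_template | email_raw
    destination: str


def dispatch_all(
    actions: list[Action],
    routing_result: dict[str, str],
    send_email: Callable[[Action], None],
    is_transient: Callable[[Exception], bool],
    save_result: Callable[[dict[str, str]], None],
) -> tuple[str, dict[str, str]]:
    """Run actions in action_order and return (final_status, routing_result).

    routing_result maps str(action_order) to "sent", "failed" or "deferred".
    """
    result = dict(routing_result)
    for action in sorted(actions, key=lambda a: a.action_order):
        key = str(action.action_order)
        if result.get(key) == "sent":
            continue  # completed on an earlier delivery: never re-send
        if action.action_type == "happyfox_template":
            result[key] = "deferred"  # Phase 3 stub; does not fail the submission
        else:
            try:
                send_email(action)
                result[key] = "sent"
            except Exception as exc:
                if is_transient(exc):
                    raise  # leave unrecorded; redelivery retries it
                result[key] = "failed"  # permanent: record it, keep going
        save_result(result)  # persist after every action, not once at the end
    ok = all(v in ("sent", "deferred") for v in result.values())
    return ("routed" if ok else "failed"), result
```

One consequence worth checking at code time: a transient re-raise leaves the row in processing, and the §9 claim drops redeliveries of processing rows, so either the worker returns the row to received before nacking or recovery waits for the §11 sweep.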


§13a Access surface (7-point Cloud SQL spec) — resolved for B-iii + forms_app reuse

DECIDED: reuse the existing forms_app role; worker runs on MAPLE. This makes the access surface markedly lighter than α's.

  1. Project IAM roles: the worker uses MAPLE's existing identity/path for DB access plus roles/pubsub.subscriber for the routing subscription. No new Cloud Run SA, no Secret Manager accessor (no new secret).
  2. Cloud SQL IAM user: reuse forms_app (same trust domain as forms-backend). No new DB role. (A dedicated routing_app was considered for blast-radius isolation and rejected: it would reactivate the §13a.7 state-coverage work for marginal benefit.)
  3. Table GRANTs: forms_app already holds SELECT/INSERT/UPDATE on submissions, submission_fields, attachments, audit_log (no DELETE). Nothing to add.
  4. USAGE on TYPE: submission_status and audit_action enums; forms_app already has it, and the new 011 values inherit it.
  5. USAGE on SEQUENCE: audit_log id sequence; forms_app is already granted it (002_rls.sql).
  6. RLS policy coverage: forms_app has submissions ALL USING(TRUE) WITH CHECK(TRUE) and audit_log INSERT WITH CHECK(TRUE). All routing writes are already covered. No new policy.
  7. RLS state coverage: N/A by construction. USING(TRUE) permits every status the worker writes (processing, routed, failed, and received via the §11 sweep reset). The α GATE-4 lesson is rendered moot by the existing permissive app-role policy and the decision to reuse it.

§14 Cost

B-iii is ~$0 incremental: existing MAPLE VM, existing Postfix, Pub/Sub pull within free tier at this volume. No new always-on infra, no new Cloud Run service. Target <$5/mo comfortably held.


§15 Reuse

Postfix→Gmail relay (sender gpus-it-security@greenpeace.org), report_mailer.py MIME composition, _audit_v2(), Session(get_engine()), α's Pub/Sub claim + sweep + DLQ + #us-soc-alerts Slack pattern, 2.5b.cleanup Config-level MIME/size authority, GCS signed-URL generation. Net new secrets: zero.


§16 Decisions (locked 2026-06-04)

  1. Transport / service shape: B-iii. The routing worker runs on MAPLE as a Pub/Sub pull subscriber and sends via localhost:25 Postfix. No new secret. (B-i Cloud Run + authenticated SMTP rejected: secret duplication.)
  2. Attachment delivery: signed URL in the body (inline MIME deferred).
  3. Interim body before 2.5(d): minimal plaintext, not gated behind 2.5(d). Pipeline ships now.
  4. DB role: reuse forms_app (no routing_app). Keeps §13a.7 N/A.
  5. _status_to_wire mapping: code-time confirm (read routes_phase2.py:57, align wire vocab to the real enum). Not a design fork; the only fact left to verify before/at code start.
  6. Audit granularity: lean, four new values (submission_routed, submission_route_failed, email_sent, email_failed); no retry/reset audit rows.

Implementation footprint (what code will touch, once committed)

  • forms-backend/schema/011_routing_audit_actions.sql — +4 enum values (the only migration).
  • forms-backend/routes_phase2.py — wire up finalize_submission (persist + evaluate_routing_readiness + publish); fix _status_to_wire per §16.5.
  • α worker — call evaluate_routing_readiness after marking last attachment clean (publish to routing topic).
  • New MAPLE-resident routing worker — Pub/Sub pull subscriber; claim/iterate-actions/dispatch/audit/sweep; sends via localhost:25 reusing report_mailer.py composition.
  • Pub/Sub: routing topic + pull subscription + DLQ + dead-letter alert to #us-soc-alerts.
  • Ops triad: IR runbook (rb-009 routing-failure?), DR entry, red/blue drill for the new MAPLE worker (per the every-new-asset rule).

Gate: design locked. Next step is your final read → Code commits this doc to mkdocs-portal/docs/architecture/forms-phase2.5c-design.md → then §1 code.