Skip to content

Phase 2.5(a) Design — Wire create_submission to persist

Status: Approved for implementation Author: Claude (Chat) + Rajesh Date: 2026-04-29 Scope: forms-backend/auth_v2.py username surface; routes_phase2.py:create_submission wire-up Out of scope: attachments (2.5b), finalize/routing (2.5c), HappyFox/email (2.5e/f), schema migration framework (2.6), RLS hardening, forms-backend/auth.py username extraction (already correct as of commit dd810e0)


Three commits in sequence

# Commit What Why first
1 auth_v2.py username surface Add username field to User dataclass; populate from preferred_username claim with fallback chain Foundation; Phase 2.5(a) handler needs user.username for allow-list checks
2 This design doc Capture Phase 2.5(a) reasoning in MkDocs Reference for the wire-up commit; discoverable by future developers
3 Wire-up routes_phase2.py:create_submission becomes a real handler Depends on commits 1+2

Background: token claims and the count-drift bug

Token-claim inspection on 2026-04-29 confirmed Okta ID tokens carry: - preferred_username: "rchhetry" (bare username — what forms.allow_users was migrated with) - email: "rchhetry@greenpeace.org" - sub: "00ub2f9tykQ2HysLF416" (Okta opaque ID)

A 2026-04-22 memo described a count-drift bug ("users see 27 forms instead of 28") caused by auth.py setting g.username = sub, which never matched the bare-username values in forms.allow_users (migrated from legacy MySQL as space-separated usernames like {rchhetry, tgarg, jdoe}).

That bug was fixed in commit dd810e0 on 2026-04-23, which added the preferred_username → email-prefix → sub fallback chain to auth.py. Phase 1's allow-list logic in routes/forms.py and routes/submissions.py has been working correctly for ~6 days as of this design.

The relevance to Phase 2.5(a): auth_v2.py was created later for Phase 2 work and was never updated to surface username the same way. Its User dataclass has email, name, sub, roles, okta_groups — no username. Phase 2.5(a) needs that field for its own allow-list comparison.

So this design fixes only the Phase 2 side. Phase 1 is already correct.


Commit 1: auth_v2.py username surface

Changes

forms-backend/auth_v2.py — extend the User dataclass to include username:

@dataclass
class User:
    sub: str
    email: str
    name: str
    username: str  # new: bare username, primarily for allow-list comparisons
    roles: list[str]
    okta_groups: list[str]

And in current_user() (or _user_from_claims(), whichever name the file uses):

return User(
    sub=claims["sub"],
    email=claims.get("email", ""),
    name=claims.get("name", ""),
    username=(
        claims.get("preferred_username")
        or claims.get("email", "").split("@")[0]
        or claims["sub"]
    ),
    roles=...,
    okta_groups=claims.get("groups", []),
)

The fallback chain mirrors auth.py's pattern from dd810e0: 1. preferred_username claim (correct path; verified present in production tokens) 2. Email-prefix (legacy compatibility for tokens missing the claim) 3. sub claim (final fallback; matches what unfixed code would do)

Why this is safe

  • No data migration. No existing data structure changes; the dataclass just gains a field.
  • No Phase 1 impact. auth.py and auth_v2.py are independent; Phase 1 routes don't import from auth_v2.
  • No SPA impact. The wire shape that the SPA reads doesn't change; this is internal handler plumbing.
  • Fallback chain protects against token drift. If a future Okta config change drops preferred_username, the email-prefix fallback keeps the system functional.

Verification

Phase 2.5(a) (Commit 3) is the actual consumer of user.username. The allow-list test described in this doc's Verification section validates that the field is being populated and used correctly.


Commit 2: this design doc

Path: mkdocs-portal/docs/architecture/forms-phase2.5a-design.md. Adds a nav entry under "Architecture" in mkdocs.yml.


Commit 3: routes_phase2.py wire-up

What this commit changes

One file primarily: forms-backend/routes_phase2.py. The existing stub at create_submission (line 102-ish, ~25 lines) is replaced with a working handler (~80 lines). One model file gets a one-line addition (created_at on Submission).

No schema migrations. No new tables. No new dependencies. No frontend changes.

What stays as a stub

  • POST /api/submissions/<id>/attachments (Phase 2.5b)
  • POST /api/submissions/<id>/submit (Phase 2.5c)

These remain in their current stub form. The SPA still navigates through the three-phase stub flow — but submissions actually persist now.

Reference implementation

Phase 1's routes/submissions.py:submit() (lines 37–109) is the working pattern. Phase 2.5(a) mirrors it with three substantive changes:

  1. Auth source: auth_v2.current_user() instead of Flask g.email/g.username
  2. Status: persist as received (DB enum) but return draft (contract enum) via wire-layer mapping
  3. Response shape: CreateSubmissionResponse from contract.ts (id, form_id, status, created_at, created_by) instead of Phase 1's {submission_id, status}

Everything else mirrors Phase 1: form lookup, allow-list check (now using user.username from Commit 1), required-field validation, KMS envelope encryption via crypto.py, per-field row inserts, audit log, no SET LOCAL for RLS.

Detailed handler logic

@bp.post("/submissions")
@require_role("submitter")
def create_submission():
    """Create a new submission in 'received' state (wire shape: 'draft').

    Mirrors Phase 1's submit() at routes/submissions.py:37 with auth_v2 +
    Phase 2 contract response shape.
    """
    user = current_user()
    body = request.get_json(silent=True) or {}
    form_id = body.get("form_id")
    fields_in = body.get("fields") or {}

    # Validation: matches Phase 1
    if not form_id or not isinstance(fields_in, dict):
        return jsonify(error="invalid_request"), 400

    with Session(get_engine()) as session:
        form = session.get(Form, form_id)
        if form is None or not form.is_active:
            return jsonify(error="form_not_found"), 404

        # Allow-list check (mirrors Phase 1 line 50-55, using user.username
        # from auth_v2 Commit 1)
        if form.allow_users:
            user_un_lower = user.username.lower()
            allowed_lower = [u.lower() for u in form.allow_users]
            if user_un_lower not in allowed_lower:
                _audit_v2(session, "auth_failure",
                          actor_username=user.username,
                          target_id=form_id, target_type="form",
                          details={"reason": "not_in_allow_list"},
                          success=False)
                session.commit()
                return jsonify(error="forbidden"), 403

        # Required-field check (mirrors Phase 1 line 64)
        missing = [
            f.field_key for f in form.fields
            if f.required and f.field_key not in fields_in
            and f.field_type not in _SKIP_FIELD_TYPES
        ]
        if missing:
            return jsonify(error="missing_required_fields",
                           missing=missing), 422

        # Encryption setup (mirrors Phase 1 line 75)
        dek, wrapped, key_version = new_submission_dek()

        # Build submission first (with empty searchable_values placeholder)
        submission = Submission(
            form_id=form.id,
            form_version=form.current_version,
            submitter_email=user.email,
            submitter_username=user.username,  # Decision 2c
            retention_expires_at=default_retention_expires_at(form.retention_days),
            status="received",  # Decision 1: persisted as 'received'
            ip_address=request.remote_addr,
            user_agent_hash=_hash_user_agent(request.headers.get("User-Agent", "")),
            searchable_values={},
        )
        session.add(submission)
        session.flush()  # populates submission.id from server default

        # Encrypt + persist each field, building searchable_values along the way
        searchable_values = {}
        for f in form.fields:
            if f.field_type in _SKIP_FIELD_TYPES:
                continue
            value = fields_in.get(f.field_key)
            if value is None:
                continue
            value_str = str(value)
            enc = encrypt_field(value_str, dek, wrapped, key_version)
            session.add(SubmissionField(
                submission_id=submission.id,
                field_key=f.field_key,
                field_type=f.field_type,
                value_encrypted=enc.ciphertext,
                nonce=enc.nonce,
                dek_wrapped=enc.dek_wrapped,
                kms_key_version=enc.kms_key_version,
            ))
            if f.searchable:
                searchable_values[f.field_key] = value_str

        submission.searchable_values = searchable_values

        _audit_v2(session, "submission_created",
                  actor_username=user.username,
                  target_id=str(submission.id), target_type="submission",
                  details={"form_id": form.id})

        session.commit()
        session.refresh(submission)  # populate created_at from server default

        return jsonify({
            "id": str(submission.id),
            "form_id": submission.form_id,
            "status": _status_to_wire(submission.status),  # received → "draft"
            "created_at": submission.created_at.isoformat(),
            "created_by": submission.submitter_email,
        }), 201

Helper functions (new, in routes_phase2.py)

_SKIP_FIELD_TYPES = {"divider", "instructions", "action", "custom_subject"}

def _status_to_wire(db_status: str) -> str:
    """Map DB submission_status enum to contract SubmissionStatus.

    DB has 5 values (received, processing, routed, failed, purged);
    contract has 2 (draft, submitted). Mapping:
      received  → draft       (just created, not finalized)
      processing → draft      (mid-flow, still draft from user POV)
      routed    → submitted   (delivered to destination)
      failed    → submitted   (lifecycle ended; user sees as submitted)
      purged    → submitted   (lifecycle ended; user shouldn't see this)

    This mapping is a wire-layer translation. Future Phase 2.6 work may
    introduce a schema migration framework and ALTER TYPE the enum to
    match the contract directly, retiring this helper.
    """
    return "draft" if db_status in ("received", "processing") else "submitted"

def _audit_v2(session, action: str, *, actor_username: str | None = None,
              target_id: str | None = None, target_type: str | None = None,
              details: dict | None = None, success: bool = True):
    """Phase 2 audit log helper. Mirrors Phase 1's _audit
    (routes/submissions.py:22) exactly, but takes actor_username as
    a kwarg since auth_v2 doesn't populate Flask g."""
    session.add(AuditLog(
        actor_username=actor_username,
        actor_ip=request.remote_addr,
        action=action,
        target_type=target_type,
        target_id=target_id,
        details=details or {},
        request_id=request.headers.get("X-Request-ID"),
        success=success,
    ))

def _hash_user_agent(ua: str) -> str:
    """SHA-256 hex digest of UA string. Matches Phase 1 (routes/submissions.py:99)."""
    return hashlib.sha256(ua.encode("utf-8")).hexdigest()

Model change (one line)

In forms-backend/models.py, add to the Submission class:

created_at: Mapped[datetime] = mapped_column(
    DateTime(timezone=True), server_default=text("NOW()"), nullable=False
)

The schema column already exists (schema/001_init.sql:120); this just declares it on the ORM model so the response can read it.


Decisions baked in

# Decision Choice
1 Status enum mapping 1b — wire-layer translation via _status_to_wire() helper
2 Username source 2cpreferred_username from token claims, fallback to email-prefix then sub
3 created_at source 3a — add created_at to Submission model, use in response
RLS Set session vars? No — RLS is inert; mirror Phase 1's pattern (no SET LOCAL)
Migration Schema changes in 2.5(a)? No — wire-up only, existing tables, no DDL

What we're NOT doing

Item Why deferred
Schema migration for new enum values Per Decision 1b — wire-layer mapping for now; revisit when migration framework lands
Setting SET LOCAL app.current_username RLS is inert (USING TRUE WITH CHECK TRUE); session vars unused
Phase 2.5(b) attachment wire-up Separate commit; needs GCS + ClamAV plumbing
Phase 2.5(c) finalize wire-up Separate commit; needs HappyFox client + template render
Migrating Phase 1 routes to auth_v2 Out of scope; Phase 1 routes already use auth.py correctly post-dd810e0
RLS hardening (FORCE ROW LEVEL SECURITY + real policies) Documented-but-inert finding; track separately

Risk assessment

Risk Severity Mitigation
Wire-layer status mapping diverges from real lifecycle in future Low One helper function; all future readers will use it
username from preferred_username differs from sub in audit logs Low Documented in commit message; existing entries traceable via okta_sub field if needed
Token missing preferred_username (e.g., dev tokens, future Okta config drift) Low Two fallbacks (email-prefix, sub) prevent crash; behavior degrades to current state
created_at model field doesn't match server-generated value Low session.refresh(submission) after commit ensures the response reads the actual server timestamp
Phase 1's existing Phase 1 submissions have submitter_username=<sub> for any made before dd810e0 Very low Cosmetic inconsistency in older rows; could backfill if it becomes annoying

Verification plan

After Commit 1 deploys:

  1. Sign in to forms.greenpeace.us
  2. Note the form count in FormPicker (expected: 28; if still 27, Phase 1 has a separate issue worth investigating but it does not gate Phase 2.5(a))

After Commit 3 deploys:

  1. Unauthenticated POST → 401 (auth_v2's @require_role rejects).
  2. Authenticated POST with bogus form_id → 404.
  3. Authenticated POST with missing required field → 422 with missing=[...].
  4. Authenticated POST with form whose allow_users doesn't include user → 403.
  5. Authenticated POST with valid data → 201 with response shape:
    {
      "id": "<uuid>",
      "form_id": "<form-id>",
      "status": "draft",
      "created_at": "2026-04-29T...",
      "created_by": "rchhetry@greenpeace.org"
    }
    
  6. Inspect DB (when MAPLE access exists): row in submissions table with status='received', submitter_username='rchhetry', matching rows in submission_fields with encrypted ciphertext (not plaintext), audit log row with action='submission_created'.
  7. End-to-end browser test: SPA submit on Contract Extension Notification — UI flow unchanged, but database verifies the actual write.

Step 6 is gated on MAPLE SSH access being established. Steps 1–5 + 7 are immediately verifiable.

Open questions surfaced by this design

  1. submitter_username in existing pre-dd810e0 submissions: ~4 Phase 1 submissions made before the auth fix may have submitter_username=<okta-sub>. Cosmetic only, could backfill in Phase 2.6 cleanup if desired.
  2. Migration framework: Phase 2.6 should establish one (Alembic recommended) before more schema changes accumulate.
  3. RLS hardening: should backend connection switch from cloudsqlsuperuser to a least-privilege role? Currently bypasses all RLS by accident.

These are tracked items, not blockers for 2.5(a).

Process note: how this design changed

This document went through two revisions during the design pass:

  • Initial draft (Decision 2a): proposed using email as submitter_username, on the assumption that auth_v2's User dataclass had no username field and adding one was unnecessary scope.
  • First revision (Decision 2c, dual auth fix): after learning forms.allow_users contains bare usernames, expanded scope to fix both auth.py and auth_v2.py. Premise: the count-drift bug was still active.
  • Final revision (Decision 2c, auth_v2 only): Code surfaced that auth.py had been fixed in commit dd810e0 six days prior; the count-drift memo from 2026-04-22 was stale. Scope contracted to auth_v2.py only.

The lesson worth pinning: memory entries describing "current bugs" age at the rate of commits to the affected modules. A diagnostic memo from a week ago described real state at the time it was written, not current state. Read-first discipline includes reading current code, not just current memory.

Process note: corrections after Code review

This section was added 2026-05-01 to document four corrections that landed in shipped code (commit 83f85cb) but weren't reflected in this doc's original publication (commit 10bcd0d, 2026-04-30). The corrections are now applied above; this note explains what changed and why.

Code's "STOP and tell me if anything doesn't fit cleanly" gate during Commit 3 verification caught three substantive design-vs-reality bugs plus one consistency issue:

  1. _audit_v2 helper signature — design specified actor=actor kwarg with **fields catch-all. Reality: AuditLog model expects actor_username (not actor), and Phase 1's _audit populates four other fields (actor_ip, target_type, target_id, request_id, success) that the design's signature dropped. Calling the design's helper would have raised TypeError on first invocation.

  2. Allow-list deny audit action — design used "submission_denied" as the audit action string. Reality: the audit_action enum in schema/001_init.sql does not contain that value. Phase 1 uses "auth_failure" with details={"reason": "not_in_allow_list"} and success=False for the same situation. The design's value would have been rejected at Postgres ENUM check on first deny.

  3. SubmissionField attachment pattern — design used row.submission = submission to attach SubmissionField rows to the parent Submission. Reality: neither model declares a SQLAlchemy relationship, so this assignment would have set a plain Python attribute that the ORM doesn't know about — the FK column submission_id would have stayed NULL, and session.add() on the parent wouldn't have cascaded the field rows. Net effect: submissions would have persisted with zero field rows, the most catastrophic possible failure mode (response says "201 created with N fields encrypted" but the field rows are missing). Phase 1's pattern uses session.flush() to get the auto-generated UUID, then constructs each field with submission_id=submission.id explicitly.

  4. Internal consistency — the second _audit_v2 call site (submission_created) used the same wrong actor= kwarg as the deny path. Fixed in tandem with #1 and #2.

All four corrections were applied to shipped code in commit 83f85cb. This doc was committed earlier (10bcd0d) with the wrong specs, then corrected here on 2026-05-01.

Lesson

The three substantive failures shared a common root: I drafted the design from memory of "Phase 1 has a working _audit helper" without re-reading what Phase 1's helper actually does at the line number the design claims to mirror.

Read-first discipline doesn't end at "read the file once during investigation." It applies again at every implementation moment that references that file's content. When mirroring a pattern, re-read the pattern at draft time, not from memory of an earlier read.