Phase 2.5(a) Design — Wire create_submission to persist¶
Status: Approved for implementation
Author: Claude (Chat) + Rajesh
Date: 2026-04-29
Scope: forms-backend/auth_v2.py username surface; routes_phase2.py:create_submission wire-up
Out of scope: attachments (2.5b), finalize/routing (2.5c), HappyFox/email (2.5e/f), schema migration framework (2.6), RLS hardening, forms-backend/auth.py username extraction (already correct as of commit dd810e0)
Three commits in sequence¶
| # | Commit | What | Why first |
|---|---|---|---|
| 1 | auth_v2.py username surface |
Add username field to User dataclass; populate from preferred_username claim with fallback chain |
Foundation; Phase 2.5(a) handler needs user.username for allow-list checks |
| 2 | This design doc | Capture Phase 2.5(a) reasoning in MkDocs | Reference for the wire-up commit; discoverable by future developers |
| 3 | Wire-up | routes_phase2.py:create_submission becomes a real handler |
Depends on commits 1+2 |
Background: token claims and the count-drift bug¶
Token-claim inspection on 2026-04-29 confirmed Okta ID tokens carry:
- preferred_username: "rchhetry" (bare username — what forms.allow_users was migrated with)
- email: "rchhetry@greenpeace.org"
- sub: "00ub2f9tykQ2HysLF416" (Okta opaque ID)
A 2026-04-22 memo described a count-drift bug ("users see 27 forms instead of 28") caused by auth.py setting g.username = sub, which never matched the bare-username values in forms.allow_users (migrated from legacy MySQL as space-separated usernames like {rchhetry, tgarg, jdoe}).
That bug was fixed in commit dd810e0 on 2026-04-23, which added the preferred_username → email-prefix → sub fallback chain to auth.py. Phase 1's allow-list logic in routes/forms.py and routes/submissions.py has been working correctly for ~6 days as of this design.
The relevance to Phase 2.5(a): auth_v2.py was created later for Phase 2 work and was never updated to surface username the same way. Its User dataclass has email, name, sub, roles, okta_groups — no username. Phase 2.5(a) needs that field for its own allow-list comparison.
So this design fixes only the Phase 2 side. Phase 1 is already correct.
Commit 1: auth_v2.py username surface¶
Changes¶
forms-backend/auth_v2.py — extend the User dataclass to include username:
@dataclass
class User:
sub: str
email: str
name: str
username: str # new: bare username, primarily for allow-list comparisons
roles: list[str]
okta_groups: list[str]
And in current_user() (or _user_from_claims(), whichever name the file uses):
return User(
sub=claims["sub"],
email=claims.get("email", ""),
name=claims.get("name", ""),
username=(
claims.get("preferred_username")
or claims.get("email", "").split("@")[0]
or claims["sub"]
),
roles=...,
okta_groups=claims.get("groups", []),
)
The fallback chain mirrors auth.py's pattern from dd810e0:
1. preferred_username claim (correct path; verified present in production tokens)
2. Email-prefix (legacy compatibility for tokens missing the claim)
3. sub claim (final fallback; matches what unfixed code would do)
Why this is safe¶
- No data migration. No existing data structure changes; the dataclass just gains a field.
- No Phase 1 impact.
auth.pyandauth_v2.pyare independent; Phase 1 routes don't import fromauth_v2. - No SPA impact. The wire shape that the SPA reads doesn't change; this is internal handler plumbing.
- Fallback chain protects against token drift. If a future Okta config change drops
preferred_username, the email-prefix fallback keeps the system functional.
Verification¶
Phase 2.5(a) (Commit 3) is the actual consumer of user.username. The allow-list test described in this doc's Verification section validates that the field is being populated and used correctly.
Commit 2: this design doc¶
Path: mkdocs-portal/docs/architecture/forms-phase2.5a-design.md. Adds a nav entry under "Architecture" in mkdocs.yml.
Commit 3: routes_phase2.py wire-up¶
What this commit changes¶
One file primarily: forms-backend/routes_phase2.py. The existing stub at create_submission (line 102-ish, ~25 lines) is replaced with a working handler (~80 lines). One model file gets a one-line addition (created_at on Submission).
No schema migrations. No new tables. No new dependencies. No frontend changes.
What stays as a stub¶
POST /api/submissions/<id>/attachments(Phase 2.5b)POST /api/submissions/<id>/submit(Phase 2.5c)
These remain in their current stub form. The SPA still navigates through the three-phase stub flow — but submissions actually persist now.
Reference implementation¶
Phase 1's routes/submissions.py:submit() (lines 37–109) is the working pattern. Phase 2.5(a) mirrors it with three substantive changes:
- Auth source:
auth_v2.current_user()instead of Flaskg.email/g.username - Status: persist as
received(DB enum) but returndraft(contract enum) via wire-layer mapping - Response shape:
CreateSubmissionResponsefromcontract.ts(id, form_id, status, created_at, created_by) instead of Phase 1's{submission_id, status}
Everything else mirrors Phase 1: form lookup, allow-list check (now using user.username from Commit 1), required-field validation, KMS envelope encryption via crypto.py, per-field row inserts, audit log, no SET LOCAL for RLS.
Detailed handler logic¶
@bp.post("/submissions")
@require_role("submitter")
def create_submission():
"""Create a new submission in 'received' state (wire shape: 'draft').
Mirrors Phase 1's submit() at routes/submissions.py:37 with auth_v2 +
Phase 2 contract response shape.
"""
user = current_user()
body = request.get_json(silent=True) or {}
form_id = body.get("form_id")
fields_in = body.get("fields") or {}
# Validation: matches Phase 1
if not form_id or not isinstance(fields_in, dict):
return jsonify(error="invalid_request"), 400
with Session(get_engine()) as session:
form = session.get(Form, form_id)
if form is None or not form.is_active:
return jsonify(error="form_not_found"), 404
# Allow-list check (mirrors Phase 1 line 50-55, using user.username
# from auth_v2 Commit 1)
if form.allow_users:
user_un_lower = user.username.lower()
allowed_lower = [u.lower() for u in form.allow_users]
if user_un_lower not in allowed_lower:
_audit_v2(session, "auth_failure",
actor_username=user.username,
target_id=form_id, target_type="form",
details={"reason": "not_in_allow_list"},
success=False)
session.commit()
return jsonify(error="forbidden"), 403
# Required-field check (mirrors Phase 1 line 64)
missing = [
f.field_key for f in form.fields
if f.required and f.field_key not in fields_in
and f.field_type not in _SKIP_FIELD_TYPES
]
if missing:
return jsonify(error="missing_required_fields",
missing=missing), 422
# Encryption setup (mirrors Phase 1 line 75)
dek, wrapped, key_version = new_submission_dek()
# Build submission first (with empty searchable_values placeholder)
submission = Submission(
form_id=form.id,
form_version=form.current_version,
submitter_email=user.email,
submitter_username=user.username, # Decision 2c
retention_expires_at=default_retention_expires_at(form.retention_days),
status="received", # Decision 1: persisted as 'received'
ip_address=request.remote_addr,
user_agent_hash=_hash_user_agent(request.headers.get("User-Agent", "")),
searchable_values={},
)
session.add(submission)
session.flush() # populates submission.id from server default
# Encrypt + persist each field, building searchable_values along the way
searchable_values = {}
for f in form.fields:
if f.field_type in _SKIP_FIELD_TYPES:
continue
value = fields_in.get(f.field_key)
if value is None:
continue
value_str = str(value)
enc = encrypt_field(value_str, dek, wrapped, key_version)
session.add(SubmissionField(
submission_id=submission.id,
field_key=f.field_key,
field_type=f.field_type,
value_encrypted=enc.ciphertext,
nonce=enc.nonce,
dek_wrapped=enc.dek_wrapped,
kms_key_version=enc.kms_key_version,
))
if f.searchable:
searchable_values[f.field_key] = value_str
submission.searchable_values = searchable_values
_audit_v2(session, "submission_created",
actor_username=user.username,
target_id=str(submission.id), target_type="submission",
details={"form_id": form.id})
session.commit()
session.refresh(submission) # populate created_at from server default
return jsonify({
"id": str(submission.id),
"form_id": submission.form_id,
"status": _status_to_wire(submission.status), # received → "draft"
"created_at": submission.created_at.isoformat(),
"created_by": submission.submitter_email,
}), 201
Helper functions (new, in routes_phase2.py)¶
_SKIP_FIELD_TYPES = {"divider", "instructions", "action", "custom_subject"}
def _status_to_wire(db_status: str) -> str:
"""Map DB submission_status enum to contract SubmissionStatus.
DB has 5 values (received, processing, routed, failed, purged);
contract has 2 (draft, submitted). Mapping:
received → draft (just created, not finalized)
processing → draft (mid-flow, still draft from user POV)
routed → submitted (delivered to destination)
failed → submitted (lifecycle ended; user sees as submitted)
purged → submitted (lifecycle ended; user shouldn't see this)
This mapping is a wire-layer translation. Future Phase 2.6 work may
introduce a schema migration framework and ALTER TYPE the enum to
match the contract directly, retiring this helper.
"""
return "draft" if db_status in ("received", "processing") else "submitted"
def _audit_v2(session, action: str, *, actor_username: str | None = None,
target_id: str | None = None, target_type: str | None = None,
details: dict | None = None, success: bool = True):
"""Phase 2 audit log helper. Mirrors Phase 1's _audit
(routes/submissions.py:22) exactly, but takes actor_username as
a kwarg since auth_v2 doesn't populate Flask g."""
session.add(AuditLog(
actor_username=actor_username,
actor_ip=request.remote_addr,
action=action,
target_type=target_type,
target_id=target_id,
details=details or {},
request_id=request.headers.get("X-Request-ID"),
success=success,
))
def _hash_user_agent(ua: str) -> str:
"""SHA-256 hex digest of UA string. Matches Phase 1 (routes/submissions.py:99)."""
return hashlib.sha256(ua.encode("utf-8")).hexdigest()
Model change (one line)¶
In forms-backend/models.py, add to the Submission class:
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), server_default=text("NOW()"), nullable=False
)
The schema column already exists (schema/001_init.sql:120); this just declares it on the ORM model so the response can read it.
Decisions baked in¶
| # | Decision | Choice |
|---|---|---|
| 1 | Status enum mapping | 1b — wire-layer translation via _status_to_wire() helper |
| 2 | Username source | 2c — preferred_username from token claims, fallback to email-prefix then sub |
| 3 | created_at source | 3a — add created_at to Submission model, use in response |
| RLS | Set session vars? | No — RLS is inert; mirror Phase 1's pattern (no SET LOCAL) |
| Migration | Schema changes in 2.5(a)? | No — wire-up only, existing tables, no DDL |
What we're NOT doing¶
| Item | Why deferred |
|---|---|
| Schema migration for new enum values | Per Decision 1b — wire-layer mapping for now; revisit when migration framework lands |
Setting SET LOCAL app.current_username |
RLS is inert (USING TRUE WITH CHECK TRUE); session vars unused |
| Phase 2.5(b) attachment wire-up | Separate commit; needs GCS + ClamAV plumbing |
| Phase 2.5(c) finalize wire-up | Separate commit; needs HappyFox client + template render |
| Migrating Phase 1 routes to auth_v2 | Out of scope; Phase 1 routes already use auth.py correctly post-dd810e0 |
| RLS hardening (FORCE ROW LEVEL SECURITY + real policies) | Documented-but-inert finding; track separately |
Risk assessment¶
| Risk | Severity | Mitigation |
|---|---|---|
| Wire-layer status mapping diverges from real lifecycle in future | Low | One helper function; all future readers will use it |
username from preferred_username differs from sub in audit logs |
Low | Documented in commit message; existing entries traceable via okta_sub field if needed |
Token missing preferred_username (e.g., dev tokens, future Okta config drift) |
Low | Two fallbacks (email-prefix, sub) prevent crash; behavior degrades to current state |
created_at model field doesn't match server-generated value |
Low | session.refresh(submission) after commit ensures the response reads the actual server timestamp |
Phase 1's existing Phase 1 submissions have submitter_username=<sub> for any made before dd810e0 |
Very low | Cosmetic inconsistency in older rows; could backfill if it becomes annoying |
Verification plan¶
After Commit 1 deploys:
- Sign in to forms.greenpeace.us
- Note the form count in FormPicker (expected: 28; if still 27, Phase 1 has a separate issue worth investigating but it does not gate Phase 2.5(a))
After Commit 3 deploys:
- Unauthenticated POST → 401 (auth_v2's
@require_rolerejects). - Authenticated POST with bogus form_id → 404.
- Authenticated POST with missing required field → 422 with
missing=[...]. - Authenticated POST with form whose allow_users doesn't include user → 403.
- Authenticated POST with valid data → 201 with response shape:
- Inspect DB (when MAPLE access exists): row in
submissionstable withstatus='received',submitter_username='rchhetry', matching rows insubmission_fieldswith encrypted ciphertext (not plaintext), audit log row withaction='submission_created'. - End-to-end browser test: SPA submit on Contract Extension Notification — UI flow unchanged, but database verifies the actual write.
Step 6 is gated on MAPLE SSH access being established. Steps 1–5 + 7 are immediately verifiable.
Open questions surfaced by this design¶
submitter_usernamein existing pre-dd810e0submissions: ~4 Phase 1 submissions made before the auth fix may havesubmitter_username=<okta-sub>. Cosmetic only, could backfill in Phase 2.6 cleanup if desired.- Migration framework: Phase 2.6 should establish one (Alembic recommended) before more schema changes accumulate.
- RLS hardening: should backend connection switch from
cloudsqlsuperuserto a least-privilege role? Currently bypasses all RLS by accident.
These are tracked items, not blockers for 2.5(a).
Process note: how this design changed¶
This document went through two revisions during the design pass:
- Initial draft (Decision 2a): proposed using email as
submitter_username, on the assumption that auth_v2's User dataclass had nousernamefield and adding one was unnecessary scope. - First revision (Decision 2c, dual auth fix): after learning forms.allow_users contains bare usernames, expanded scope to fix both
auth.pyandauth_v2.py. Premise: the count-drift bug was still active. - Final revision (Decision 2c, auth_v2 only): Code surfaced that
auth.pyhad been fixed in commitdd810e0six days prior; the count-drift memo from 2026-04-22 was stale. Scope contracted toauth_v2.pyonly.
The lesson worth pinning: memory entries describing "current bugs" age at the rate of commits to the affected modules. A diagnostic memo from a week ago described real state at the time it was written, not current state. Read-first discipline includes reading current code, not just current memory.
Process note: corrections after Code review¶
This section was added 2026-05-01 to document four corrections that
landed in shipped code (commit 83f85cb) but weren't reflected in
this doc's original publication (commit 10bcd0d, 2026-04-30). The
corrections are now applied above; this note explains what changed
and why.
Code's "STOP and tell me if anything doesn't fit cleanly" gate during Commit 3 verification caught three substantive design-vs-reality bugs plus one consistency issue:
-
_audit_v2helper signature — design specifiedactor=actorkwarg with**fieldscatch-all. Reality:AuditLogmodel expectsactor_username(notactor), and Phase 1's_auditpopulates four other fields (actor_ip,target_type,target_id,request_id,success) that the design's signature dropped. Calling the design's helper would have raisedTypeErroron first invocation. -
Allow-list deny audit action — design used
"submission_denied"as the audit action string. Reality: theaudit_actionenum inschema/001_init.sqldoes not contain that value. Phase 1 uses"auth_failure"withdetails={"reason": "not_in_allow_list"}andsuccess=Falsefor the same situation. The design's value would have been rejected at Postgres ENUM check on first deny. -
SubmissionField attachment pattern — design used
row.submission = submissionto attachSubmissionFieldrows to the parentSubmission. Reality: neither model declares a SQLAlchemy relationship, so this assignment would have set a plain Python attribute that the ORM doesn't know about — the FK columnsubmission_idwould have stayed NULL, andsession.add()on the parent wouldn't have cascaded the field rows. Net effect: submissions would have persisted with zero field rows, the most catastrophic possible failure mode (response says "201 created with N fields encrypted" but the field rows are missing). Phase 1's pattern usessession.flush()to get the auto-generated UUID, then constructs each field withsubmission_id=submission.idexplicitly. -
Internal consistency — the second
_audit_v2call site (submission_created) used the same wrongactor=kwarg as the deny path. Fixed in tandem with #1 and #2.
All four corrections were applied to shipped code in commit 83f85cb.
This doc was committed earlier (10bcd0d) with the wrong specs, then
corrected here on 2026-05-01.
Lesson¶
The three substantive failures shared a common root: I drafted the design from memory of "Phase 1 has a working _audit helper" without re-reading what Phase 1's helper actually does at the line number the design claims to mirror.
Read-first discipline doesn't end at "read the file once during investigation." It applies again at every implementation moment that references that file's content. When mirroring a pattern, re-read the pattern at draft time, not from memory of an earlier read.