
Architecture

As-of: 2026-05-09. Source: passkey-shell repo at staging head, manifest.yaml, appsettings-staging.json.


v1 — Current Production Candidate

Component Diagram

┌─────────────────────────────────────────────────────────────────────┐
│  Azure Resource Group: rg-passkey-stg                               │
│                                                                     │
│  ┌──────────────────────────────────┐    ┌────────────────────────┐ │
│  │  App Service (B1 Linux)          │    │  Postgres Flexible     │ │
│  │  app-passkey-stg-ben-6b2f        │    │  Server (Burstable     │ │
│  │                                  │◄──►│  B1ms)                 │ │
│  │  Node 22 + Express               │    │  pg-passkey-stg-ben-   │ │
│  │  ├─ backend/dist/server.js       │    │  6b2f                  │ │
│  │  ├─ frontend/dist/ (static)      │    └────────────────────────┘ │
│  │  └─ Prisma ORM                   │                               │
│  └──────────┬───────────┬───────────┘    ┌────────────────────────┐ │
│             │           │                │  Key Vault             │ │
│             │           │                │  kv-pk-stg-ben-6b2f    │ │
│             │           └───────────────►│  - DATABASE_URL        │ │
│             │                            │  - KSM_CONFIG          │ │
│             │                            │  - ENTRA_TENANT_ID     │ │
│             ▼                            └────────────────────────┘ │
│  ┌──────────────────────┐                                           │
│  │  Application Insights│                                           │
│  │  (telemetry sink)    │                                           │
│  └──────────────────────┘                                           │
└─────────────────────────────────────────────────────────────────────┘
         │                    │
         ▼                    ▼
┌──────────────────┐   ┌──────────────────────────────┐
│  Keeper Vault    │   │  Microsoft Entra ID          │
│  (external)      │   │  Tenant: 92023f7c-...        │
│                  │   │  App: Passkey Portal Staging │
│  KSM SDK ← reads │   │  (9b686ae7-...)              │
│  Commander CLI   │   └──────────────────────────────┘
│    ← shares      │
│    (stubbed in   │   ┌──────────────────────────────┐
│     staging)     │   │  Microsoft Teams             │
└──────────────────┘   │  Bot Framework (not          │
                       │  registered, skeleton only)  │
                       └──────────────────────────────┘

Data Flow — Credential Request Happy Path

The end-to-end sequence for a credential request in v1:

1. Authentication. The user opens the React frontend, which redirects to Entra ID for OAuth 2.0 sign-in. The backend (identity.service.ts) validates the returned JWT against the configured tenant, audience, and JWKS endpoint. On success, the user is looked up by entraObjectId in Postgres, and a UserIdentity object is hydrated with role and group memberships. Permissions are resolved from Vault folder ACLs via permission.service.ts and cached for 5 minutes in UserPermissionCache.

2. Record visibility. The user sees only records in Vault folders they have visibility into. GovernedResource entries link each Record to its visibility, access, and lease policies; the visibility policy (system.visibility.vault-access) gates which records appear in the UI.

3. Request submission. The user selects a record, provides a justification and requested duration, and submits a request. The backend:

  • Runs the policy engine (policy/engine.ts) — a pure-function evaluator that applies rules in sequence: self-approval block, sensitivity escalation, visibility check, duration caps, request-state validation, authority routing.
  • The engine returns one of the decision types defined in decision.ts: AUTO_APPROVE, DENY, ROUTE_TO_AUTHORITY (to a specific approver), or REQUIRES_HUMAN_TRIAGE.
  • A GovernanceDecisionTrace row is written with the full rule evaluation trace, policy version, inputs hash, and evaluating code version.

4. Approval. If routed, the designated approver (resolved from GovernedResourceAuthority mappings) reviews and approves or denies. Approval triggers lease.service.startLease():

  • Persists APPROVED status in a transaction.
  • Generates a cryptographic issuance token (32 bytes, SHA-256 hashed for storage — the plaintext is returned once to the requester).
  • Attempts sync issuance: calls vaultShareService.createOneTimeShare(uid, ttlMinutes).
    • On success: Status transitions to ISSUED. An IssuanceEvent row records the SHA-256 of the share URL (never the URL itself — INV-5), the Vault share ID, IP, and user agent.
    • On terminal error (permission denied, unauthorized): Status transitions to UNFULFILLABLE.
    • On transient error (timeout, rate limit): Status stays APPROVED; the issuance-retry.job picks it up with retry budget N=3, backoff [30s, 1m, 2m, 4m].
  • Audit event (LEASE_STARTED) and governance decision trace are written.

5. Credential retrieval. The requester uses the ephemeral issuance token to call POST /api/requests/:id/issue. The endpoint:

  • Verifies the token via constant-time comparison (crypto.timingSafeEqual).
  • Checks the rate-limit predicate (issuanceCount < maxIssuances, default cap = 3).
  • Creates a new Keeper one-time share URL via Commander.
  • Returns the URL in the response body (transit only — never persisted as plaintext).
  • Writes IssuanceEvent with shareLinkHash and increments issuanceCount.

6. Lease lifecycle. Background jobs manage the lease:

  • lease-scheduler.ts: Runs every 60s. Expires leases past leaseExpiresAt, prompts renewal within the renewal window.
  • issuance-retry.job.ts: Retries failed issuances with exponential backoff.
  • vault-sync.job.ts: Syncs Vault record metadata (revision, owner) and detects rotations or orphans.
  • discovery.job.ts: Discovers new Vault records not yet registered in Postgres.
  • permission-sync.job.ts: Re-resolves Vault folder permissions and invalidates stale caches.
  • commander-rotation-check.job.ts: Flags records where Commander detects a rotation is due.

7. Release / expiry. The requester can release early, or the lease expires automatically. Both paths revoke outstanding issuances via revocation.service.ts, which calls vaultShareService.removeOneTimeShare with the Vault share ID rather than the URL (the URL is never stored).
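Step 4 splits the outcomes of synchronous issuance three ways. That split can be sketched as a pure function. The error categories, the `classifyIssuance` name, and the convention that the caller passes the number of retries already made are assumptions. The backoff schedule is consulted only while the retry budget allows, so with N=3 this sketch never reaches the 4m step.

```typescript
// Error categories are assumptions; the real service may classify Commander
// exit codes or messages differently.
type IssuanceError = "PERMISSION_DENIED" | "UNAUTHORIZED" | "TIMEOUT" | "RATE_LIMITED";

type AttemptResult =
  | { ok: true; vaultShareId: string }
  | { ok: false; error: IssuanceError };

type IssuanceOutcome =
  | { status: "ISSUED"; vaultShareId: string }
  | { status: "UNFULFILLABLE"; reason: IssuanceError }
  // Transient failure: stay APPROVED. retryAfterMs is null once the budget is
  // spent; what happens after that is not specified in the text.
  | { status: "APPROVED"; retryAfterMs: number | null };

const TERMINAL_ERRORS: ReadonlySet<IssuanceError> = new Set<IssuanceError>([
  "PERMISSION_DENIED",
  "UNAUTHORIZED",
]);

// Backoff schedule from step 4: 30s, 1m, 2m, 4m.
const BACKOFF_MS: readonly number[] = [30_000, 60_000, 120_000, 240_000];

// retriesSoFar: retries issuance-retry.job has already made (0 right after the
// synchronous attempt fails).
function classifyIssuance(result: AttemptResult, retriesSoFar: number, retryBudget = 3): IssuanceOutcome {
  if (result.ok) {
    return { status: "ISSUED", vaultShareId: result.vaultShareId };
  }
  if (TERMINAL_ERRORS.has(result.error)) {
    return { status: "UNFULFILLABLE", reason: result.error };
  }
  const retryAfterMs = retriesSoFar < retryBudget ? (BACKOFF_MS[retriesSoFar] ?? null) : null;
  return { status: "APPROVED", retryAfterMs };
}
```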
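The token handling in steps 4 and 5 reduces to three small operations with node:crypto:

- Mint 32 random bytes and keep only their SHA-256 hash.
- Verify a presented token by hashing it and comparing digests with crypto.timingSafeEqual.
- Enforce the issuance cap.

This is a sketch only. The function names and the base64url encoding of the plaintext token are assumptions.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

function sha256(value: string): Buffer {
  return createHash("sha256").update(value).digest();
}

// Approval time: 32 random bytes. Only the digest is persisted; the plaintext
// is returned to the requester once. base64url encoding is an assumption.
function mintIssuanceToken(): { plaintext: string; storedHash: Buffer } {
  const plaintext = randomBytes(32).toString("base64url");
  return { plaintext, storedHash: sha256(plaintext) };
}

// Issue time: hash the presented token and compare digests in constant time.
// Both sides are 32-byte SHA-256 digests, so timingSafeEqual's equal-length
// requirement always holds.
function verifyIssuanceToken(presented: string, storedHash: Buffer): boolean {
  return timingSafeEqual(sha256(presented), storedHash);
}

// Rate-limit predicate from step 5 (default cap = 3).
function mayIssue(issuanceCount: number, maxIssuances = 3): boolean {
  return issuanceCount < maxIssuances;
}
```

Comparing digests rather than raw tokens also means the stored value never needs to be reversible.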
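In step 6, one lease-scheduler tick amounts to partitioning active leases by time remaining. A sketch follows. The renewal window length is not stated, so it is a parameter here. Two further choices are assumptions: which statuses count as active, and skipping leases already in RENEWAL_PENDING.

```typescript
// Hypothetical slice of a Request row; leases are implicit on Request.
interface LeaseRow {
  id: string;
  status: string;
  leaseExpiresAt: Date | null;
}

interface TickPlan {
  expire: string[];
  promptRenewal: string[];
}

// Statuses assumed to carry a running lease (the lease starts at approval).
const ACTIVE_STATUSES = new Set(["APPROVED", "ISSUED", "RENEWAL_PENDING"]);

// One scheduler tick: expire overdue leases and prompt renewal for leases
// inside the renewal window.
function planSchedulerTick(rows: LeaseRow[], now: Date, renewalWindowMs: number): TickPlan {
  const plan: TickPlan = { expire: [], promptRenewal: [] };
  for (const row of rows) {
    if (row.leaseExpiresAt === null || !ACTIVE_STATUSES.has(row.status)) continue;
    const remainingMs = row.leaseExpiresAt.getTime() - now.getTime();
    if (remainingMs <= 0) {
      plan.expire.push(row.id);
    } else if (remainingMs <= renewalWindowMs && row.status !== "RENEWAL_PENDING") {
      // Assumed: RENEWAL_PENDING means the requester was already prompted.
      plan.promptRenewal.push(row.id);
    }
  }
  return plan;
}
```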
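Revocation in step 7 can be sketched against a minimal interface. Only removeOneTimeShare is named in the text. Three things are assumptions: its signature here, the `IssuanceEventRow` shape, and the choice to collect per-share failures rather than abort on the first one.

```typescript
// Hypothetical slice of an IssuanceEvent row.
interface IssuanceEventRow {
  id: string;
  vaultShareId: string;
  revokedAt: Date | null;
}

// Minimal view of the share service; the signature is assumed.
interface ShareRemover {
  removeOneTimeShare(vaultShareId: string): Promise<void>;
}

// Revoke each outstanding share by its Vault share ID. Failures are collected
// so one bad share does not block the rest; retry policy is out of scope.
async function revokeOutstanding(
  events: IssuanceEventRow[],
  shares: ShareRemover,
  now: () => Date = () => new Date(),
): Promise<{ revoked: string[]; failed: string[] }> {
  const revoked: string[] = [];
  const failed: string[] = [];
  for (const ev of events) {
    if (ev.revokedAt !== null) continue; // already revoked earlier
    try {
      await shares.removeOneTimeShare(ev.vaultShareId);
      ev.revokedAt = now(); // the real service would persist this timestamp
      revoked.push(ev.id);
    } catch {
      failed.push(ev.id);
    }
  }
  return { revoked, failed };
}
```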

Integration Points

| Integration | Protocol | Service File | v1 Status |
|---|---|---|---|
| Keeper (reads) | KSM SDK (@keeper-security/secrets-manager-core) | vault-read.service.ts | Real in staging, mock in local |
| Keeper (shares) | Commander CLI subprocess (python3 -m keepercommander) | vault-share.service.ts, commander.ts | Stubbed in staging (StubVaultShareService) |
| Entra ID | JWT validation via jose against JWKS | identity.service.ts | Real in staging |
| Microsoft Graph | Group membership resolution | identity.service.ts | Seam exists, groups cached 5 min |
| Teams Bot Framework | Adaptive Cards for notifications | notification.service.ts, card templates | Not registered (mock only) |
| Azure Key Vault | App Settings references (@Microsoft.KeyVault(...)) | appsettings-staging.json | Real: DATABASE_URL, KSM_CONFIG, ENTRA_TENANT_ID |
| Application Insights | OpenTelemetry via telemetry.service.ts | telemetry.service.ts | Configured in staging |
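App Service resolves @Microsoft.KeyVault(...) references when it loads app settings. If a reference cannot be resolved (for example, the app's identity lacks access to the vault), the app receives the literal reference string instead of the secret. A startup guard can turn that silent failure into a clear boot error. It is sketched here with hypothetical helper names (`unresolvedSecrets`, `assertSecretsResolved`); the setting names come from the table above.

```typescript
// Settings backed by Key Vault references in staging, per the table above.
const REQUIRED_SECRETS = ["DATABASE_URL", "KSM_CONFIG", "ENTRA_TENANT_ID"] as const;

// Names of settings that are missing, empty, or still hold an unresolved
// "@Microsoft.KeyVault(...)" reference string.
function unresolvedSecrets(env: Record<string, string | undefined>): string[] {
  return REQUIRED_SECRETS.filter((name) => {
    const value = env[name];
    return value === undefined || value === "" || value.startsWith("@Microsoft.KeyVault(");
  });
}

function assertSecretsResolved(env: Record<string, string | undefined> = process.env): void {
  const bad = unresolvedSecrets(env);
  if (bad.length > 0) {
    // Fail fast at boot instead of failing later inside Prisma or the KSM SDK.
    throw new Error(`Unresolved Key Vault settings: ${bad.join(", ")}`);
  }
}
```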

Deployment Topology

Staging:

| Resource | Name | SKU / Tier | Resource Group | Notes |
|---|---|---|---|---|
| App Service Plan | | B1 Linux | rg-passkey-stg | Shared with staging web app |
| Web App | app-passkey-stg-ben-6b2f | | rg-passkey-stg | Node 22, startup: node backend/dist/server.js |
| Postgres Flexible Server | pg-passkey-stg-ben-6b2f | Burstable B1ms | rg-passkey-stg | Default postgres db |
| Key Vault | kv-pk-stg-ben-6b2f | Standard | rg-passkey-stg | RBAC mode |

Prod (provisioned, not deployed):

| Resource | Name | SKU / Tier | Resource Group | Notes |
|---|---|---|---|---|
| App Service | app-passkey-prod-1353 | | rg-passkey-prod | Behind VNet with NAT Gateway |
| Key Vault | kv-pk-prod-1353 | | rg-passkey-prod | Private endpoint planned |
| ACR (bootstrap) | crpasskeyprod | | rg-passkey-prod | Not used for ongoing deploys |

Deploy method: zip deploy (az webapp deploy --type zip). Scripts: deploy/deploy-staging.ps1 and deploy/deploy-prod.ps1. Both follow the same 9-step process:

  1. Clean build.
  2. npm ci.
  3. prisma generate.
  4. Compile (shared -> backend -> frontend).
  5. Package (tar, forward-slash paths for Linux).
  6. Set the startup command.
  7. Deploy the zip.
  8. Auto-migrate on boot.
  9. Smoke test /healthz.
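The final smoke test is a probe of /healthz. Migrations run on boot, so a robust probe polls rather than checking once. The deploy scripts are PowerShell; this TypeScript sketch shows only the polling logic. The `smokeTest` name, the attempt count, and the delay are illustrative assumptions, not values from the scripts.

```typescript
// Minimal fetch-like signature so the probe can run without a network in tests.
type Fetcher = (url: string) => Promise<{ status: number }>;

// Poll /healthz until it returns 200 or attempts run out. Early probes may fail
// while the app restarts and applies migrations.
async function smokeTest(
  baseUrl: string,
  fetcher: Fetcher,
  attempts = 10,
  delayMs = 15_000,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<boolean> {
  const url = new URL("/healthz", baseUrl).toString();
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetcher(url);
      if (res.status === 200) return true;
    } catch {
      // Connection refused or reset while the container restarts: keep polling.
    }
    if (i < attempts - 1) await sleep(delayMs);
  }
  return false;
}
```

Against a real deployment this could be called as `smokeTest(stagingBaseUrl, (u) => fetch(u))`, using Node 18+'s global fetch and the web app's base URL.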

Data Model Summary

The Prisma schema (backend/prisma/schema.prisma) defines 14 models:

Core workflow:

  • User — Entra-linked identity with Role (REQUESTER / APPROVER / ADMIN), allowMultipleLeases flag.
  • Record — Credential record linked to a Keeper vault UID. Tracks syncStatus (UNKNOWN / FRESH / STALE / ORPHANED), vaultRecordUidPinned flag, auto/webhook discovery metadata.
  • Request — The request lifecycle entity. Status is a 9-value enum: PENDING, APPROVED, ISSUED, DENIED, EXPIRED, RELEASED, RENEWAL_PENDING, REQUIRES_TRIAGE, UNFULFILLABLE. Carries issuance token hash, issuance count/cap, lease timestamps. No separate Lease model — lease is implicit via leaseStartedAt / leaseExpiresAt.
  • IssuanceEvent — One row per share-link issuance. Stores shareLinkHash (SHA-256, never the URL), vaultShareId (for revocation by ID), revocation timestamp.
  • AuditEvent — 40-action audit enum covering the full lifecycle.
  • Notification — In-app notification queue (mock mode). Real mode uses Bot Framework.
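The request-state rule validates moves on the 9-value RequestStatus enum, and a table-driven check is one way to express that. This is an illustrative sketch, not the actual rule set in request-state. Some edges follow the data flow above: PENDING to APPROVED, DENIED, or REQUIRES_TRIAGE, and APPROVED to ISSUED or UNFULFILLABLE. Edges marked "assumed" are guesses.

```typescript
type RequestStatus =
  | "PENDING" | "APPROVED" | "ISSUED" | "DENIED" | "EXPIRED"
  | "RELEASED" | "RENEWAL_PENDING" | "REQUIRES_TRIAGE" | "UNFULFILLABLE";

// Illustrative transition table.
const TRANSITIONS: Record<RequestStatus, readonly RequestStatus[]> = {
  PENDING: ["APPROVED", "DENIED", "REQUIRES_TRIAGE"],
  REQUIRES_TRIAGE: ["APPROVED", "DENIED"], // assumed
  // ISSUED / UNFULFILLABLE from the flow; release/expiry before issuance assumed
  APPROVED: ["ISSUED", "UNFULFILLABLE", "RELEASED", "EXPIRED"],
  ISSUED: ["RENEWAL_PENDING", "RELEASED", "EXPIRED"],
  RENEWAL_PENDING: ["ISSUED", "RELEASED", "EXPIRED"], // assumed
  DENIED: [],
  EXPIRED: [],
  RELEASED: [],
  UNFULFILLABLE: [],
};

function canTransition(from: RequestStatus, to: RequestStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

// A status with no outgoing edges ends the request lifecycle.
function isTerminal(status: RequestStatus): boolean {
  return TRANSITIONS[status].length === 0;
}
```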

Governance layer:

  • GovernedResource — Links a Record to visibility/access/lease policies and authority mappings.
  • GovernedResourceAuthority — OWNER/CUSTODIAN/APPROVER mappings with principal types (USER/GROUP/ROLE/EMAIL). Append-only history: revokedAt, supersededById linkage, validFrom/validTo. Immutability is enforced by a Postgres BEFORE UPDATE trigger (INV-1).
  • GovernanceDecisionTrace — Full decision audit: stage, outcome, reasons, constraints, trace JSON, policy version, inputs hash, evaluating code version.
  • GovernancePolicy — System-managed policy definitions (visibility, access, lease) with JSON rule bodies.
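GovernedResourceAuthority history is append-only, so "who holds this authority now?" becomes a query over rows rather than a single mutable field. The sketch below assumes a row counts as active at time `at` when three conditions hold:

- `at` falls inside its validity window.
- The row has not been revoked by `at`.
- The row has not been superseded.

The row shape is hypothetical. Treating any non-null supersededById as final, regardless of time, is also an assumption.

```typescript
type AuthorityKind = "OWNER" | "CUSTODIAN" | "APPROVER";
type PrincipalType = "USER" | "GROUP" | "ROLE" | "EMAIL";

// Hypothetical slice of a GovernedResourceAuthority row.
interface AuthorityRow {
  id: string;
  kind: AuthorityKind;
  principalType: PrincipalType;
  principalId: string;
  validFrom: Date;
  validTo: Date | null; // null: open-ended
  revokedAt: Date | null;
  supersededById: string | null;
}

// Rows of the given kind that are in force at `at`.
function activeAuthorities(rows: AuthorityRow[], kind: AuthorityKind, at: Date): AuthorityRow[] {
  const t = at.getTime();
  return rows.filter(
    (r) =>
      r.kind === kind &&
      r.supersededById === null &&
      r.validFrom.getTime() <= t &&
      (r.validTo === null || t < r.validTo.getTime()) &&
      (r.revokedAt === null || t < r.revokedAt.getTime()),
  );
}
```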

Vault sync:

  • VaultFolder — Keeper shared folder, ACL container.
  • VaultFolderPermission — Per-principal permission on a folder (OWNER/MANAGER/EDITOR/VIEWER).
  • UserPermissionCache — Resolved permissions with 5-minute TTL.
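VaultFolderPermission stores one permission per principal per folder. Resolving those rows into a per-user view is what drives record visibility (step 2 of the data flow). In this sketch, the user's effective level on a folder is the strongest level granted to any of their principals (their own ID plus group IDs), and a record is visible if its folder has any level. Three things are assumptions: the OWNER > MANAGER > EDITOR > VIEWER ordering, the row shapes, and "any level means visible".

```typescript
type PermissionLevel = "OWNER" | "MANAGER" | "EDITOR" | "VIEWER";

// Assumed strength ordering of the VaultFolderPermission levels.
const RANK: Record<PermissionLevel, number> = { OWNER: 4, MANAGER: 3, EDITOR: 2, VIEWER: 1 };

// Hypothetical row shapes.
interface FolderAcl {
  folderUid: string;
  principalId: string; // user or group identifier
  permission: PermissionLevel;
}

interface RecordSummary {
  id: string;
  folderUid: string;
}

// Effective permission per folder: the strongest level granted to any of the
// user's principals.
function resolveFolderPermissions(acls: FolderAcl[], principals: Set<string>): Map<string, PermissionLevel> {
  const result = new Map<string, PermissionLevel>();
  for (const acl of acls) {
    if (!principals.has(acl.principalId)) continue;
    const current = result.get(acl.folderUid);
    if (current === undefined || RANK[acl.permission] > RANK[current]) {
      result.set(acl.folderUid, acl.permission);
    }
  }
  return result;
}

// A record is visible when the user holds any permission on its folder.
function visibleRecords(records: RecordSummary[], perms: Map<string, PermissionLevel>): RecordSummary[] {
  return records.filter((r) => perms.has(r.folderUid));
}
```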
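The 5-minute TTL on UserPermissionCache can be illustrated with an in-memory analogue. The real cache is a Postgres model. The class name, the injected `resolve` function (standing in for permission.service.ts), and the injectable clock are hypothetical.

```typescript
// In-memory analogue of UserPermissionCache: a user's resolved folder
// permissions are reused until they are 5 minutes old.
const PERMISSION_TTL_MS = 5 * 60 * 1000;

type FolderPermissions = Map<string, string>; // folderUid -> permission level

interface CacheEntry {
  permissions: FolderPermissions;
  resolvedAt: number; // epoch ms of the last ACL resolution
}

class PermissionTtlCache {
  private readonly entries = new Map<string, CacheEntry>();
  private readonly resolve: (userId: string) => FolderPermissions;
  private readonly now: () => number;

  // `resolve` stands in for permission.service.ts; `now` is injectable for tests.
  constructor(resolve: (userId: string) => FolderPermissions, now: () => number = Date.now) {
    this.resolve = resolve;
    this.now = now;
  }

  getOrResolve(userId: string): FolderPermissions {
    const hit = this.entries.get(userId);
    if (hit !== undefined && this.now() - hit.resolvedAt < PERMISSION_TTL_MS) {
      return hit.permissions; // still fresh: skip the Vault ACL lookup
    }
    const permissions = this.resolve(userId);
    this.entries.set(userId, { permissions, resolvedAt: this.now() });
    return permissions;
  }

  // Hook for permission-sync.job.ts when it detects changed ACLs.
  invalidate(userId: string): void {
    this.entries.delete(userId);
  }
}
```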

Observability:

  • GovernanceProbeEvent — Audit log for governance probe executions.

Policy Engine

The policy engine (backend/src/policy/) is a pure-function evaluator with no side effects. Architecture:

  • engine.ts — Main evaluate() function. Aggregation: DENY short-circuits, TRIAGE wins over ROUTE, ROUTE wins over AUTO_APPROVE, all-ABSTAIN collapses to AUTO_APPROVE.
  • decision.ts — Decision types: AUTO_APPROVE, DENY, ROUTE_TO_AUTHORITY, REQUIRES_HUMAN_TRIAGE.
  • Rules (in policy/rules/):
    • self-approval-block — Owners cannot approve their own resources.
    • sensitivity-escalation — High-sensitivity records require higher authority.
    • visibility — Vault folder visibility check.
    • duration-caps — Max lease duration (configurable per policy), extension/renewal caps.
    • request-state — Validates transitions on the 9-state RequestStatus machine.
    • authority-routing — Routes to designated approver based on authority mappings.
  • context-builder.ts — Builds the PolicyContext from request + resource + actor.
  • inputs-hash.ts — SHA-256 of canonicalized policy inputs for reproducibility.
  • replay-context.ts — Replay support for decision audit.
  • version.ts — Policy version stamping.
  • trace.ts — Trace enrichment.
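The aggregation order stated for engine.ts can be sketched as a fold over pure rules. The precedence is DENY short-circuits, then TRIAGE over ROUTE, then ROUTE over AUTO_APPROVE, and all-ABSTAIN collapses to AUTO_APPROVE. The decision kinds mirror decision.ts. The `Rule` signature, the verdict payloads, and keeping the first ROUTE when several rules route are assumptions.

```typescript
type Verdict =
  | { kind: "ABSTAIN" }
  | { kind: "AUTO_APPROVE" }
  | { kind: "DENY"; reason: string }
  | { kind: "ROUTE_TO_AUTHORITY"; approverId: string }
  | { kind: "REQUIRES_HUMAN_TRIAGE"; reason: string };

type Decision = Exclude<Verdict, { kind: "ABSTAIN" }>;

// A rule is a pure function of the policy context.
type Rule<C> = (ctx: C) => Verdict;

function evaluate<C>(rules: Rule<C>[], ctx: C): Decision {
  let triage: Decision | null = null;
  let route: Decision | null = null;
  for (const rule of rules) {
    const v = rule(ctx);
    switch (v.kind) {
      case "DENY":
        return v; // short-circuit: later rules are not evaluated
      case "REQUIRES_HUMAN_TRIAGE":
        triage ??= v;
        break;
      case "ROUTE_TO_AUTHORITY":
        route ??= v;
        break;
      default:
        break; // ABSTAIN and AUTO_APPROVE never outrank anything
    }
  }
  // TRIAGE beats ROUTE beats AUTO_APPROVE; all-ABSTAIN lands on AUTO_APPROVE.
  return triage ?? route ?? { kind: "AUTO_APPROVE" };
}
```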
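inputs-hash.ts is described as a SHA-256 of canonicalized policy inputs, but the canonicalization itself is not described. This sketch assumes a sorted-key JSON serialization, so logically equal inputs hash identically regardless of property order. The function names are hypothetical, and inputs are assumed JSON-safe (no undefined values, Dates, functions, or cycles).

```typescript
import { createHash } from "node:crypto";

// Serialize with object keys sorted at every depth so that two logically
// equal inputs always produce the same string.
function canonicalize(value: unknown): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`; // array order is significant
  }
  const obj = value as Record<string, unknown>;
  const body = Object.keys(obj)
    .sort()
    .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`)
    .join(",");
  return `{${body}}`;
}

// Hex SHA-256 of the canonical form.
function inputsHash(inputs: unknown): string {
  return createHash("sha256").update(canonicalize(inputs)).digest("hex");
}
```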

v3 — Planned (Scope TBD)

Status: v3 scope has not been locked. The following represents architectural direction, not committed work. Items marked with [needs decision] require PO/CTO input before implementation begins.

Expected Architectural Deltas

  • Commander → full mode: Flip VAULT_DEPLOYMENT_MODE from staging to full, enabling RealVaultShareService (Commander subprocess for live share creation). [needs decision: Keeper trial conversion to paid required first.]
  • Bot Framework registration: Register the Teams bot, wire notification.service.ts to real Adaptive Card delivery. [needs decision: M365 admin approval, bot identity ownership.]
  • Governance gate hardening: Flip HARDEN_GOVERNANCE_v1=true in production. This enables the issuance token-exchange flow as the sole credential delivery path. [needs decision: requires runtime validation in staging first.]
  • Production deployment: Stand up app-passkey-prod-1353 with VNet, NAT Gateway, private endpoints for KV and Postgres. Complete appsettings-prod.json (currently has placeholder Entra values). [needs decision: SKU tier, custom domain, SSL cert.]
  • Frontend test runner: Wire vitest into frontend/package.json scripts, expand test coverage beyond the single governance.spec.ts.
  • Discovery strategy finalization: Migrate from bootstrap title-match to UID-pinned discovery (option B). Reconcile or reset the links between Postgres records and Keeper UIDs that were created by hand in SQL. [needs decision: reset vs. reconcile approach.]
  • Dedicated Postgres database: Migrate from the default postgres database to a dedicated passkey database (flagged as a deviation in manifest.yaml).
  • Baseline Prisma migration: Create an initial baseline migration to replace prisma db push, which is currently used to initialize fresh databases.
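The first delta flips VAULT_DEPLOYMENT_MODE from staging to full, which switches share creation from StubVaultShareService to RealVaultShareService. A sketch of that selection point follows, with implementations injected as factories. The interface signatures and the `selectShareService` name are assumptions. Rejecting unknown or unset values at boot is a design choice of this sketch, and the real setting may accept other values.

```typescript
// Hypothetical surface shared by RealVaultShareService and StubVaultShareService;
// method names come from the data flow, signatures are assumed.
interface VaultShareService {
  createOneTimeShare(uid: string, ttlMinutes: number): Promise<{ url: string; vaultShareId: string }>;
  removeOneTimeShare(vaultShareId: string): Promise<void>;
}

type VaultDeploymentMode = "staging" | "full";

// Only the two values named in the text are accepted; anything else fails at
// boot so a typo cannot silently pick the wrong implementation.
function parseVaultDeploymentMode(raw: string | undefined): VaultDeploymentMode {
  if (raw === "staging" || raw === "full") return raw;
  throw new Error(`Unsupported VAULT_DEPLOYMENT_MODE: ${raw ?? "(unset)"}`);
}

function selectShareService(
  env: Record<string, string | undefined>,
  factories: { real: () => VaultShareService; stub: () => VaultShareService },
): VaultShareService {
  const mode = parseVaultDeploymentMode(env["VAULT_DEPLOYMENT_MODE"]);
  // full: RealVaultShareService (Commander subprocess); staging: StubVaultShareService.
  return mode === "full" ? factories.real() : factories.stub();
}
```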

What’s Underspecified

The following v3 items lack sufficient detail to architect against:

  • Scope boundary between v3 and v4 (what’s in, what’s deferred)
  • User volume target for v3 (drives SKU sizing, Keeper licensing tier)
  • Whether v3 includes multi-tenant support or stays single-tenant
  • Frontend feature scope (which UI surfaces need to be production-ready vs. admin-only)
  • SLA targets (uptime, RTO, RPO) for production deployment
  • Monitoring and alerting requirements beyond Application Insights telemetry

v4 — Concept (SMS Android MFA)

Status: Conceptual. All details below are assumptions, not commitments, and are marked as such.

Architecture Sketch

v4 adds a second factor to credential retrieval: after the requester presents their issuance token, the system sends an SMS OTP to their registered mobile number. The requester enters the OTP to complete the token exchange.

SMS gateway options (decision needed):

| Option | Pros | Cons | Est. Cost |
|---|---|---|---|
| Azure Communication Services | Native Azure integration, single billing, no new vendor | Less mature SMS API than Twilio, limited international coverage | ~$0.0075/msg US |
| Twilio | Industry standard, global coverage, robust API | Additional vendor relationship, separate billing | ~$0.0079/msg US |
| Keeper-native (if available) | Stays in-platform | Unknown availability, couples MFA to vault vendor | TBD |

Assumption: Azure Communication Services is the likely choice given existing Azure investment, unless international SMS coverage is a requirement.

Android-side MFA capture path (concept, not designed):

  • Option A: Companion app (custom Android app that receives push notification + OTP entry)
  • Option B: Standard SMS delivery (no app needed, OTP arrives as text message)
  • Option C: Native Android Credential Manager API (device-bound, no SMS — different security model)

Assumption: Option B (standard SMS) for v4 MVP. Option A or C for hardening in a later phase.

Integration into approval flow:

The existing POST /api/requests/:id/issue endpoint would gain a two-step exchange:

  1. Present the issuance token → receive a challenge (SMS sent, challenge ID returned).
  2. Present the challenge ID + OTP → receive the share URL.

This would add a ChallengeEvent model to the schema and a new challenge.service.ts alongside the existing issuance service.
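A conceptual sketch of the two-step exchange follows. It applies the mitigations listed for OTP replay: the OTP is single-use, has a short TTL, and wrong guesses consume a bounded attempt budget. Everything here is hypothetical, in line with the v4 status:

- The in-memory `ChallengeStore` stands in for the ChallengeEvent table.
- The injected `sendSms` callback stands in for the SMS gateway.
- The 5-minute TTL and the 5-attempt cap are assumptions.
- Storing an HMAC of the OTP (keyed by a server-side secret) rather than the OTP itself is a design choice of this sketch.

```typescript
import { createHmac, randomBytes, randomInt, timingSafeEqual } from "node:crypto";

// Hypothetical shape of the proposed ChallengeEvent model.
interface Challenge {
  id: string;
  requestId: string;
  otpMac: Buffer; // HMAC of the OTP; the OTP itself is never stored
  expiresAt: number; // epoch ms
  attempts: number;
  consumedAt: number | null;
}

const OTP_TTL_MS = 5 * 60_000; // assumed TTL
const MAX_ATTEMPTS = 5; // assumed attempt cap

class ChallengeStore {
  private readonly challenges = new Map<string, Challenge>();
  private readonly key: Buffer;
  private readonly now: () => number;

  constructor(key: Buffer, now: () => number = Date.now) {
    this.key = key;
    this.now = now;
  }

  private mac(otp: string): Buffer {
    return createHmac("sha256", this.key).update(otp).digest();
  }

  // Step 1: call only after the issuance token has been verified.
  start(requestId: string, sendSms: (otp: string) => void): string {
    const otp = randomInt(0, 1_000_000).toString().padStart(6, "0");
    const id = randomBytes(16).toString("hex");
    this.challenges.set(id, {
      id,
      requestId,
      otpMac: this.mac(otp),
      expiresAt: this.now() + OTP_TTL_MS,
      attempts: 0,
      consumedAt: null,
    });
    sendSms(otp);
    return id;
  }

  // Step 2: true at most once per challenge, for the right OTP within the TTL
  // and attempt budget. On true, the caller creates the share URL.
  verify(challengeId: string, otp: string): boolean {
    const c = this.challenges.get(challengeId);
    if (!c || c.consumedAt !== null || this.now() >= c.expiresAt || c.attempts >= MAX_ATTEMPTS) {
      return false;
    }
    c.attempts += 1; // wrong guesses count against the budget
    if (!timingSafeEqual(this.mac(otp), c.otpMac)) return false;
    c.consumedAt = this.now(); // single use: a replayed OTP now fails
    return true;
  }
}
```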

New threat surface:

  • SIM swap attacks (mitigated only by device-bound MFA on top, not by SMS alone)
  • SMS interception (SS7 vulnerabilities)
  • Android device compromise (if companion app is used)
  • OTP replay (mitigated by single-use + short TTL)

See Security Architecture for detailed threat analysis per phase.