ADR 0001: Build the Queue Service as the standalone waiting-room repo
- Status: Accepted
- Date: 2026-05-19
- Deciders: HNS Ticketing engineering
- Refines: original E5 implementation plan in microservices-strategy.md and epic-waiting-queue-system.md
Context
The original E5 plan called for extracting the queue functionality out of the Symfony monolith into a dedicated service: "Node.js or Go" runtime, Redis sorted-set backed, WebSocket/SSE realtime, 100k+ concurrent connections, 20-min purchase window, 30-min position persistence. Cross-service signalling to the monolith was sketched as Redis Pub/Sub (queue.turn_granted, queue.expired) on the shared Redis cluster, with queue state co-located in the same Redis instance that holds cart reservations, sessions, and the notification job queue.
We built that service — it lives in the sibling repo ../waiting-room/ (currently v0.15.5). During implementation we landed on several design choices that differ from the original sketch and are worth documenting:
- Built as a multi-tenant service from day one (single instance can later gate other surfaces such as the Drupal webshop), not the single-purpose service the original plan implied.
- Capacity mode (auto-admit up to a concurrency cap, hold a session for session_ttl_seconds) is the integration shape, alongside an operator mode (manual call-next-N) that we do not use.
- Cross-service signalling to the backend is pull-based via GET /access rather than push-based via Redis Pub/Sub.
- Its own Postgres + Valkey stack, not co-located on the platform's shared Redis cluster.
- Valkey (BSD-3, Linux Foundation fork), not Redis-the-product (re-licensed to SSPL/RSALv2 in 2024). Redis-protocol clients (ioredis, BullMQ) work unchanged.
- Stack: Node 20 + TypeScript 5.7 + Fastify 5 + Prisma 6 + BullMQ; FCM (firebase-admin) + APNs (@parse/node-apn) dispatchers; cross-instance WebSocket fan-out via Valkey Pub/Sub.
- Released under an OSI-compatible licence (per a hard product constraint in the repo's own AGENTS.md).
Decision
The Queue Service for HNS Ticketing is the waiting-room repo. HNS Ticketing runs it as one of its sibling stacks and integrates as the first tenant.
Consequences for the platform architecture:
- Standalone deployment. waiting-room is an additional sibling repo with its own Postgres + Valkey + BullMQ. It does not share the ticketing backend's Redis or Postgres.
- Valkey, not Redis-the-product. A deliberate choice driven by the 2024 SSPL/RSALv2 re-license. The Redis-protocol clients we already use (Predis, ioredis, BullMQ) continue to work; the on-disk product is Valkey.
- Capacity mode is the integration model. waiting-room supports both operator (call-next-N, helpdesk style) and capacity (auto-admit up to a concurrency cap, then hold a session for session_ttl_seconds). We use capacity mode to gate protected endpoints (checkout / seat selection) on the ticketing backend.
- GET /access replaces the planned queue.turn_granted pub/sub. The ticketing backend (the protected origin) validates each protected request by calling GET /access on waiting-room with the mobile client's session token. This is pull-based, one Valkey lookup per validation, designed to be cached for 1–5s on the origin side. We do not subscribe to queue.* Redis pub/sub channels and we do not put queue.* subjects on our NATS Event Bus.
- Two new token types in the auth model. Mobile clients receive a ticketToken (talks to waiting-room: ticket state, WebSocket upgrade, voluntary leave) and a sessionToken (talks to our origin endpoints; validated via /access). Both are orthogonal to the Keycloak-issued user JWT.
- Queue notifications fan out from waiting-room, not from our Notification Workers. waiting-room owns delivery of admitted, position_changed, session_expired, expired, cancelled over its own WebSocket gateway and FCM/APNs dispatchers. Our Notification Workers continue to own order/ticket/quota/loyalty notifications.
- queue_entries and queue:* Redis structures leave the ticketing backend schema. Durable queue history lives in waiting-room's Postgres; hot queue state lives in waiting-room's Valkey. The ticketing backend no longer models a Queue entity.
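To make the capacity-mode rule concrete, here is a minimal in-memory sketch. It is not waiting-room's implementation: the class name CapacityQueue, its methods, and the FIFO array are hypothetical, and the real service keeps hot state in Valkey. It models only the behaviour described above: admit automatically while active sessions are below the concurrency cap, and hold each admitted session for session_ttl_seconds.

```typescript
// Hypothetical in-memory model of capacity mode; all names are illustrative.
interface Session {
  ticketId: string;
  expiresAt: number; // epoch milliseconds
}

class CapacityQueue {
  private waiting: string[] = [];              // FIFO of waiting ticket ids
  private active = new Map<string, Session>(); // admitted sessions by ticket id
  private concurrencyCap: number;
  private sessionTtlSeconds: number;

  constructor(concurrencyCap: number, sessionTtlSeconds: number) {
    this.concurrencyCap = concurrencyCap;
    this.sessionTtlSeconds = sessionTtlSeconds;
  }

  // A new ticket joins the back of the queue, then admission runs.
  join(ticketId: string, now: number): void {
    this.waiting.push(ticketId);
    this.admit(now);
  }

  // Drop expired sessions, then admit waiting tickets while capacity is free.
  admit(now: number): Session[] {
    this.active.forEach((session, id) => {
      if (session.expiresAt <= now) this.active.delete(id);
    });
    const admitted: Session[] = [];
    while (this.active.size < this.concurrencyCap && this.waiting.length > 0) {
      const ticketId = this.waiting.shift() as string;
      const session: Session = {
        ticketId,
        expiresAt: now + this.sessionTtlSeconds * 1000,
      };
      this.active.set(ticketId, session);
      admitted.push(session);
    }
    return admitted;
  }

  // True while the ticket holds an unexpired session.
  isAdmitted(ticketId: string, now: number): boolean {
    const session = this.active.get(ticketId);
    return session !== undefined && session.expiresAt > now;
  }

  // 1-based position among waiting tickets; 0 means the ticket is not waiting.
  position(ticketId: string): number {
    return this.waiting.indexOf(ticketId) + 1;
  }
}
```

In this model a slot frees up only when a session's TTL lapses; explicit session release and the voluntary-leave path are left out for brevity.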
Consequences
Positive
- E5-F1 through E5-F5 are owned end-to-end by waiting-room, including parts (E5-F2 real-time updates, E5-F3 FCM/APNs push) that the original plan left as separate workstreams.
- The Symfony monolith does not take on long-lived WebSocket connections; PHP-FPM is unchanged.
- Multi-tenant by design: the same waiting-room deployment can later gate the Drupal webshop, admin portal, or other surfaces without spinning up additional services.
- Capacity sizing is documented in waiting-room/docs/bottlenecks_estimate.md: a single 8 vCPU / 32 GB host sustains ~100–150k waiting users.
Negative / risks
- Same-user-same-position dedup is not built-in. waiting-room keys tickets by ticket id, not user id, because it is tenant-agnostic about how callers identify users. The original spec called for "multi-device sync — same user = same position." We close the gap by enforcing 1 ticket per (user_id, queue_id) on the call site (mobile or backend) before issuing POST /queues/{id}/tickets.
- No backend-side event hook today. When a fan is admitted, the ticketing backend learns about it lazily, either via the mobile app forwarding the admitted event, or via the subsequent /access validation call. A future WebhookDispatcher (the NotificationDispatcher interface in waiting-room is explicitly designed to support this) would close the gap; not required for v1.
- Two services to operate instead of one (waiting-room + ticketing backend), counter-balanced by waiting-room being a single Docker image with its own self-contained compose stack.
- /access is on the hot path for every protected request. Origin-side caching (1–5s, capped at sessionExpiresAt) is mandatory; fail-closed is the documented default.
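The call-site dedup described in the first risk could be enforced roughly as follows. This is a sketch under assumptions: TicketRegistry and issueTicket are hypothetical names, the in-memory Map stands in for whatever store the caller actually uses, and issueTicket stands in for the POST /queues/{id}/tickets call, whose response shape is not specified in this ADR.

```typescript
// Hypothetical call-site guard: at most one ticket per (user_id, queue_id).
type IssueTicket = (queueId: string) => Promise<string>; // resolves to a ticket id

class TicketRegistry {
  // Keyed by the (userId, queueId) pair; holds the in-flight or completed issuance.
  private tickets = new Map<string, Promise<string>>();
  private issueTicket: IssueTicket;

  constructor(issueTicket: IssueTicket) {
    this.issueTicket = issueTicket;
  }

  // Returns the existing ticket for this user and queue, issuing one only if none exists.
  ticketFor(userId: string, queueId: string): Promise<string> {
    const key = JSON.stringify([userId, queueId]);
    let pending = this.tickets.get(key);
    if (pending === undefined) {
      pending = this.issueTicket(queueId).catch((err) => {
        this.tickets.delete(key); // allow a retry after a failed issuance
        throw err;
      });
      this.tickets.set(key, pending);
    }
    return pending;
  }
}
```

Storing the promise rather than the resolved ticket id means two devices joining at the same moment share one issuance instead of racing to create two tickets. A multi-instance backend would need the same guarantee from a shared store, for example a unique constraint on (user_id, queue_id).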
Neutral
- Mobile app integration follows waiting-room/docs/workflow.md. The shape (POST join → WS for live updates → call origin with session token) is consistent with what the original architecture spec implied.
- Admin operations (create/edit/list queues) use waiting-room's own admin UI or REST API. The ticketing backend provisions one queue per match at publish time.
Implementation outline
Not an implementation plan; just enough to make the consequences concrete.
- waiting-room is added to the sibling-repos layout with its own compose stack (own Postgres, own Valkey, own Traefik route).
- Bootstrap a tenant for HNS Ticketing via npm run admin -- tenant:create; store the resulting tenant API key in the ticketing backend's secret store.
- On Match.publish, the backend creates a capacity-mode queue in waiting-room (one queue per match) and stores waiting_room_queue_id on the Match.
- The mobile app joins via POST /queues/{waiting_room_queue_id}/tickets with the tenant API key, opens the WebSocket, and receives admitted with a sessionToken.
- The backend's checkout / seat-selection endpoints become "protected origin": each request goes through a RequireSession middleware that calls GET /access (cached 1–5s), and rejects on 401/410.
- Drop the QueueController, QueueService, QueueEntry entity, and queue:* Redis usage from the ticketing backend. Migration removes the queue_entries table.
- Remove queue.* subjects from the NATS Event Bus catalog; keep order.*, ticket.*, payment.*, etc.
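The RequireSession step could be sketched as below, written in TypeScript only to match waiting-room's stack. Everything here is an assumption made for illustration: SessionValidator, checkAccess, and the AccessResult shape are hypothetical, a 200 status is assumed to mean "allowed", and sessionExpiresAt is assumed to arrive as epoch milliseconds. The sketch shows the three properties this ADR asks for: a short positive cache, capped at sessionExpiresAt, with a fail-closed default.

```typescript
// Hypothetical origin-side validator; the GET /access response shape is assumed.
interface AccessResult {
  status: number;            // HTTP status from GET /access (200, 401, 410, ...)
  sessionExpiresAt?: number; // epoch ms; assumed to be present on 200
}
type CheckAccess = (sessionToken: string) => Promise<AccessResult>;

class SessionValidator {
  private cache = new Map<string, number>(); // token -> cache entry expiry (epoch ms)
  private checkAccess: CheckAccess;
  private cacheTtlMs: number;

  constructor(checkAccess: CheckAccess, cacheTtlMs: number = 2000) {
    this.checkAccess = checkAccess;
    this.cacheTtlMs = cacheTtlMs;
  }

  async isAllowed(token: string, now: number = Date.now()): Promise<boolean> {
    const cachedUntil = this.cache.get(token);
    if (cachedUntil !== undefined && cachedUntil > now) return true;
    this.cache.delete(token);
    try {
      const res = await this.checkAccess(token);
      // Reject 401/410 and any other unexpected answer.
      if (res.status !== 200 || res.sessionExpiresAt === undefined) return false;
      if (res.sessionExpiresAt <= now) return false;
      // Cache the positive answer, never beyond the session's own expiry.
      this.cache.set(token, Math.min(now + this.cacheTtlMs, res.sessionExpiresAt));
      return true;
    } catch {
      return false; // fail closed when waiting-room cannot be reached
    }
  }
}
```

Only positive answers are cached, so a revoked or expired session is rejected on the next uncached check. A production version would also bound the cache size and evict stale entries.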
Alternatives considered
- Keep the queue logic in the Symfony monolith. Rejected: PHP-FPM is unsuitable for 100k+ long-lived WebSocket connections; this was the original reason the architecture called for extraction.
- Build the Queue Service single-tenant and tightly coupled to HNS Ticketing, with Redis Pub/Sub events back to the monolith (the literal reading of the original microservices-strategy spec). Rejected: makes the service hard to reuse for other surfaces (Drupal webshop is the obvious next consumer), and ties cross-service signalling to a transport (shared Redis Pub/Sub) that doesn't survive separating Valkey instances. The chosen pull-based /access design is cheaper, cacheable, and works across multiple consuming origins.
- Push-based events (Redis Pub/Sub or NATS) instead of pull-based /access. Rejected: pull-based is simpler for the consumer (no subscriber wiring, no replay logic), naturally cacheable on the origin side, and aligns with the model the protected origin needs anyway ("is this token still valid right now?"). Push-based would require origins to handle out-of-order delivery and at-least-once retries.
References
- ../waiting-room/ARCHITECTURE.md: server-side data model, ticket lifecycle, dispatcher pattern.
- ../waiting-room/docs/workflow.md: mobile + origin integration guide.
- ../waiting-room/docs/bottlenecks_estimate.md: hardware sizing.
- microservices-strategy.md: Queue Service section (rewritten to reflect this decision).
- architecture-overview.md: repository layout and event catalog (updated).
- E5 epic and feature docs (e5-f*.md): bannered with a pointer to this ADR.