E5: Waiting Queue System
Implemented by the external waiting-room service
E5 is delivered by the standalone waiting-room service (sibling repo ../waiting-room/), built by the HNS Ticketing team as a multi-tenant queue so the same service can later gate other surfaces (e.g. the Drupal webshop). The features below are kept as the consumer contract — what the platform requires from the queue layer. Implementation details (Node 20 + Fastify + Valkey 8, capacity-mode auto-admit, FCM/APNs dispatchers, BullMQ TTL workers, GET /access hot path) live in that repo. The ticketing backend integrates by provisioning one capacity-mode queue per match at publish time and validating session tokens via GET /access on every protected request.
Overview
Bounded Context / Service: waiting-room (external), Mobile App
Goal: Manage high-demand match ticket sales with fair queuing for 100k+ concurrent users.
Priority: High
Primary User Roles
- Fan
Scope
In-Scope
- Queue join and position assignment
- Real-time position updates (client countdown + server sync)
- Progressive push notifications (position change, near position, your turn)
- 20-minute purchase window enforcement
- Connection resilience (30-minute position persistence)
- Multi-device synchronization (same user = same position)
- Continue shopping while in queue
- Queue closure when sold out
Out-of-Scope
- Priority queue for VIPs (use quota instead)
- Queue bypass mechanisms
Features
| ID | Feature | Size | Description |
|---|---|---|---|
| E5-F1 | Queue Join and Position Assignment | M | Join queue and get position |
| E5-F2 | Real-Time Position Updates | M | WebSocket/Firebase updates |
| E5-F3 | Queue Progressive Notifications | S | Push at key milestones |
| E5-F4 | Purchase Window Enforcement | S | 20-minute window at front |
| E5-F5 | Queue Connection Resilience | S | 30-minute position persistence |
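The two timers in E5-F4 and E5-F5 can be illustrated with a minimal sketch. The `Ticket` type, its field names, and the exact boundary behaviour (window closes at exactly 20 minutes, grace period includes the 30-minute mark) are assumptions made for illustration, not the waiting-room service's actual data model.

```python
from dataclasses import dataclass
from typing import Optional

PURCHASE_WINDOW_S = 20 * 60   # 20-minute purchase window (E5-F4)
DISCONNECT_GRACE_S = 30 * 60  # 30-minute position persistence (E5-F5)


@dataclass
class Ticket:
    """Hypothetical per-fan queue ticket; field names are illustrative only."""
    admitted_at: Optional[float] = None      # set when the fan reaches the front
    disconnected_at: Optional[float] = None  # set when the last device drops


def purchase_window_open(ticket: Ticket, now: float) -> bool:
    """True while an admitted fan is still inside the 20-minute window."""
    if ticket.admitted_at is None:
        return False  # not admitted yet, so no window has started
    return now - ticket.admitted_at < PURCHASE_WINDOW_S


def position_retained(ticket: Ticket, now: float) -> bool:
    """True if the fan keeps their position: connected, or within the grace period."""
    if ticket.disconnected_at is None:
        return True
    return now - ticket.disconnected_at <= DISCONNECT_GRACE_S
```

Passing `now` in explicitly keeps both checks deterministic and easy to test; a real service would more likely rely on server-side TTLs than on polling these predicates.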
Dependencies
- Redis Cluster for queue data structures
- Firebase Cloud Messaging for push notifications
- WebSocket or Firebase for real-time updates
Technical Architecture
Queue Service Extraction
The Queue System is extracted as a standalone, auto-scaling microservice, separate from the main Symfony monolith. It activates automatically based on traffic demand (e.g. spikes during high-demand sales); there is no per-match toggle. Implementation details live in the waiting-room repo.
Why Extract as Separate Service
| Requirement | Challenge in PHP/Symfony | Solution |
|---|---|---|
| 100k+ concurrent connections | PHP not optimized for long-lived WebSocket connections | Node.js or Go runtime |
| Real-time position updates | Blocking I/O impacts other modules | Dedicated service with event loop |
| Independent scaling | Queue traffic 10x higher than checkout | Horizontal pod scaling |
| Blast-radius isolation | Queue surges risk degrading checkout | Separate deployment |
Queue Service Responsibilities
- Queue join and FIFO position assignment (Redis sorted set)
- Real-time position updates via WebSocket/SSE
- 20-minute purchase window enforcement
- 30-minute position persistence on disconnect
- Sold-out detection and queue closure
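As a rough illustration of FIFO position assignment on a sorted set, the sketch below uses an in-memory stand-in rather than a real Redis or Valkey client. The `FifoQueue` class and its method names are hypothetical. The idempotent `join` mirrors the "same user = same position" requirement, comparable to adding a member only if it is absent (as `ZADD` with the `NX` option does in Redis).

```python
import itertools
from typing import Dict, Optional


class FifoQueue:
    """In-memory stand-in for a sorted set: member = user, score = join order."""

    def __init__(self) -> None:
        self._scores: Dict[str, int] = {}  # user_id -> join sequence number
        self._seq = itertools.count(1)     # monotonically increasing score source

    def join(self, user_id: str) -> int:
        """Add the user only if absent, then return their 1-based position."""
        if user_id not in self._scores:
            self._scores[user_id] = next(self._seq)
        return self.position(user_id)

    def position(self, user_id: str) -> int:
        """1-based rank: number of members whose score is <= this user's score."""
        score = self._scores[user_id]
        return sum(1 for s in self._scores.values() if s <= score)

    def admit_next(self) -> Optional[str]:
        """Remove and return the user at the front, or None if the queue is empty."""
        if not self._scores:
            return None
        front = min(self._scores, key=self._scores.__getitem__)
        del self._scores[front]
        return front
```

A second `join` from another device returns the existing position instead of re-queuing the fan. The linear-time rank lookup is only for readability; a real sorted set answers rank queries in logarithmic time.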
Communication with the Ticketing Backend
Mobile App ──POST /queues/{id}/tickets──► waiting-room
           ──WSS /tickets/{id}/ws───────► waiting-room
                                               │
                                               │ Valkey + Postgres + BullMQ
                                               ▼
                                     (all owned by waiting-room)

Mobile App ──Authorization: Bearer <sessionToken>──► Ticketing Backend
                                                            │
                                                            │ GET /access (cached 1–5s)
                                                            ▼
                                                       waiting-room
                                                     200 / 410 / 401
| Direction | Protocol | Purpose |
|---|---|---|
| App → waiting-room | REST (tenant API key) | Join queue, leave |
| App ↔ waiting-room | WebSocket (per-ticket JWT) | Live position_changed, admitted, expired, session_expired |
| waiting-room → App (backgrounded) | FCM / APNs | admitted, expired, session_expired |
| Ticketing Backend → waiting-room | REST GET /access (bearer = session token) | Per-request validation on protected endpoints |
| Ticketing Backend → waiting-room | REST (operator JWT) | POST /queues at match publish, PATCH /queues/{id} to retune concurrency/TTLs |
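The per-request GET /access validation with a short cache could look roughly like the sketch below. `AccessGate`, the injected `fetch_status` callable (standing in for the real HTTP call), and the 2-second TTL are assumptions made for illustration. Treating every status other than 200 as a denial, and caching denials as well as grants, are also assumptions rather than documented behaviour.

```python
import time
from typing import Callable, Dict, Tuple

CACHE_TTL_S = 2.0  # assumption: any value inside the documented 1-5 s range


class AccessGate:
    """Caches GET /access results per session token for a short TTL."""

    def __init__(
        self,
        fetch_status: Callable[[str], int],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # Hypothetical hook: performs GET /access with the bearer token
        # and returns the HTTP status code (200, 410 or 401).
        self._fetch_status = fetch_status
        self._clock = clock
        self._cache: Dict[str, Tuple[float, int]] = {}  # token -> (expires_at, status)

    def status_for(self, token: str) -> int:
        """Return the cached status if still fresh, otherwise ask waiting-room."""
        now = self._clock()
        hit = self._cache.get(token)
        if hit is not None and hit[0] > now:
            return hit[1]
        status = self._fetch_status(token)
        self._cache[token] = (now + CACHE_TTL_S, status)
        return status

    def is_allowed(self, token: str) -> bool:
        """Only 200 grants access; 410, 401 and anything else deny (fail closed)."""
        return self.status_for(token) == 200
```

In use, a protected endpoint would call `gate.is_allowed(session_token)` before doing any work. The cache keeps the hot path from issuing one upstream call per request, at the cost of reacting to an expired session up to one TTL late.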
There is no Redis Pub/Sub channel between waiting-room and the backend and no queue.* subject on the NATS Event Bus. See Microservices Strategy and ADR 0001 for detailed contracts.
Risks & Open Questions
OQ-E5-1: Queue Throughput
What is the expected queue throughput rate (users processed per minute)?
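To show why the throughput figure matters, the sketch below converts a queue position into an estimated wait under a constant admission rate. The function name and the rate used in the example are hypothetical. In capacity mode, admission depends on purchase slots freeing up, so a constant rate is a simplification.

```python
import math


def estimated_wait_minutes(position: int, admits_per_minute: float) -> int:
    """Whole minutes until the fan at `position` is admitted at a constant rate."""
    if position < 1:
        raise ValueError("position is 1-based")
    if admits_per_minute <= 0:
        raise ValueError("admits_per_minute must be positive")
    return math.ceil(position / admits_per_minute)
```

For example, at a purely hypothetical 1,000 admissions per minute, `estimated_wait_minutes(100_000, 1000)` gives 100 minutes for the last fan in a 100k queue. That is well beyond the 30-minute position persistence, so the throughput answer directly shapes the resilience and notification requirements.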
OQ-E5-2: Bot Detection
What bot detection and prevention mechanisms are required?
Risk: Redis Sizing
Redis cluster sizing for 100k+ concurrent users requires careful planning and load testing.
Related Documentation
- Waiting Queue Flow
- ADR 0001: Extract waiting-queue to waiting-room
- ../waiting-room/ARCHITECTURE.md: service internals
- ../waiting-room/docs/workflow.md: mobile + origin integration guide
Last Updated: January 2026