The Complete Guide to Asynchronous Request-Reply Patterns
Choosing the Right Approach for Your System
Your API just returned a 504 Gateway Timeout because generating that report took 45 seconds.
Your users are frustrated. Your connection pool is exhausted. Your system is brittle.
The Asynchronous Request-Reply (ARR) pattern solves this: acknowledge requests immediately, process in the background, notify when complete.
Here are your five implementation options and when to use each.
1. Polling: Start Here
How it works: The server returns 202 Accepted along with a status URL. The client checks that URL periodically; once the task completes, the status endpoint answers with a 303 See Other redirect to the result.
Use the Retry-After header. Let the server control polling frequency—no guessing needed.
Best for: Browser clients, corporate firewalls, tasks under 60 seconds.
Avoid when: Tasks take hours, real-time updates are required, or you have thousands of concurrent pollers.
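A minimal client-side sketch of the polling loop described above. The Response type, the fetch callable, and the URLs are hypothetical stand-ins for whatever HTTP client you use, and the sketch assumes Retry-After carries a number of seconds rather than an HTTP date.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Response:
    """Minimal stand-in for an HTTP response (hypothetical type)."""
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: str = ""


def poll_until_done(
    fetch: Callable[[str], Response],
    status_url: str,
    max_attempts: int = 20,
    default_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Poll status_url until the server redirects (303) to the result."""
    for _ in range(max_attempts):
        resp = fetch(status_url)
        if resp.status == 303:
            # Task finished: fetch the result named in the Location header.
            return fetch(resp.headers["Location"])
        if resp.status not in (200, 202):
            raise RuntimeError(f"unexpected status {resp.status}")
        # Let the server set the pace. Assumes Retry-After is in seconds;
        # the HTTP-date form is not handled in this sketch.
        sleep(float(resp.headers.get("Retry-After", default_delay)))
    raise TimeoutError(f"task not complete after {max_attempts} polls")
```

Passing fetch and sleep as parameters keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.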
2. Webhooks: Push When Ready
How it works: The client provides a callback URL. Server processes in the background and posts the result to the callback when done.
Security is mandatory. Verify requests using HMAC signatures or JWT tokens. Never trust incoming webhook data unquestioningly.
Implement retry logic with exponential backoff and dead-letter queues on the sending side. Make your handlers idempotent on the receiving side, because retries mean the same webhook can be delivered more than once.
Best for: Server-to-server communication, event-driven architectures.
Avoid when: The client is a browser or sits behind a firewall (neither can expose a callback URL), or when debugging complexity is a concern.
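A rough sketch of the receiving side, assuming the sender signs the raw request body with HMAC-SHA256 using a shared secret. The WebhookReceiver class and the event_id, job_id, and result fields are illustrative assumptions, not part of any particular provider's format.

```python
import hashlib
import hmac
import json


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body, as the sender would compute it."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode(), signature.encode())


class WebhookReceiver:
    """Hypothetical handler: verifies the signature, then applies each event once."""

    def __init__(self, secret: bytes) -> None:
        self.secret = secret
        self.results = {}            # job_id -> result
        self.seen_event_ids = set()  # in production this would be durable storage

    def handle(self, body: bytes, signature: str) -> int:
        """Return the HTTP status code the endpoint would send back."""
        if not verify(self.secret, body, signature):
            return 401
        event = json.loads(body)
        if event["event_id"] in self.seen_event_ids:
            # Duplicate delivery: acknowledge without repeating the work.
            return 200
        self.seen_event_ids.add(event["event_id"])
        self.results[event["job_id"]] = event["result"]
        return 200
```

Verification happens on the raw bytes before any parsing, so a tampered payload is rejected without being interpreted. Returning 200 for a duplicate tells the sender to stop retrying.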
3. Server-Sent Events: The Underrated Option
How it works: The client opens a persistent HTTP connection. Server pushes events through this stream when tasks complete.
Automatic reconnection is built in. Browsers reconnect on their own and resume from the last received event by sending the Last-Event-ID header.
Text-only format. JSON works great. Binary data needs base64 or a separate HTTP fetch.
Best for: Real-time browser updates and one-way server-to-client communication.
Avoid when: You need bidirectional communication, binary streaming, or support for IE/legacy browsers.
Example: OpenAI’s ChatGPT streaming responses.
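To make the text-only stream format concrete, here is a simplified parser sketch for the event stream a server would send. It handles the id, event, and data fields plus comment lines and ignores the retry field; parse_sse is a name chosen for illustration, and a real client (such as the browser's EventSource) does this for you.

```python
from typing import Dict, Iterable, Iterator, Optional


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per event from an iterable of event-stream lines."""
    event_type = "message"         # default event type
    data = []                      # data lines collected for the current event
    last_id: Optional[str] = None  # persists across events, used for resumption
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the event; events with no data are dropped.
            if data:
                yield {"event": event_type, "data": "\n".join(data), "id": last_id}
            event_type, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is stripped
        if name == "data":
            data.append(value)
        elif name == "event":
            event_type = value
        elif name == "id":
            last_id = value
```

The last id seen is what a reconnecting client would send back in Last-Event-ID so the server can resume from that point.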
4. WebSockets: For True Bidirectionality
How it works: Persistent, full-duplex connection. Both the client and the server can send messages at any time.
Operational complexity is real. Connections need heartbeat/ping frames to detect dead peers, load balancers often need sticky sessions, and the server has to manage stateful connections.
Best for: Chat, collaborative editing, gaming—anything requiring frequent bidirectional updates and sub-100ms latency.
Avoid when: One-way updates (use SSE), infrequent communication (use polling), or simple request-reply.
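One piece of that operational work is the heartbeat. The sketch below shows only the bookkeeping: when to send a ping and when to give up on a connection. HeartbeatMonitor and the 30-second and 10-second defaults are assumptions for illustration; actually sending ping frames is left to whichever WebSocket library you use.

```python
import time
from typing import Callable, Optional


class HeartbeatMonitor:
    """Tracks ping/pong timing for one connection (hypothetical helper)."""

    def __init__(
        self,
        interval: float = 30.0,  # seconds of silence before sending a ping
        timeout: float = 10.0,   # seconds to wait for the pong
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.interval = interval
        self.timeout = timeout
        self.clock = clock
        self.last_seen = clock()
        self.ping_sent_at: Optional[float] = None

    def should_ping(self) -> bool:
        """True when the peer has been quiet long enough and no ping is pending."""
        quiet_for = self.clock() - self.last_seen
        return self.ping_sent_at is None and quiet_for >= self.interval

    def ping_sent(self) -> None:
        self.ping_sent_at = self.clock()

    def pong_received(self) -> None:
        self.last_seen = self.clock()
        self.ping_sent_at = None

    def is_dead(self) -> bool:
        """True when a ping has gone unanswered for longer than the timeout."""
        if self.ping_sent_at is None:
            return False
        return self.clock() - self.ping_sent_at > self.timeout
```

A server would run one monitor per connection and close the socket when is_dead() returns True, which is part of the per-connection state that polling and SSE mostly avoid.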
5. Message Brokers: The Enterprise Backbone
How it works: The client publishes a request to the broker with a correlation_id and a reply_to address. The server consumes the request, processes it, and publishes a reply carrying the same correlation_id to the reply_to address. The client matches replies to requests using the correlation ID.
Idempotency is mandatory. At-least-once delivery means duplicate messages, so your handlers must process repeats safely.
Monitor dead-letter queues. Failed messages after max retries go to DLQs—they’re your canary for system issues.
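The correlation flow and the idempotency requirement can be sketched with an in-memory stand-in for the broker. InMemoryBroker, Worker, Client, and the "requests" queue name are all hypothetical; a real deployment would use a RabbitMQ, Kafka, or SQS client instead, and the uppercase call is a placeholder for real work.

```python
import queue
import uuid
from typing import Dict, Optional, Set


class InMemoryBroker:
    """Toy broker: named FIFO queues held in memory (illustration only)."""

    def __init__(self) -> None:
        self._queues: Dict[str, queue.Queue] = {}

    def _queue(self, name: str) -> queue.Queue:
        return self._queues.setdefault(name, queue.Queue())

    def publish(self, name: str, message: dict) -> None:
        self._queue(name).put(message)

    def consume(self, name: str) -> Optional[dict]:
        try:
            return self._queue(name).get_nowait()
        except queue.Empty:
            return None


class Worker:
    def __init__(self, broker: InMemoryBroker) -> None:
        self.broker = broker
        self.replies: Dict[str, str] = {}  # correlation_id -> cached reply
        self.jobs_run = 0

    def run_once(self) -> bool:
        """Handle one request; return False when the queue is empty."""
        msg = self.broker.consume("requests")
        if msg is None:
            return False
        cid = msg["correlation_id"]
        if cid not in self.replies:
            # First delivery: do the work (placeholder) and cache the reply.
            self.replies[cid] = msg["body"].upper()
            self.jobs_run += 1
        # Duplicates get the cached reply without repeating the work.
        reply = {"correlation_id": cid, "body": self.replies[cid]}
        self.broker.publish(msg["reply_to"], reply)
        return True


class Client:
    def __init__(self, broker: InMemoryBroker) -> None:
        self.broker = broker
        self.reply_to = f"replies.{uuid.uuid4()}"  # private reply queue
        self.pending: Set[str] = set()

    def request(self, body: str) -> str:
        cid = str(uuid.uuid4())
        self.pending.add(cid)
        message = {"correlation_id": cid, "reply_to": self.reply_to, "body": body}
        self.broker.publish("requests", message)
        return cid

    def collect(self) -> Dict[str, str]:
        """Drain the reply queue, keeping only replies to pending requests."""
        results: Dict[str, str] = {}
        while (msg := self.broker.consume(self.reply_to)) is not None:
            cid = msg["correlation_id"]
            if cid in self.pending:
                self.pending.remove(cid)
                results[cid] = msg["body"]
        return results
```

Both sides guard against duplicates here: the worker caches replies by correlation_id, and the client ignores replies for requests that are no longer pending.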
Broker choice:
RabbitMQ: Low latency, complex routing (< 50K msgs/sec)
Kafka: High throughput, event streaming (millions of msgs/sec)
AWS SQS/SNS: Managed, serverless, pay-per-use
Best for: Microservices, guaranteed delivery, high throughput, complex routing.
Avoid when: You have browser clients, simple APIs, sub-10ms latency needs, or no DevOps expertise.
Quick Selection Guide
Polling: Browser clients, most straightforward implementation, tasks < 60 seconds, moderate scale
Webhooks: Server-to-server, event-driven, large-scale, need push notifications
SSE: Browser real-time updates, one-way communication, simpler than WebSockets
WebSockets: Bidirectional, < 100ms latency, chat/collaboration, very large scale
Message Brokers: Microservices, millions of messages/second, guaranteed delivery, complex routing
Start Simple, Scale Smart
Begin with polling. It’s universally compatible, easy to debug, and solves 80% of async cases.
Add complexity (WebSockets, message brokers) only when requirements demand it.
The best architecture solves your actual problems without introducing unnecessary complexity.






