Scaling to 100k Concurrent WebSocket Connections with Node.js and Redis Streams

When I was tasked with building a real-time notification engine for a platform expecting 100,000 concurrent users, my first reaction was excitement. My second was: a single Node.js process handles ~65,000 connections before TCP file descriptor limits kick in. This was going to require horizontal scaling from day one.

Here’s how I designed the system, handled the scaling challenges, and load-tested it to validate the result.

The Architecture

The notification engine needed to:

Maintain persistent WebSocket connections per authenticated user
Fan out notifications from backend services to all connected devices for a given user
Handle user sessions across multiple server instances
Survive individual server crashes without losing notification delivery

My architecture had four main components:

WebSocket Gateway Pods — Node.js processes accepting WebSocket connections
Redis Streams — Pub/sub backbone between pods and producers
Presence Service — Tracks which pod holds a user’s connection
Notification Producer — Backend services that push events to Redis

[Backend Services] → [Redis Streams] → [WS Gateway Pods] → [User Devices]
                           ↑
                    [Presence Service]
                    (Redis Hash)

WebSocket Gateway: The Core Loop

Each gateway pod runs a tight WebSocket server using uWebSockets.js instead of the more common ws library. The performance difference is significant: uWS can handle ~2x the connections at half the memory footprint because it’s a native C++ implementation.

// gateway/server.ts
import uWS from 'uWebSockets.js';
import { Redis } from 'ioredis';
import { verifyJWT } from './auth';
import { presenceService } from './presence';
import { startStreamConsumer } from './consumer';

interface UserSocket {
  userId: string;
  deviceId: string;
}

const app = uWS.App();
const redis = new Redis(process.env.REDIS_URL!);

// Map userId -> Set of socket objects
const connections = new Map<string, Set<uWS.WebSocket<UserSocket>>>();

app.ws<UserSocket>('/ws', {
  // Native compression for text payloads
  compression: uWS.SHARED_COMPRESSOR,
  maxPayloadLength: 16 * 1024,
  idleTimeout: 60, // seconds

  upgrade: async (res, req, context) => {
    const token = req.getQuery('token');
    const user = await verifyJWT(token);

    if (!user) {
      res.writeStatus('401').end('Unauthorized');
      return;
    }

    // Pass user data to the WebSocket object
    res.upgrade(
      { userId: user.id, deviceId: user.deviceId },
      req.getHeader('sec-websocket-key'),
      req.getHeader('sec-websocket-protocol'),
      req.getHeader('sec-websocket-extensions'),
      context
    );
  },

  open: async (ws) => {
    const { userId } = ws.getUserData();

    // Register in local connection map
    if (!connections.has(userId)) {
      connections.set(userId, new Set());
    }
    connections.get(userId)!.add(ws);

    // Register presence in Redis
    const podId = process.env.POD_ID!;
    await presenceService.register(userId, podId);

    ws.subscribe(`user:${userId}`); // uWS pub/sub topic
  },

  close: async (ws) => {
    const { userId } = ws.getUserData();
    const userSockets = connections.get(userId);

    if (userSockets) {
      userSockets.delete(ws);
      if (userSockets.size === 0) {
        connections.delete(userId);
        await presenceService.unregister(userId);
      }
    }
  },
});

app.listen(3001, () => {
  startStreamConsumer(app, connections);
});

Redis Streams: The Message Backbone

Redis Streams (XADD / XREADGROUP) are a perfect fit here because they provide:

Persistent message storage (unlike pub/sub which is fire-and-forget)
Consumer groups with acknowledgment — each pod processes messages independently
Backpressure handling when a pod is overloaded

Each pod reads from a consumer group, filters for users it has connections for, and delivers:

// gateway/consumer.ts
import { Redis } from 'ioredis';
import uWS from 'uWebSockets.js';

const redis = new Redis(process.env.REDIS_URL!);
const STREAM_KEY = 'notifications:stream';
const GROUP_NAME = 'gateway-consumers';
const POD_ID = process.env.POD_ID!;

export async function startStreamConsumer(
  app: uWS.TemplatedApp,
  connections: Map<string, Set<uWS.WebSocket<any>>>
) {
  // Ensure consumer group exists
  try {
    await redis.xgroup('CREATE', STREAM_KEY, GROUP_NAME, '$', 'MKSTREAM');
  } catch (e) {
    // Group already exists — fine
  }

  async function consume() {
    const results = await redis.xreadgroup(
      'GROUP', GROUP_NAME, POD_ID,
      'COUNT', '100',
      'BLOCK', '100', // block for 100ms max
      'STREAMS', STREAM_KEY, '>'
    );

    if (!results) {
      setImmediate(consume);
      return;
    }

    const messageIds: string[] = [];

    for (const [, messages] of results) {
      for (const [id, fields] of messages) {
        const data = Object.fromEntries(
          fields.reduce<[string, string][]>((acc, val, idx) => {
            if (idx % 2 === 0) acc.push([val, fields[idx + 1]]);
            return acc;
          }, [])
        );

        const { userId, payload } = data;
        const userConnections = connections.get(userId);

        if (userConnections) {
          // Publish to all connected sockets for this user
          app.publish(`user:${userId}`, payload, false, true);
        }

        messageIds.push(id);
      }
    }

    // Acknowledge processed messages
    if (messageIds.length > 0) {
      await redis.xack(STREAM_KEY, GROUP_NAME, ...messageIds);
    }

    setImmediate(consume); // Non-blocking loop
  }

  consume();
}

Connection Pooling and Memory Management

At scale, memory management becomes critical. Each WebSocket connection in uWS costs ~1.3 KB for the socket itself. At 50k connections per pod, that’s 65 MB just for connection overhead — manageable, but you need to be careful about what you attach to each socket.

I kept the per-connection data minimal (userId + deviceId). User preferences, notification history, and other data stays in Redis, fetched only when needed.

For graceful pod shutdown (during deploys), I implemented a draining strategy:

process.on('SIGTERM', async () => {
  // Stop accepting new connections
  server.close();

  // Notify connected clients to reconnect to another pod
  for (const [userId, sockets] of connections) {
    for (const ws of sockets) {
      ws.send(JSON.stringify({ type: 'RECONNECT', delay: 3000 }));
      ws.end(1001, 'Server shutting down');
    }
  }

  // Wait for connections to drain
  await new Promise(resolve => setTimeout(resolve, 5000));
  process.exit(0);
});

Clients receive the RECONNECT message, wait 3 seconds, then reconnect — by which time the load balancer has stopped sending traffic to the shutting-down pod.

Load Testing Results

I used artillery with a custom WebSocket plugin for load testing:

Concurrent Connections	Pod Count	Avg Latency	Memory/Pod	CPU/Pod
10,000	1	8ms	340 MB	12%
50,000	1	14ms	890 MB	31%
100,000	2	11ms	870 MB each	28% each
200,000	4	12ms	880 MB each	30% each

The system scaled linearly. Adding a pod doubled capacity with minimal overhead from the Redis Stream coordination.

What I’d Change

Consider using Redis Cluster for > 1M connections. A single Redis instance becomes the bottleneck at extreme scale. Redis Cluster with stream sharding by user ID shard handles this.

Add a circuit breaker on the consumer loop. If Redis is slow or unavailable, the consume loop hammers it with retries. A proper circuit breaker with exponential backoff is essential in production.

The architecture held up well in production. Real-time notifications are now delivered in under 50ms end-to-end for 95% of events — a significant improvement over the polling approach it replaced.

Scaling to 100k Concurrent WebSocket Connections with Node.js and Redis Streams

Scaling to 100k Concurrent WebSocket Connections with Node.js and Redis Streams

The Architecture

WebSocket Gateway: The Core Loop

Redis Streams: The Message Backbone

Connection Pooling and Memory Management

Load Testing Results

What I’d Change

Related Articles

CPU-Bound Tasks in Node.js: Worker Threads and Redis Streams in Practice

PostgreSQL Performance Tuning: Indexes, EXPLAIN ANALYZE, and Query Optimization

System Design: Building a Real-Time Notification System at Scale