I’ve been thinking about distributed rate limiting lately. Why? Because last month, our API started getting hammered by sudden traffic spikes. Our single-instance rate limiter couldn’t handle the load when we scaled horizontally. That’s when I realized we needed a distributed solution. Today, I’ll walk you through building one with Redis and Node.js that scales with your application. Let’s get started.
First, why Redis? It’s fast. It handles atomic operations beautifully. And its expiration features make it ideal for tracking request windows. We’ll use ioredis - a robust Redis client for Node.js. Here’s how we set up our environment:
npm init -y
npm install express ioredis
npm install -D typescript ts-node @types/express @types/node
Our Redis client needs proper configuration. Notice how we handle reconnects and errors:
// redis-client.ts
import Redis from 'ioredis';

export class RedisClient {
  private static instance: Redis;

  // Lazily create a single shared connection for the whole process
  public static getInstance(): Redis {
    if (!RedisClient.instance) {
      RedisClient.instance = new Redis({
        host: process.env.REDIS_HOST || 'localhost',
        port: parseInt(process.env.REDIS_PORT || '6379', 10),
        // Back off between reconnection attempts instead of hammering Redis
        retryStrategy: (times) => Math.min(times * 100, 2000),
        maxRetriesPerRequest: 3
      });

      RedisClient.instance.on('error', (err) => {
        console.error('Redis connection error:', err);
      });
    }
    return RedisClient.instance;
  }
}
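One housekeeping note before the algorithms: the snippets below pass around a RateLimitResult, a RateLimitConfig, and a RateLimiter interface. Here's the minimal shape I'm assuming for them (the field names are a sketch, so tweak them to match your own config):

// types.ts — assumed shapes used by the snippets below
import { Request } from 'express';

export interface RateLimitResult {
  allowed: boolean;    // was this request admitted?
  remaining: number;   // tokens or slots left in the current window
  resetTime?: number;  // optional: epoch ms when the limit resets
}

export interface RateLimitConfig {
  maxRequests: number;                      // requests allowed per window
  windowSize: number;                       // window length in seconds
  capacity?: number;                        // token bucket size (assumed field)
  refillRate?: number;                      // tokens added per second (assumed field)
  keyGenerator?: (req: Request) => string;  // defaults to req.ip
}

export interface RateLimiter {
  checkLimit(key: string): Promise<RateLimitResult>;
}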
Now, let’s tackle the algorithms. The Token Bucket method allows bursts while maintaining an average rate. How does it work? Imagine a bucket filling with tokens at a steady rate. Each request takes a token. If the bucket’s empty, the request gets rejected (or has to wait). Here’s the implementation:
// token-bucket.ts (method on a limiter class holding `redis` and `config`)
async checkLimit(key: string): Promise<RateLimitResult> {
  const bucketKey = `token_bucket:${key}`;
  const now = Date.now() / 1000;
  const luaScript = `
    local capacity = tonumber(ARGV[1])
    local refill_rate = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])
    local requested_tokens = tonumber(ARGV[4])
    local bucket = redis.call('HMGET', KEYS[1], 'tokens', 'last_refill')
    local current_tokens = tonumber(bucket[1]) or capacity
    local last_refill = tonumber(bucket[2]) or now
    -- Refill based on how much time has passed since the last request
    local time_elapsed = math.max(0, now - last_refill)
    local tokens_to_add = time_elapsed * refill_rate
    current_tokens = math.min(capacity, current_tokens + tokens_to_add)
    local allowed = 0
    if current_tokens >= requested_tokens then
      current_tokens = current_tokens - requested_tokens
      allowed = 1
    end
    redis.call('HSET', KEYS[1], 'tokens', tostring(current_tokens), 'last_refill', tostring(now))
    redis.call('EXPIRE', KEYS[1], math.ceil(capacity / refill_rate) * 2)
    return {allowed, math.floor(current_tokens)}
  `;
  const result = (await this.redis.eval(
    luaScript, 1, bucketKey,
    this.config.capacity, this.config.refillRate, now, 1
  )) as [number, number];
  return { allowed: result[0] === 1, remaining: result[1] };
}
But what if you need precise request counting? Enter the Sliding Window algorithm. It tracks exact timestamps within a moving timeframe. This prevents the “burst at window edge” problem of fixed windows. See the difference?
// sliding-window.ts (method on a limiter class holding `redis` and `config`)
async checkLimit(key: string): Promise<RateLimitResult> {
  const windowKey = `sliding_window:${key}`;
  const now = Date.now();
  const windowStart = now - this.config.windowSize * 1000;
  const luaScript = `
    local max_requests = tonumber(ARGV[3])
    -- Drop timestamps that have slid out of the window, then count what's left
    redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, ARGV[1])
    local request_count = redis.call('ZCARD', KEYS[1])
    local allowed = 0
    if request_count < max_requests then
      redis.call('ZADD', KEYS[1], ARGV[2], ARGV[2])
      redis.call('EXPIRE', KEYS[1], ARGV[4])
      allowed = 1
      request_count = request_count + 1
    end
    return {allowed, max_requests - request_count}
  `;
  const result = (await this.redis.eval(
    luaScript, 1, windowKey,
    windowStart, now, this.config.maxRequests, this.config.windowSize * 2
  )) as [number, number];
  return { allowed: result[0] === 1, remaining: result[1] };
}
Notice how we use Redis sorted sets here? That’s key. We add timestamps as scores, then remove expired ones before checking the count.
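If you want to see those mechanics without Lua, here’s roughly what the script does, expressed as plain ioredis calls. This version is not atomic, so treat it as an illustration only:

// window-demo.ts — illustration of the sorted-set mechanics, not production code
import { RedisClient } from './redis-client';

async function countWindow(key: string, windowMs: number): Promise<number> {
  const redis = RedisClient.getInstance();
  const now = Date.now();
  await redis.zremrangebyscore(key, 0, now - windowMs); // drop expired timestamps
  await redis.zadd(key, now, `${now}`);                 // record this request
  return redis.zcard(key);                              // how many remain in the window
}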
Now, how do we integrate this with Express? Middleware. Clean, reusable middleware:
// rate-limit.ts
import { Request, Response, NextFunction } from 'express';
import { RateLimiter, RateLimitConfig } from './types';

export function rateLimit(limiter: RateLimiter, config: RateLimitConfig) {
  return async (req: Request, res: Response, next: NextFunction) => {
    const key = config.keyGenerator ? config.keyGenerator(req) : req.ip;
    const result = await limiter.checkLimit(key);

    res.setHeader('X-RateLimit-Limit', config.maxRequests.toString());
    res.setHeader('X-RateLimit-Remaining', result.remaining.toString());

    if (!result.allowed) {
      if (result.resetTime) {
        res.setHeader('Retry-After', Math.ceil((result.resetTime - Date.now()) / 1000).toString());
      }
      return res.status(429).send('Too many requests');
    }
    next();
  };
}
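Wiring it into an app looks something like this. SlidingWindowLimiter is my placeholder name for the class that owns the checkLimit method above; swap in whichever limiter you built:

// app.ts — usage sketch; class and constructor shape are assumptions
import express from 'express';
import { rateLimit } from './rate-limit';
import { SlidingWindowLimiter } from './sliding-window';
import { RedisClient } from './redis-client';
import { RateLimitConfig } from './types';

const app = express();

const config: RateLimitConfig = { maxRequests: 100, windowSize: 60 }; // 100 req/min
const limiter = new SlidingWindowLimiter(RedisClient.getInstance(), config);

app.use('/api', rateLimit(limiter, config));

app.get('/api/resource', (_req, res) => {
  res.json({ ok: true });
});

app.listen(3000, () => console.log('Listening on :3000'));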
What about production? Three critical considerations:
- Redis persistence: Use AOF with everysec policy
- Cluster mode: Shard keys across instances
- Circuit breaking: Add fallback logic when Redis fails (see the sketch below)
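For the circuit-breaking point, the simplest version is a fail-open wrapper: if Redis is unreachable, let the request through rather than taking the whole API down with it. A minimal sketch (whether you fail open or fail closed is a product decision, not a given):

// Fail-open wrapper around any limiter — a sketch, not a full circuit breaker
import { RateLimiter, RateLimitResult } from './types';

async function checkLimitSafe(limiter: RateLimiter, key: string): Promise<RateLimitResult> {
  try {
    return await limiter.checkLimit(key);
  } catch (err) {
    console.error('Rate limiter unavailable, failing open:', err);
    return { allowed: true, remaining: 0 };
  }
}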
Monitoring is crucial. Track these Redis metrics (a quick polling sketch follows the list):
- Memory usage
- Evicted keys count
- Command latency
- Connection errors
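The first two come straight out of INFO; latency and connection errors usually come from your APM and the client’s error events. A rough sketch, assuming the ioredis client from earlier:

// metrics.ts — rough sketch; field names come from Redis INFO sections
import { RedisClient } from './redis-client';

async function logRedisMetrics() {
  const redis = RedisClient.getInstance();
  const memory = await redis.info('memory');
  const stats = await redis.info('stats');

  const usedMemory = /used_memory_human:(\S+)/.exec(memory)?.[1];
  const evictedKeys = /evicted_keys:(\d+)/.exec(stats)?.[1];

  console.log({ usedMemory, evictedKeys });
}

setInterval(logRedisMetrics, 60_000); // poll once a minute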
I once made a mistake with TTLs. Set them too short and your counters vanish mid-window, so your limits stop being accurate. Too long? Memory bloat. Our solution: a TTL of windowSize * 2 works for most cases.
Did you know improper key design can cause hot partitions? We prefix keys with rate_limit:{user_id} instead of just user_id: the prefix keeps limiter data in its own namespace, and in cluster mode the {user_id} hash tag pins each user’s keys to one shard while different users spread across the cluster.
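In the middleware that’s just a keyGenerator. The user-ID lookup below is a stand-in for however you actually identify callers:

// key prefixing via the middleware's keyGenerator option
import { RateLimitConfig } from './types';

const config: RateLimitConfig = {
  maxRequests: 100,
  windowSize: 60,
  keyGenerator: (req) => `rate_limit:{${req.headers['x-user-id'] ?? req.ip}}`
};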
For testing, use artillery.io. Simulate traffic spikes across regions. Here’s a sample test config:
config:
  target: "http://localhost:3000"
  phases:
    - duration: 60
      arrivalRate: 100
scenarios:
  - flow:
      - get:
          url: "/api/resource"
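Save it as something like load-test.yml (the name is up to you) and run it with npx artillery run load-test.yml, watching the X-RateLimit headers and the 429 counts in the output as the arrival rate climbs.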
Finally, remember this: Rate limiting isn’t just about blocking abuse. It’s about fair resource allocation. When you implement it properly, everyone gets consistent performance.
What challenges have you faced with rate limiting? Share your experiences below! If this guide helped you, please like and share it with other developers facing similar scaling challenges. Comments? I’d love to hear your implementation stories.