I’ve been thinking a lot about distributed rate limiting lately. In today’s world of microservices and cloud-native applications, a simple local rate limiter just doesn’t cut it anymore. When you have multiple instances of your service running, each needs to respect the same global limits. That’s where Redis comes in - it gives us a shared state that all our instances can trust.
Why does this matter? Imagine a user making requests to different instances of your API. Without a distributed approach, each instance might allow the maximum number of requests, effectively multiplying the limit. This defeats the purpose of rate limiting entirely.
Let me show you how we can build something better. We’ll use Redis as our coordination layer, Node.js for the runtime, and TypeScript for type safety. The combination gives us both performance and reliability.
Here’s a basic setup for our first limiter, a fixed-window counter (a true token bucket refills gradually; this simpler counter just resets once per window):
import Redis from 'ioredis';

// Assumes a Redis instance reachable with default settings; configure as needed
const redis = new Redis();

const checkRateLimit = async (userId: string): Promise<boolean> => {
  const key = `rate_limit:${userId}`;
  const windowMs = 60000; // 1 minute
  const maxRequests = 100;

  // The whole check runs inside one Lua script, so Redis executes it atomically.
  // EXPIRE is set only when the key is first created; calling it on every request
  // would keep pushing the window forward and the counter would never reset.
  const result = await redis.eval(
    `local current = redis.call('incr', KEYS[1])
     if current == 1 then
       redis.call('expire', KEYS[1], ARGV[2])
     end
     if current > tonumber(ARGV[1]) then
       return 0
     end
     return 1`,
    1, key, maxRequests, Math.ceil(windowMs / 1000)
  );
  return result === 1;
};
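Using it is a one-liner; here’s a quick sketch (the user id is just an example):

if (!(await checkRateLimit('user-123'))) {
  // Reject, queue, or retry - whatever fits your use case
  console.log('Rate limit exceeded for user-123');
}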
But wait - why the Lua script instead of plain GET and INCR calls from Node? Because Redis executes the whole script atomically. If we read the counter in our application, checked it, and then incremented it, two requests arriving at exactly the same time could both see 99 and both get through. The script closes that race, which is exactly why Lua scripting (or MULTI/EXEC transactions) is essential here.
The sliding window algorithm offers more precision than fixed windows. Instead of resetting the counter at fixed intervals, it looks at the actual request pattern. Here’s how we might implement it:
interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  resetTime: number; // epoch ms by which the window is guaranteed to have reset
}

const slidingWindowCheck = async (ip: string): Promise<RateLimitResult> => {
  const key = `sliding:${ip}`;
  const now = Date.now();
  const windowMs = 60000;
  const maxRequests = 100;

  // Remove timestamps that have aged out of the window
  await redis.zremrangebyscore(key, 0, now - windowMs);

  // Count requests still inside the window
  const currentCount = await redis.zcard(key);
  if (currentCount >= maxRequests) {
    return { allowed: false, remaining: 0, resetTime: now + windowMs };
  }

  // Record this request; the random suffix keeps same-millisecond members unique
  await redis.zadd(key, now, `${now}-${Math.random()}`);
  await redis.expire(key, Math.ceil(windowMs / 1000));

  // Caveat: these four Redis calls are not atomic. Under heavy concurrency a few
  // extra requests can slip through between ZCARD and ZADD; move the sequence into
  // a Lua script (as above) if you need a strict guarantee.
  return { allowed: true, remaining: maxRequests - currentCount - 1, resetTime: now + windowMs };
};
Have you considered what happens when Redis becomes unavailable? We need fallback strategies. One approach is to use a local in-memory rate limiter as a backup, though this sacrifices perfect consistency.
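Here’s a minimal sketch of what that in-memory backup could look like. The LocalFallbackLimiter class is my own illustration, not a library; since it is per-process, the effective global limit becomes the configured limit times the number of instances while Redis is down.

// Illustrative, not a library: per-process only, so while Redis is down the
// effective global limit becomes (limit x number of instances)
class LocalFallbackLimiter {
  private hits = new Map<string, number[]>();

  check(key: string, windowMs: number, maxRequests: number): boolean {
    const now = Date.now();
    // Keep only timestamps still inside the window
    const recent = (this.hits.get(key) ?? []).filter(t => t > now - windowMs);
    if (recent.length >= maxRequests) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}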
Monitoring is crucial. We should track metrics like rejection rates, Redis latency, and request patterns. This helps us tune our limits and catch issues early.
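As a sketch, here’s what that instrumentation might look like with the prom-client library (the metric names are illustrative, not a convention):

import client from 'prom-client';

// Metric names here are illustrative - pick whatever fits your conventions
const rejections = new client.Counter({
  name: 'rate_limit_rejections_total',
  help: 'Requests rejected by the rate limiter',
});
const redisLatency = new client.Histogram({
  name: 'rate_limit_redis_duration_seconds',
  help: 'Latency of Redis calls made by the rate limiter',
});

const timedCheck = async (ip: string): Promise<RateLimitResult> => {
  const stopTimer = redisLatency.startTimer();
  try {
    const result = await slidingWindowCheck(ip);
    if (!result.allowed) rejections.inc();
    return result;
  } finally {
    stopTimer(); // records the elapsed time in the histogram
  }
};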
Here’s how we might wrap this in Express middleware:
import { Request, Response, NextFunction } from 'express';

interface RateLimitConfig {
  prefix: string;
  maxRequests: number;
  windowMs: number;
}

// Assumed: a rateLimiter object wrapping one of the check functions above
declare const rateLimiter: {
  check(key: string, config: RateLimitConfig): Promise<RateLimitResult>;
};

const rateLimitMiddleware = (config: RateLimitConfig) => {
  return async (req: Request, res: Response, next: NextFunction) => {
    const key = `${config.prefix}:${req.ip}`;
    try {
      const result = await rateLimiter.check(key, config);
      res.set({
        'X-RateLimit-Limit': config.maxRequests.toString(),
        'X-RateLimit-Remaining': result.remaining.toString(),
        'X-RateLimit-Reset': Math.ceil(result.resetTime / 1000).toString()
      });
      if (!result.allowed) {
        return res.status(429).json({ error: 'Too many requests' });
      }
      next();
    } catch (error) {
      // Redis unreachable: fail open here, or swap in the local fallback above
      next();
    }
  };
};
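Wiring it into an app is then a one-liner per route group (the limits here are placeholder values):

import express from 'express';

const app = express();

// Placeholder limits - tune these to your own traffic
app.use('/api', rateLimitMiddleware({ prefix: 'api', maxRequests: 100, windowMs: 60000 }));

app.listen(3000);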
What about different rate limiting strategies? Sometimes you want to limit per user, sometimes per IP, and sometimes per API key. The beauty of this approach is that we can easily adjust the key generation logic.
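One way to make that pluggable is a small key-extractor type. The three extractors below are sketches; the X-API-Key header and the req.user shape are assumptions about your setup, not a standard:

type KeyExtractor = (req: Request) => string;

const byIp: KeyExtractor = (req) => `ip:${req.ip}`;
// Assumes an upstream auth middleware has attached req.user
const byUser: KeyExtractor = (req) => `user:${(req as any).user?.id ?? 'anonymous'}`;
// Assumes clients send their key in an X-API-Key header
const byApiKey: KeyExtractor = (req) => `key:${req.get('X-API-Key') ?? 'missing'}`;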
Testing distributed rate limiting requires careful planning. We need to simulate multiple concurrent requests and different failure scenarios. Docker Compose makes it easy to spin up a test environment with Redis and our application.
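A quick smoke test, for example, can fire a burst of concurrent checks and count how many pass - with the non-atomic sliding window above, a few extra may slip through, which is exactly the kind of thing this surfaces:

const burstTest = async () => {
  // Fire 150 checks at once against the same key
  const results = await Promise.all(
    Array.from({ length: 150 }, () => slidingWindowCheck('test-ip'))
  );
  const allowed = results.filter(r => r.allowed).length;
  console.log(`allowed ${allowed} of 150`); // expect roughly 100
};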
Performance optimization is always important. We can use Redis pipelining to reduce round trips, and consider using Redis clusters for very high throughput scenarios.
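With ioredis, for instance, the sliding window’s cleanup and count can share a single round trip:

const pipelinedCount = async (key: string, now: number, windowMs: number): Promise<number> => {
  // Both commands are sent together: one round trip instead of two
  const results = await redis
    .pipeline()
    .zremrangebyscore(key, 0, now - windowMs)
    .zcard(key)
    .exec();
  // exec() yields one [error, reply] pair per queued command
  const [, count] = results![1];
  return count as number;
};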
Remember that rate limiting isn’t just about preventing abuse. It’s also about ensuring fair usage and protecting your system from being overwhelmed. The right limits depend on your specific use case and traffic patterns.
I’d love to hear your thoughts on this approach. Have you implemented distributed rate limiting in your projects? What challenges did you face? Share your experiences in the comments below, and don’t forget to like and share if you found this useful!