I’ve been thinking about rate limiting a lot lately. Why? Because last month, one of our production APIs got hammered by a sudden traffic surge that nearly took down our entire service. That experience made me realize how crucial proper rate limiting is for any serious application. It’s not just about preventing abuse - it’s about creating fair access for all users while maintaining system stability. Today, I’ll share how we built a high-performance rate limiter using Redis and Node.js that handles over 50,000 requests per second with sub-millisecond latency.
Rate limiting acts as your first line of defense against traffic spikes and malicious attacks. Without it, a single aggressive client could monopolize your resources. But how do you choose the right approach? We’ll implement three proven algorithms that serve different needs. Fixed window is simple but has edge cases. Sliding window gives more accurate counts. Token bucket allows for burst handling. Each has tradeoffs worth understanding.
Let’s start with the project setup:
mkdir rate-limiter && cd rate-limiter
npm init -y
npm install express ioredis
npm install -D typescript @types/node @types/express
Our core interface defines what any rate limiter must implement:
// types/rate-limiter.types.ts
export interface RateLimitResult {
  allowed: boolean;
  limit: number;
  remaining: number;
  resetTime: Date;
}

export interface RateLimiterStorage {
  increment(key: string): Promise<RateLimitResult>;
}
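Before the more precise algorithms, here is what the simplest option from the list above, the fixed-window counter, could look like against this interface. Treat it as a sketch: the file and class names are mine, and the article does not show this variant in full.

// storage/redis-fixed-window.ts (illustrative sketch)
import Redis from 'ioredis';
import { RateLimiterStorage, RateLimitResult } from '../types/rate-limiter.types';

export class RedisFixedWindow implements RateLimiterStorage {
  private redis = new Redis();

  constructor(private windowMs: number, private max: number) {}

  async increment(key: string): Promise<RateLimitResult> {
    // Bucket requests by window number, so the counter resets at fixed boundaries.
    const windowId = Math.floor(Date.now() / this.windowMs);
    const windowKey = `${key}:${windowId}`;

    const count = await this.redis.incr(windowKey);
    if (count === 1) {
      // First request in this window: set a TTL so stale counters are cleaned up.
      await this.redis.pexpire(windowKey, this.windowMs);
    }

    return {
      allowed: count <= this.max,
      limit: this.max,
      remaining: Math.max(0, this.max - count),
      resetTime: new Date((windowId + 1) * this.windowMs),
    };
  }
}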
Now the sliding-window implementation, which uses a Lua script so the check-and-increment happens atomically:
// storage/redis-sliding-window.ts
import Redis from 'ioredis';
import { RateLimiterStorage, RateLimitResult } from '../types/rate-limiter.types';

export class RedisSlidingWindow implements RateLimiterStorage {
  private redis: Redis;

  constructor(private windowMs: number, private max: number) {
    this.redis = new Redis();
    this.redis.defineCommand('slidingWindowIncrement', {
      numberOfKeys: 1,
      lua: `
        local key = KEYS[1]
        local window = tonumber(ARGV[1])
        local max = tonumber(ARGV[2])
        local now = tonumber(ARGV[3])
        local clearBefore = now - window

        -- Drop timestamps that have slid out of the window.
        redis.call('ZREMRANGEBYSCORE', key, 0, clearBefore)

        local current = redis.call('ZCARD', key)
        if current < max then
          -- Record this request as a unique member scored by its timestamp.
          redis.call('ZADD', key, now, now .. '-' .. math.random())
          redis.call('PEXPIRE', key, window)
          return {1, current + 1}
        end
        return {0, current}
      `
    });
  }

  async increment(key: string): Promise<RateLimitResult> {
    // defineCommand registers the method at runtime, hence the cast.
    const [allowed, total] = await (this.redis as any).slidingWindowIncrement(
      key, this.windowMs, this.max, Date.now()
    );
    return {
      allowed: allowed === 1,
      limit: this.max,
      remaining: Math.max(0, this.max - total),
      resetTime: new Date(Date.now() + this.windowMs)
    };
  }
}
Notice how we use Redis sorted sets for precision? The set maintains request timestamps within our window and removes older entries efficiently. For the token bucket implementation, we track tokens and the last refill time:
// storage/redis-token-bucket.ts
import Redis from 'ioredis';
import { RateLimiterStorage, RateLimitResult } from '../types/rate-limiter.types';

export class RedisTokenBucket implements RateLimiterStorage {
  private redis: Redis;

  constructor(private capacity: number, private refillRate: number) {
    this.redis = new Redis();
    // defineCommand('tokenBucketIncrement', ...) mirrors the sliding-window setup;
    // a sketch of the Lua script follows below.
  }

  async increment(key: string): Promise<RateLimitResult> {
    const now = Date.now();
    const [allowed, remaining] = await (this.redis as any).tokenBucketIncrement(
      key, this.capacity, this.refillRate, now
    );
    return {
      allowed: allowed === 1,
      limit: this.capacity,
      remaining,
      resetTime: new Date(now + 1000 / this.refillRate)
    };
  }
}
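The token-bucket Lua script is elided above, so here is one way it could look, registered in the constructor exactly like the sliding-window command. The hash fields 'tokens' and 'lastRefill' are my own naming, not something the article specifies:

this.redis.defineCommand('tokenBucketIncrement', {
  numberOfKeys: 1,
  lua: `
    local key = KEYS[1]
    local capacity = tonumber(ARGV[1])
    local refillRate = tonumber(ARGV[2])   -- tokens per second
    local now = tonumber(ARGV[3])          -- milliseconds

    local data = redis.call('HMGET', key, 'tokens', 'lastRefill')
    local tokens = tonumber(data[1]) or capacity
    local lastRefill = tonumber(data[2]) or now

    -- Refill based on elapsed time, capped at the bucket capacity.
    local elapsed = math.max(0, now - lastRefill)
    tokens = math.min(capacity, tokens + (elapsed / 1000) * refillRate)

    local allowed = 0
    if tokens >= 1 then
      tokens = tokens - 1
      allowed = 1
    end

    redis.call('HMSET', key, 'tokens', tokens, 'lastRefill', now)
    -- Expire idle buckets once they would have fully refilled anyway.
    redis.call('PEXPIRE', key, math.ceil(capacity / refillRate * 1000))
    return {allowed, math.floor(tokens)}
  `
});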
Integrating this with Express middleware is straightforward:
// middleware/rateLimiter.ts
import { Request, Response, NextFunction } from 'express';
import { RateLimiterStorage } from '../types/rate-limiter.types';

export function rateLimiter(storage: RateLimiterStorage, keyFn: (req: Request) => string) {
  return async (req: Request, res: Response, next: NextFunction) => {
    const key = keyFn(req);
    const result = await storage.increment(key);

    res.set('X-RateLimit-Limit', result.limit.toString());
    res.set('X-RateLimit-Remaining', result.remaining.toString());
    res.set('X-RateLimit-Reset', result.resetTime.getTime().toString());

    if (!result.allowed) {
      return res.status(429).send('Too many requests');
    }
    next();
  };
}
What happens when your application scales across multiple servers? Redis becomes our single source of truth. We use the same storage implementation across all instances. For heavy loads, we pipeline commands to reduce round trips. And we always set appropriate TTLs to prevent memory bloat.
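To make the pipelining point concrete: with ioredis you can batch independent reads into one round trip, which helps when a single request has to consult several limits (say, per-IP and per-user). A minimal sketch with made-up key names:

import Redis from 'ioredis';

const redis = new Redis();

// Read two sliding-window counters in a single round trip.
async function currentCounts(ipKey: string, userKey: string) {
  const results = await redis
    .pipeline()
    .zcard(ipKey)    // requests currently in this IP's window
    .zcard(userKey)  // requests currently in this user's window
    .exec();

  // Each entry is [error, value]; errors are ignored here for brevity.
  return (results ?? []).map(([, value]) => Number(value));
}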
For monitoring, we track the following (a minimal counter sketch follows the list):
- Rejection rates per endpoint
- Redis memory usage
- Latency percentiles
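For the first item, an in-process counter is enough to get started if a metrics library such as prom-client is not already wired in. The helper below is hypothetical, just to show the shape:

// metrics/rejections.ts (hypothetical helper)
const rejectionsByRoute = new Map<string, number>();

export function recordRejection(route: string) {
  rejectionsByRoute.set(route, (rejectionsByRoute.get(route) ?? 0) + 1);
}

// In the middleware, just before sending the 429:
//   recordRejection(req.path);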
When Redis becomes unavailable, we fail open to avoid denying legitimate traffic. We log these incidents and fall back to in-memory limiting if necessary.
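As a sketch of that fail-open path, the storage call in the middleware can be wrapped like this (the in-memory fallback is omitted for brevity):

// Inside the handler returned by rateLimiter():
let result: RateLimitResult;
try {
  result = await storage.increment(key);
} catch (err) {
  // Redis is unreachable: log the incident and let the request through.
  console.error('rate limiter storage unavailable, failing open', err);
  return next();
}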
Here’s how we initialize everything:
// server.ts
import express from 'express';
import { RedisSlidingWindow } from './storage/redis-sliding-window';
import { rateLimiter } from './middleware/rateLimiter';

const app = express();

// Example limits: 100 requests per 60-second window, keyed by client IP.
const limiter = new RedisSlidingWindow(60_000, 100);

app.use(rateLimiter(limiter, req => req.ip ?? 'unknown'));

app.get('/api', (req, res) => {
  res.send('Hello world!');
});

app.listen(3000);
Does this handle all scenarios? For most applications - yes. But consider edge cases like distributed denial-of-service attacks. We might need additional layers like cloud-based WAFs. For stateful APIs, we might key limiters by user ID instead of IP.
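Switching to user-based keys only requires a different key function, since the middleware already takes one. A sketch, assuming an upstream auth middleware has attached a user object to the request (Express itself does not provide req.user):

app.use(rateLimiter(limiter, req => {
  // Prefer the authenticated user ID; fall back to the IP for anonymous traffic.
  const user = (req as any).user;  // assumed to be set by earlier auth middleware
  return user?.id ? `user:${user.id}` : `ip:${req.ip ?? 'unknown'}`;
}));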
The system we’ve built provides:
- Sub-millisecond response times
- Accurate request counting
- Horizontal scalability
- Multiple algorithm support
- Detailed rate limit headers
Remember to test under load! We use artillery.io to simulate traffic patterns. Start with conservative limits and adjust based on real usage.
What challenges have you faced with rate limiting? I’d love to hear about your experiences in the comments. If this guide helped you, please share it with others who might benefit. Together, we can build more resilient web services.