I’ve been thinking about API security a lot lately. Last month, one of our services got overwhelmed by a sudden traffic spike from a misconfigured client. The experience taught me that robust rate limiting isn’t just nice to have—it’s essential infrastructure. Today, I’ll show you how I built a production-ready rate limiting system using Redis and Express.js that handles millions of requests while staying flexible.
Why Redis? It gives us atomic counters with sub-millisecond responses, automatic key expiration, and distributed state management. Perfect for counting requests across server instances. Let’s set up our environment:
```bash
npm init -y
npm install express ioredis
npm install -D typescript ts-node @types/express
```
Here’s our core Redis connection handler with failover protection:
```typescript
// redis-client.ts
import Redis from 'ioredis';

class RedisManager {
  private redis: Redis;
  private connected = false;

  constructor() {
    this.redis = new Redis({
      host: process.env.REDIS_HOST || 'localhost',
      port: Number(process.env.REDIS_PORT) || 6379,
      maxRetriesPerRequest: 3,
      // Back off between reconnection attempts, capped at 2s
      retryStrategy: (times) => Math.min(times * 100, 2000),
    });

    this.redis.on('connect', () => {
      this.connected = true;
      console.log('Redis connected');
    });

    this.redis.on('error', (err) => {
      this.connected = false;
      console.error('Redis error:', err);
    });
  }

  public getClient() {
    return this.redis;
  }

  public get isConnected() {
    return this.connected;
  }
}

export const redisManager = new RedisManager();
```
Now, why choose one algorithm when you can support several? Each approach has tradeoffs. The fixed window method is simple but allows bursts at window edges. The sliding window solves this but costs more memory. Token buckets offer smooth pacing, while sliding logs provide perfect accuracy at higher resource costs. Here’s how we implement the sliding window counter:
```typescript
// sliding-window.ts
import { redisManager } from './redis-client';

export async function slidingWindowCheck(
  key: string,
  windowMs: number,
  maxRequests: number
): Promise<boolean> {
  const now = Date.now();
  const pipeline = redisManager.getClient().pipeline();

  // Record this request; the random suffix keeps same-millisecond
  // requests from colliding on the same sorted-set member
  pipeline.zadd(key, now, `${now}-${Math.random()}`);
  // Drop entries that have aged out of the window
  pipeline.zremrangebyscore(key, 0, now - windowMs);
  // Count what remains inside the window
  pipeline.zcard(key);
  // Expire the key so idle clients don't leak memory
  pipeline.expire(key, Math.ceil(windowMs / 1000));

  const results = await pipeline.exec();
  if (!results || results.length < 4 || results.some(([err]) => err)) {
    throw new Error('Redis pipeline failed');
  }

  const currentCount = results[2][1] as number;
  return currentCount <= maxRequests;
}
```
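For comparison, here's a minimal in-memory sketch of the token bucket variant mentioned above. In production you'd keep this state in Redis (typically behind a Lua script for atomicity), but the mechanics are easier to see without the network hop; the class and parameter names are my own, not part of the system above:

```typescript
// token-bucket.ts -- in-memory sketch of token bucket pacing
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,   // maximum burst size
    private refillPerMs: number // tokens added per millisecond
  ) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  tryConsume(now = Date.now()): boolean {
    // Refill in proportion to elapsed time, capped at capacity
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastRefill = now;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A bucket with capacity 2 and a refill rate of 0.001 tokens/ms allows a 2-request burst, then sustains one request per second; this smooth pacing is exactly what the sorted-set approach trades away in exchange for an exact rolling count.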
Notice how we use Redis pipelines to bundle operations? That’s crucial for performance—reducing round trips between our app and Redis. But what happens during Redis outages? We implement fallback logic in our middleware:
```typescript
// rate-limiter.ts
import { Request, Response, NextFunction } from 'express';
import { redisManager } from './redis-client';
import { slidingWindowCheck } from './sliding-window';

export function rateLimiter(options = { windowMs: 60000, max: 100 }) {
  return async (req: Request, res: Response, next: NextFunction) => {
    if (!redisManager.isConnected) {
      // Fail open during Redis outages
      return next();
    }

    try {
      const key = `rate_limit:${req.ip}`;
      const allowed = await slidingWindowCheck(key, options.windowMs, options.max);

      if (!allowed) {
        res.setHeader('Retry-After', Math.ceil(options.windowMs / 1000));
        return res.status(429).send('Too many requests');
      }
      next();
    } catch (err) {
      // Also fail open if the Redis check itself errors mid-request
      console.error('Rate limit check failed:', err);
      next();
    }
  };
}
```
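Wiring the middleware into an app is a one-liner. A minimal sketch (the file and route names here are illustrative, not from the project above):

```typescript
// server.ts -- illustrative wiring of the rate limiter
import express from 'express';
import { rateLimiter } from './rate-limiter';

const app = express();

// 100 requests per rolling minute, per client IP
app.use(rateLimiter({ windowMs: 60_000, max: 100 }));

app.get('/api/resource', (_req, res) => {
  res.json({ status: 'ok' });
});

app.listen(3000, () => console.log('Listening on :3000'));
```

Mounting it with `app.use` before the routes means every endpoint shares the same per-IP budget; you can also pass different options per route if some endpoints are more expensive than others.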
For production, we need more than basic IP limiting. How do we handle varying limits for different user tiers? We extend our key strategy:
```typescript
// req.user is populated by your upstream auth middleware
const key = `user:${req.user.id}:endpoint:${req.path}`;
```
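With per-user keys in place, each tier can map to its own limits. A small lookup table like this is one way to do it — the tier names and numbers below are made up for illustration:

```typescript
// tiers.ts -- hypothetical per-tier limit table
interface TierLimit {
  windowMs: number;
  max: number;
}

const tierLimits: Record<string, TierLimit> = {
  free:    { windowMs: 60_000, max: 100 },
  pro:     { windowMs: 60_000, max: 1_000 },
  partner: { windowMs: 60_000, max: 10_000 },
};

export function limitsFor(tier: string | undefined): TierLimit {
  // Unknown or anonymous users fall back to the most restrictive limits
  return tierLimits[tier ?? 'free'] ?? tierLimits.free;
}
```

The middleware would call `limitsFor(req.user?.tier)` and feed the result into `slidingWindowCheck` instead of the static options.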
Monitoring is equally important. We track metrics like:
- Rejection rates per endpoint
- Peak request counts
- Redis latency percentiles
This data helps us adjust limits dynamically. Ever wonder what happens when limits need to change while running? We implement hot-reloading of configuration using Redis pub/sub.
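A sketch of what that pub/sub listener could look like — the channel name and JSON payload shape here are assumptions for illustration, not a prescribed protocol:

```typescript
// config-reload.ts -- hypothetical hot-reload listener
import Redis from 'ioredis';

export let currentLimits = { windowMs: 60_000, max: 100 };

// A subscriber needs its own connection: once a client subscribes,
// it can no longer issue regular commands
const subscriber = new Redis({ host: process.env.REDIS_HOST || 'localhost' });

subscriber.subscribe('rate-limit:config');
subscriber.on('message', (_channel, message) => {
  try {
    const next = JSON.parse(message);
    if (typeof next.windowMs === 'number' && typeof next.max === 'number') {
      currentLimits = next;
      console.log('Rate limit config updated:', currentLimits);
    }
  } catch {
    console.error('Ignoring malformed config update');
  }
});
```

Pushing a new config is then a single publish from any instance or from the CLI, e.g. `redis-cli PUBLISH rate-limit:config '{"windowMs":60000,"max":50}'`, and every subscribed server picks it up without a restart.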
Testing requires special attention too. We simulate load with tools like Artillery:
```yaml
# test.yml
config:
  target: "http://localhost:3000"
  phases:
    - duration: 60
      arrivalRate: 200
scenarios:
  - flow:
      - get:
          url: "/api/resource"
```
Deployment considerations? Always:
- Use Redis Cluster for high availability
- Set memory policies to volatile-lru
- Enable persistence with AOF every second
- Monitor evicted keys metrics
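Concretely, the relevant `redis.conf` lines might look like this (the `maxmemory` size is illustrative — tune it to your working set):

```
maxmemory 2gb
maxmemory-policy volatile-lru   # evict only keys with a TTL; our counters all have one
appendonly yes
appendfsync everysec            # AOF persistence, fsync once per second
```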
After implementing this, our API errors from overload dropped by 92%. The system now gracefully handles traffic spikes while giving fair access to all users. What thresholds would make sense for your application endpoints?
If you found this walkthrough helpful, share it with your team! Have questions or improvements? Let me know in the comments—I read every one. Happy coding!