Ever had your API go down because someone decided to hammer it with a thousand requests a second? I have. It’s not a fun Monday morning. That experience is exactly why I sat down to build a robust rate limiting system. If you’re serving anything beyond a simple demo, controlling traffic isn’t just nice to have—it’s essential for keeping your service alive and fair for everyone. Let’s build one you can trust in production, using Redis and Node.js.
Think of rate limiting as a bouncer for your API. It checks each request against a set of rules: “Is this user or IP address asking for too much, too fast?” The goal is to protect your resources from abuse—be it accidental loops or deliberate attacks—while ensuring good performance for legitimate users.
So, how do we keep track of these requests across many servers? A simple in-memory counter won’t work in a distributed setup. That’s where Redis shines. It’s fast, it handles atomic operations perfectly, and its ability to automatically expire keys makes it ideal for tracking time windows. Have you considered what happens to your user experience if your rate limiter adds too much latency?
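Done right, each check is a single Redis round trip, which keeps the added latency small. Throughout the examples that follow I'll assume a shared ioredis client like the sketch below; the host and port are placeholders for your own environment.

// Shared Redis client used by the examples in this post (ioredis)
import Redis from 'ioredis';

// Connection details are placeholders: point this at your own Redis instance
export const redis = new Redis({
  host: process.env.REDIS_HOST ?? '127.0.0.1',
  port: Number(process.env.REDIS_PORT ?? 6379),
});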
Let’s start with the Token Bucket algorithm, a classic and intuitive approach. Imagine a bucket that holds tokens. It fills up at a steady rate. Each API request takes one token out. If the bucket is empty, the request is rejected (or, in some designs, queued until a token is available). This method even allows for short bursts of traffic, which mirrors real-world usage.
Here’s a practical starting point: a core type definition that describes our rate limiting rules.
// Define the shape of our rate limiting rules
export interface RateLimitConfig {
  windowMs: number;    // e.g., 60000 for a 1-minute window
  maxRequests: number; // maximum requests allowed in that window
}
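As a concrete example, here's what a limit of 100 requests per minute looks like with this shape (the numbers are just illustrative):

// Illustrative config: 100 requests per rolling 1-minute window
const defaultLimit: RateLimitConfig = {
  windowMs: 60_000,
  maxRequests: 100,
};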
Now, let’s implement that Token Bucket logic in Redis. We need operations to be atomic—meaning no other process can interrupt our check-and-update sequence. Redis Lua scripts are perfect for this.
// Lua script executed atomically in Redis
const tokenBucketScript = `
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local now = tonumber(ARGV[2])
local requested = 1

local bucket = redis.call('HMGET', key, 'tokens', 'lastFill')
local tokens = tonumber(bucket[1]) or capacity
local lastFill = tonumber(bucket[2]) or now

-- Calculate how many new tokens have accrued since the last refill
local timePassed = now - lastFill
local refillAmount = timePassed / 1000 -- refill at 1 token per second (timestamps are in ms)
tokens = math.min(capacity, tokens + refillAmount)

local allowed = tokens >= requested
if allowed then
  tokens = tokens - requested
end

-- Save the new state back to Redis
redis.call('HMSET', key, 'tokens', tokens, 'lastFill', now)
redis.call('EXPIRE', key, 120) -- keep the key around briefly so idle buckets expire on their own
if allowed then return 1 else return 0 end
`;
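To actually run this script from Node, you can ship it to Redis with ioredis's eval command. The helper below is a minimal sketch under that assumption; tokenBucketRequest and the bucket: key prefix are names I'm introducing here, not part of any library.

// Run the token bucket script atomically; resolves true if the request may proceed
async function tokenBucketRequest(userId: string, capacity: number): Promise<boolean> {
  const key = `bucket:${userId}`;
  // EVAL takes the script, the number of keys, the key itself, then the ARGV values (capacity, current time in ms)
  const allowed = (await redis.eval(tokenBucketScript, 1, key, capacity, Date.now())) as number;
  return allowed === 1;
}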
But what if you need more precision, where a burst at the end of one minute and the start of the next shouldn’t slip through? This is where a Sliding Window approach is stronger. Instead of fixed blocks of time, it checks the actual requests in a moving time frame. It’s like looking at the last 60 seconds from any given moment.
The core idea is to store a timestamp for each request, then count how many are within the current window. We can use a Redis sorted set for this, which makes cleaning up old requests efficient.
// Key part of a sliding window check using a Redis sorted set
async function checkSlidingWindow(userId: string, windowMs: number, maxReq: number) {
  const key = `limit:${userId}`;
  const now = Date.now();
  const windowStart = now - windowMs;

  // Remove all timestamps that have fallen out of the window
  await redis.zremrangebyscore(key, 0, windowStart);

  // Count how many requests remain within the window
  const currentCount = await redis.zcard(key);
  if (currentCount >= maxReq) {
    return { allowed: false };
  }

  // Record this request; a random suffix keeps members unique if two requests share a timestamp
  await redis.zadd(key, now, `${now}-${Math.random()}`);
  await redis.expire(key, Math.ceil(windowMs / 1000) + 1); // clean up idle keys eventually

  return { allowed: true, remaining: maxReq - currentCount - 1 };
}
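In practice you'd call it from a request handler, something like this (the identifier and limits are whatever fits your app):

// Allow a given user at most 60 requests in any rolling 60-second window
const result = await checkSlidingWindow('user-123', 60_000, 60);
if (!result.allowed) {
  // Throttled: surface a 429 to the caller
}

One caveat worth knowing: the ZCARD read and the ZADD write above are separate round trips, so under heavy concurrency a few extra requests can slip through between them. If you need strict guarantees, wrap the same steps in a Lua script, just as we did for the token bucket.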
Turning this logic into Express middleware makes it easy to protect any route. The middleware generates a key (like a user ID or IP address), checks it against our Redis limiter, and either passes the request along or sends a 429 Too Many Requests response.
What do you think is the most common mistake when first adding rate limits? It’s often forgetting to communicate the limits back to the client. A good practice is to include helpful headers in every response.
// Example middleware structure
app.use(async (req, res, next) => {
  const identifier = req.ip; // or use a user ID from an auth token

  // Assumes the limiter's check() resolves to { allowed, retryAfterSeconds, remainingRequests, resetTime }
  const result = await slidingWindowLimiter.check(identifier);

  if (!result.allowed) {
    res.setHeader('Retry-After', result.retryAfterSeconds);
    return res.status(429).send('Too Many Requests');
  }

  // Tell the client how they're doing
  res.setHeader('X-RateLimit-Remaining', result.remainingRequests);
  res.setHeader('X-RateLimit-Reset', result.resetTime);
  next();
});
Finally, remember that no system is fire-and-forget. You need to monitor it. How many 429s are you logging? Is the limit too strict for a particular endpoint? Use your metrics to adjust the windowMs and maxRequests for different parts of your app. Start stricter than you think you need; it’s easier to relax a limit than to tighten it after users expect unlimited access.
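Here's a sketch of what that per-endpoint tuning can look like; the routes and numbers are purely illustrative and build on the RateLimitConfig type from earlier.

// Illustrative per-route limits: tune these based on your 429 metrics
const routeLimits: Record<string, RateLimitConfig> = {
  '/auth/login':  { windowMs: 60_000, maxRequests: 5 },   // strict: protects the login endpoint
  '/api/search':  { windowMs: 60_000, maxRequests: 30 },  // moderate: expensive queries
  '/api/profile': { windowMs: 60_000, maxRequests: 120 }, // generous: cheap reads
};

// Fall back to a global default for anything not listed
const defaultRouteLimit: RateLimitConfig = { windowMs: 60_000, maxRequests: 60 };

function limitFor(path: string): RateLimitConfig {
  return routeLimits[path] ?? defaultRouteLimit;
}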
Building this was a game-changer for my applications’ stability. It moved a critical piece of infrastructure from a worrying vulnerability to a managed, understood component. Give it a try, tweak the parameters for your own traffic patterns, and see the difference it makes.
Did this guide help you understand how to keep your API safe and fast? If you found it useful, please share it with a fellow developer who might be facing similar scaling challenges. I’d also love to hear about your experiences or any clever twists you’ve added—drop a comment below!