I’ve been thinking a lot lately about how to protect APIs from abuse while maintaining performance and reliability. It’s one of those foundational pieces that can make or break a service, especially as your user base grows. If you’re building anything that’s exposed to the web, you need to think about rate limiting—not just as a feature, but as a core part of your infrastructure.
Have you ever wondered how services like Twitter or GitHub handle millions of requests without collapsing under load? A big part of the answer lies in intelligent, well-implemented rate limiting. It’s not just about saying “no” to too many requests—it’s about saying “yes” in a controlled, sustainable way.
Let’s start with a simple but powerful idea: the token bucket algorithm. This method allows for bursts of traffic while still enforcing a long-term average rate. Here’s a basic implementation using Redis and Express:
const rateLimiter = async (req, res, next) => {
  const userId = req.user.id;
  const key = `rate_limit:${userId}`;
  const now = Date.now();
  const refillRate = 10 / 1000; // 10 tokens per second, expressed per millisecond
  const capacity = 20;

  // Read the current bucket state; a missing key means a full bucket
  const bucket = await redis.hgetall(key);
  const lastRefill = Number(bucket.lastRefill) || now;
  const storedTokens = bucket.tokens !== undefined ? Number(bucket.tokens) : capacity;

  // Refill based on elapsed time, never exceeding capacity
  const tokens = Math.min(capacity, storedTokens + refillRate * (now - lastRefill));

  if (tokens < 1) {
    return res.status(429).json({ error: 'Too many requests' });
  }

  // Spend one token and persist the new state
  // Note: read-then-write isn't atomic; a Lua script would close the race under heavy concurrency
  await redis.multi()
    .hset(key, 'lastRefill', now)
    .hset(key, 'tokens', tokens - 1)
    .expire(key, 60)
    .exec();

  next();
};
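To wire this into an app, you mount it like any other Express middleware. Here's a minimal sketch, assuming ioredis as the `redis` client the limiter uses and that an earlier auth middleware has populated `req.user`:

const express = require('express');
const Redis = require('ioredis');

const app = express();
const redis = new Redis(); // connects to localhost:6379 by default

// Assumes an auth middleware has already set req.user for these routes
app.use('/api', rateLimiter);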
But what happens when you need something more precise? That’s where sliding windows come in. Instead of resetting at fixed intervals, they continuously evaluate the request count over a moving timeframe. This avoids the “burst at the window edge” problem that can still overwhelm your system.
How do you handle different limits for different types of users or endpoints? You might want to allow more requests for paid users, or stricter limits on expensive operations. One way to structure that is to make the limiter a plain function that takes the window and maximum as parameters, so each caller can pass whatever values fit the client or route (a tiered usage sketch follows the function):
const slidingWindowLimiter = async (key, windowMs, maxRequests) => {
  const now = Date.now();
  const windowStart = now - windowMs;

  // Drop entries older than the window, then count what remains
  await redis.zremrangebyscore(key, 0, windowStart);
  const requestCount = await redis.zcard(key);

  if (requestCount >= maxRequests) {
    return false;
  }

  // Record this request with the timestamp as the score; the random suffix
  // keeps members unique when two requests land in the same millisecond
  await redis.zadd(key, now, `${now}-${Math.random()}`);
  await redis.expire(key, Math.ceil(windowMs / 1000));
  return true;
};
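Because the window and maximum are just arguments, tiering falls out naturally: look up the caller's plan and pass the matching values. Here's a sketch with made-up numbers for free and paid tiers, assuming the user record carries a `plan` field:

// Illustrative limits only; tune these for your own traffic
const TIER_LIMITS = {
  free: { windowMs: 60000, maxRequests: 60 },
  paid: { windowMs: 60000, maxRequests: 600 },
};

const tieredLimiter = async (req, res, next) => {
  const tier = req.user && req.user.plan === 'paid' ? 'paid' : 'free';
  const { windowMs, maxRequests } = TIER_LIMITS[tier];
  const key = `sliding:${(req.user && req.user.id) || req.ip}:${req.path}`;

  const allowed = await slidingWindowLimiter(key, windowMs, maxRequests);
  if (!allowed) {
    return res.status(429).json({ error: 'Rate limit exceeded' });
  }
  next();
};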
Error handling is just as important as the logic itself. If Redis goes down, you don’t want your API to become a free-for-all. A common approach is to fail open or closed based on your risk tolerance, but always log the incident.
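In practice that means a try/catch around the limiter call. Here's a fail-open sketch using the sliding window function from above (failing closed would return a 503 in the catch instead):

const failOpenLimiter = async (req, res, next) => {
  try {
    const allowed = await slidingWindowLimiter(`rl:${req.ip}`, 60000, 100);
    if (!allowed) {
      return res.status(429).json({ error: 'Rate limit exceeded' });
    }
  } catch (err) {
    // Redis is unreachable or the command failed: let the request through,
    // but log it so the gap in enforcement doesn't go unnoticed
    console.error('Rate limiter unavailable, failing open', err);
  }
  next();
};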
What about monitoring and analytics? Rate limiting isn’t just a defensive mechanism—it’s also a source of insight. By tracking which clients hit limits and when, you can detect anomalies, plan capacity, and even identify potential API misuse patterns before they become critical.
Here’s a practical middleware example that integrates logging and dynamic limits. The helper functions are placeholders; I’ll sketch a couple of them afterwards:
app.use(async (req, res, next) => {
  // Prefer the authenticated user id; fall back to the IP for anonymous traffic
  const identifier = (req.user && req.user.id) || req.ip;
  const endpoint = req.path;
  const limit = getLimitForEndpoint(endpoint);

  const isAllowed = await checkRateLimit(identifier, endpoint, limit);
  if (!isAllowed) {
    logRateLimitHit(identifier, endpoint);
    return res.status(429).header('Retry-After', '60').json({ error: 'Rate limit exceeded' });
  }
  next();
});
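`logRateLimitHit` is left as a stub above. One way I might flesh it out, reusing the same Redis client, is to keep hourly counters per client and endpoint that you can query later for anomaly detection:

const logRateLimitHit = async (identifier, endpoint) => {
  const hour = new Date().toISOString().slice(0, 13); // e.g. "2025-06-01T14"
  const key = `rate_limit_hits:${hour}`;

  // Count hits per client/endpoint pair in an hourly hash, retained for a week
  await redis.hincrby(key, `${identifier}:${endpoint}`, 1);
  await redis.expire(key, 7 * 24 * 60 * 60);
};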
Deploying this in production requires more than just code—it requires thought. How will you test under load? How will you adjust limits without redeploying? Using environment variables or a configuration service can help you stay flexible.
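For instance, `getLimitForEndpoint` from the middleware above could read from environment variables, so changing a limit is a config change rather than a deploy. The variable names here are made up for the example:

// Hypothetical env vars, e.g. RATE_LIMIT_DEFAULT=100, RATE_LIMIT_SEARCH=30
const getLimitForEndpoint = (endpoint) => {
  if (endpoint.startsWith('/search')) {
    return parseInt(process.env.RATE_LIMIT_SEARCH || '30', 10);
  }
  return parseInt(process.env.RATE_LIMIT_DEFAULT || '100', 10);
};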
Remember, the goal isn’t to eliminate traffic—it’s to shape it. Good rate limiting feels invisible to legitimate users while stopping bad actors in their tracks. It’s a balance between security, usability, and performance.
I hope this guide gives you a solid starting point for building your own rate limiting system. Have you run into interesting challenges or solutions around this? I’d love to hear your thoughts—feel free to share your experiences in the comments below.