Crafting Resilient Microservices: My Journey with NestJS, Redis, and RabbitMQ
While recently scaling a fintech platform, I faced cascading failures from synchronous API calls during peak traffic. That pain point sparked my exploration into event-driven microservices. Today, I’ll share practical insights for building production-ready systems with NestJS, Redis, and RabbitMQ. Stick with me - I’ll show you how to avoid the mistakes I made.
When services communicate asynchronously, failures become isolated incidents rather than system-wide catastrophes. Consider this: what happens when your order processing service goes down? In synchronous architectures, everything halts. But with event-driven patterns, messages queue up until services recover.
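To make that concrete, here is a toy sketch of the queuing behavior - an in-memory stand-in, not RabbitMQ, but it shows why producers keep working while a consumer is down:

```typescript
// Toy illustration (not RabbitMQ): messages buffer while the consumer is down,
// then drain once it recovers - the producer never fails.
type Handler = (msg: string) => void;

class InMemoryQueue {
  private buffer: string[] = [];
  private consumer: Handler | null = null;

  publish(msg: string) {
    if (this.consumer) this.consumer(msg);
    else this.buffer.push(msg); // consumer down: queue instead of failing
  }

  subscribe(handler: Handler) {
    this.consumer = handler;
    // drain everything that accumulated during the outage
    while (this.buffer.length) handler(this.buffer.shift()!);
  }
}

const queue = new InMemoryQueue();
const processed: string[] = [];

queue.publish('order-1'); // "order service" is down: message waits
queue.publish('order-2');
queue.subscribe((msg) => processed.push(msg)); // service recovers, backlog drains
queue.publish('order-3');

console.log(processed); // ['order-1', 'order-2', 'order-3']
```

A real broker adds durability, acknowledgements, and redelivery on top of this basic buffering idea.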
Let’s start with our environment setup. You’ll need Node.js v18+, Docker, and basic NestJS knowledge. Here’s how I structure projects:
# Project initialization
mkdir microservices-ecommerce && cd microservices-ecommerce
npm init -y
npm install -D concurrently
mkdir -p services/{api-gateway,user-service,order-service,notification-service}
The magic happens when services emit events instead of direct calls. Here’s a user creation snippet from our User Service:
// User Service - emit a domain event after persisting the user
async createUser(createUserDto: CreateUserDto) {
  const user = await this.userRepository.save(createUserDto);
  this.eventEmitter.emit('user.created', {
    userId: user.id,
    email: user.email,
  });
  return user;
}
Notice how we’re not calling other services directly? The Order Service independently consumes these events:
// Order Service - Event listener
@OnEvent('user.created')
async handleUserCreated(payload: UserCreatedEvent) {
  // Create a cart for the new user
  await this.cartService.createCart(payload.userId);
  console.log(`Cart created for user ${payload.userId}`);
}
But how do we handle communication between physically separated services? Enter RabbitMQ. I configure it as our message broker using Docker Compose:
# docker/docker-compose.yml
services:
  rabbitmq:
    image: rabbitmq:3-management
    ports:
      - "5672:5672"   # AMQP
      - "15672:15672" # management UI
In NestJS, we connect using the RabbitMQ transport:
// Order Service bootstrap
const app = await NestFactory.createMicroservice<MicroserviceOptions>(OrderModule, {
  transport: Transport.RMQ,
  options: {
    urls: ['amqp://localhost:5672'],
    queue: 'order_queue',
    queueOptions: { durable: true }, // queue survives broker restarts
  },
});
await app.listen(); // start consuming messages
What about caching frequent database queries? Redis solves this elegantly. Here’s a user lookup with caching:
// User Service with Redis caching (ioredis client)
async getUserById(id: string) {
  const cachedUser = await this.redisClient.get(`user:${id}`);
  if (cachedUser) return JSON.parse(cachedUser);

  const user = await this.userRepository.findOne({ where: { id } });
  if (!user) throw new NotFoundException(`User ${id} not found`);

  // 'EX' sets a TTL in seconds (ioredis syntax; node-redis v4 uses { EX: 60 })
  await this.redisClient.set(`user:${id}`, JSON.stringify(user), 'EX', 60);
  return user;
}
For session management, Redis outperforms database storage. Configure it with NestJS sessions:
// API Gateway main.ts - back express-session with Redis
app.use(
  session({
    store: new RedisStore({ client: redisClient }),
    secret: process.env.SESSION_SECRET,
    resave: false,
    saveUninitialized: false,
  }),
);
Production readiness demands health checks and graceful shutdowns. I implement both:
// Health check endpoint
@Controller('health')
export class HealthController {
  @Get()
  healthCheck() {
    return { status: 'up', timestamp: new Date() };
  }
}

// Graceful shutdown - Nest exposes no isShuttingDown flag, so track one yourself
let isShuttingDown = false;
app.enableShutdownHooks();
process.on('SIGTERM', () => { isShuttingDown = true; });
app.use((req, res, next) => {
  if (!isShuttingDown) return next();
  res.set('Connection', 'close');
  res.status(503).send('Server shutting down');
});
Testing strategies differ from those in monoliths. I focus on contract testing between services using Pact:
// Order service contract test (Pact JS)
describe('Order Service Pact', () => {
  beforeAll(() => provider.setup());
  afterEach(() => provider.verify());
  afterAll(() => provider.finalize());

  test('user creation event contract', async () => {
    await provider.addInteraction({
      state: 'a user exists',
      uponReceiving: 'a user created event',
      withRequest: { method: 'POST', path: '/events' },
      willRespondWith: { status: 200 },
    });
    // Exercise the consumer against the Pact mock server so verify() passes
    await axios.post(`${provider.mockService.baseUrl}/events`, {});
  });
});
Deployment requires containerization. Here’s a lean Dockerfile pattern I use:
# Node service Dockerfile
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY dist ./dist
CMD ["node", "dist/main"]
In Kubernetes, I add liveness probes for auto-recovery:
# Kubernetes deployment - restart the pod when /health stops responding
livenessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 5
Monitoring requires distributed tracing. I use OpenTelemetry with Jaeger:
// Tracing setup
import { NodeSDK } from '@opentelemetry/sdk-node';
import { JaegerExporter } from '@opentelemetry/exporter-jaeger';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

const sdk = new NodeSDK({
  traceExporter: new JaegerExporter(),
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
Common pitfalls? Message ordering tripped me up initially. RabbitMQ guarantees per-queue order, but multiple consumers can process out-of-sequence. I solved this with:
- Single consumer per logical entity
- Versioned events with conflict resolution
- Idempotent handlers that tolerate duplicates
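The idempotency point deserves a concrete shape. Here is a minimal sketch with an in-memory dedup store - in production you would back it with Redis SETNX or a database unique constraint:

```typescript
// Idempotent event handler: replayed or duplicated deliveries become no-ops.
interface UserCreatedEvent {
  eventId: string; // unique per event, not per delivery attempt
  userId: string;
}

class CartHandler {
  private processed = new Set<string>(); // stand-in for a Redis/DB dedup store
  public cartsCreated = 0;

  handleUserCreated(event: UserCreatedEvent) {
    if (this.processed.has(event.eventId)) return; // duplicate delivery: skip
    this.processed.add(event.eventId);
    this.cartsCreated++; // the real side effect (createCart) goes here
  }
}

const handler = new CartHandler();
const event = { eventId: 'evt-1', userId: 'u-42' };
handler.handleUserCreated(event);
handler.handleUserCreated(event); // broker redelivery: ignored
console.log(handler.cartsCreated); // 1
```

The key design choice is deduplicating on an event ID stamped by the producer, so that retries, redeliveries, and out-of-order replays all converge on the same state.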
Another gotcha: over-caching. I once cached user data too aggressively, causing stale financial data displays. Now I:
- Set conservative TTLs (30-60 seconds)
- Use cache invalidation hooks on writes
- Implement cache-aside patterns rigorously
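Those three rules combine into a pattern worth spelling out. A compact sketch of cache-aside with write invalidation, using a Map as a stand-in for Redis:

```typescript
// Cache-aside with write invalidation; a Map stands in for Redis here.
class UserCache {
  private cache = new Map<string, string>();
  public dbReads = 0;

  constructor(private db: Map<string, { id: string; name: string }>) {}

  getUser(id: string) {
    const hit = this.cache.get(id);
    if (hit) return JSON.parse(hit); // cache hit: no DB round trip
    this.dbReads++;
    const user = this.db.get(id);
    if (user) this.cache.set(id, JSON.stringify(user)); // populate on miss
    return user;
  }

  updateUser(id: string, name: string) {
    const user = this.db.get(id);
    if (!user) return;
    this.db.set(id, { ...user, name });
    this.cache.delete(id); // invalidate on write: next read sees fresh data
  }
}

const db = new Map([['u1', { id: 'u1', name: 'Ada' }]]);
const store = new UserCache(db);

store.getUser('u1');             // miss: reads DB, fills cache
store.getUser('u1');             // hit: served from cache
store.updateUser('u1', 'Grace'); // write invalidates the cached entry
console.log(store.getUser('u1').name); // 'Grace' - fresh, not stale
console.log(store.dbReads);            // 2
```

Deleting on write rather than updating the cache in place avoids racing a concurrent read; the short TTL then acts as a safety net for any invalidation you miss.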
After implementing these patterns, our system handled 5x traffic spikes without downtime. The true win? Deploying order service updates without touching user management.
What challenges have you faced with distributed systems? I’d love to hear your war stories. If this guide helped you, share it with a colleague - production issues become team sport at scale. Leave a comment with your implementation experiences!