Crafting Resilient Microservices: A Practical Journey with NestJS, RabbitMQ, and MongoDB
I’ve spent countless nights debugging distributed systems where services failed to communicate properly. That pain led me here—to share a battle-tested approach for production-grade event-driven architectures. When payments fail after orders succeed, or user updates vanish into the void, you realize how crucial these patterns are. Let’s build something that won’t keep you up at 3 AM.
Our e-commerce system uses three core services: Users, Orders, and Payments. Each service owns its data and communicates through events. When a user signs up, the User Service emits a UserCreatedEvent. The Order Service listens and prepares to receive orders from that user. This separation prevents the entire system from collapsing if one component fails. Have you considered what happens when a payment service goes offline during a sale?
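For illustration, the shared library might define UserCreatedEvent as a plain serializable class so every service shares one contract. The field names here are assumptions, not from the original:

```typescript
// shared/events/user.events.ts — hypothetical shape for the shared event
export class UserCreatedEvent {
  constructor(
    public readonly userId: string,
    public readonly email: string,
    // Default the timestamp so publishers don't have to supply it
    public readonly createdAt: Date = new Date(),
  ) {}
}
```

Keeping events as plain classes with readonly fields means they serialize cleanly over RabbitMQ and can be versioned in one place.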
We organize our code in a monorepo using the NestJS CLI. This keeps shared libraries consistent while allowing independent deployment. We scaffold it like so:
npx @nestjs/cli new microservices-architecture
cd microservices-architecture
npx @nestjs/cli generate app user-service
npx @nestjs/cli generate library shared # For event definitions
RabbitMQ handles our messaging with durable queues. One caution: RabbitMQ has no built-in x-max-retries argument. To cap redeliveries, declare a quorum queue and set x-delivery-limit; once a message has been redelivered that many times, it is dropped or dead-lettered instead of looping forever:

// Inside RabbitMQ configuration
queueOptions: {
  durable: true,
  arguments: {
    'x-queue-type': 'quorum',   // quorum queues enforce delivery limits
    'x-message-ttl': 60000,     // expire messages after 60 seconds
    'x-delivery-limit': 3       // stop redelivering after 3 attempts
  }
}
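In a NestJS service, this queue configuration typically lives in the microservice bootstrap. A minimal sketch, where the queue name and broker URL are assumptions for this example:

```typescript
// main.ts — connecting a service to RabbitMQ (sketch)
import { NestFactory } from '@nestjs/core';
import { Transport, MicroserviceOptions } from '@nestjs/microservices';
import { AppModule } from './app.module';

async function bootstrap() {
  const app = await NestFactory.create(AppModule);
  app.connectMicroservice<MicroserviceOptions>({
    transport: Transport.RMQ,
    options: {
      urls: ['amqp://localhost:5672'],
      queue: 'order_service',
      noAck: false, // require explicit acks so failures trigger redelivery
      queueOptions: {
        durable: true,
        arguments: { 'x-queue-type': 'quorum', 'x-delivery-limit': 3 },
      },
    },
  });
  await app.startAllMicroservices();
  await app.listen(3000);
}
bootstrap();
```

With noAck disabled, a handler that throws causes the message to be redelivered, which is exactly what the delivery limit above bounds.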
The User Service demonstrates event publishing. When a profile updates, it notifies others without waiting for responses:
// user.controller.ts
@Put(':id')
async updateUser(@Param('id') id: string, @Body() dto: UpdateUserDto) {
  const updatedUser = await this.userService.update(id, dto);
  await this.eventBus.publishEvent(
    'order_service',
    'USER_UPDATED',
    new UserUpdatedEvent(id, dto),
  );
  return updatedUser;
}
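The eventBus used above is a thin custom wrapper, not a NestJS built-in. One way to sketch it, with the transport injected as a plain function so the wrapper stays framework-agnostic (in NestJS it would delegate to ClientProxy.emit for fire-and-forget publishing); the names and shape are our assumptions:

```typescript
// shared/event-bus.ts — a minimal sketch of the publishEvent helper (assumed API)
type EmitFn = (pattern: string, payload: unknown) => Promise<void>;

export class EventBus {
  // One emitter per destination queue, registered at startup
  constructor(private readonly emitters: Map<string, EmitFn>) {}

  // Publish an event to a named destination without awaiting a business reply
  async publishEvent(destination: string, pattern: string, payload: unknown): Promise<void> {
    const emit = this.emitters.get(destination);
    if (!emit) throw new Error(`No emitter registered for ${destination}`);
    await emit(pattern, payload);
  }
}
```

Failing fast on an unknown destination surfaces misconfiguration at publish time rather than silently dropping events.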
Now, the Order Service. It listens for events and starts order processing. But what if payment fails after inventory is reserved? That’s where Sagas save us. We implement a choreography-based Saga: each service emits events that trigger the next step, and a failure publishes a compensating event that undoes the work already done:
// order.saga.ts
// Note: ClientProxy.send returns an RxJS Observable, so convert it to a
// Promise with firstValueFrom (import { firstValueFrom } from 'rxjs').
@OnEvent('ORDER_CREATED')
async handleOrderCreated(event: OrderCreatedEvent) {
  const paymentResult = await firstValueFrom(
    this.paymentClient.send('PROCESS_PAYMENT', {
      orderId: event.orderId,
      amount: event.totalAmount,
    }),
  );
  if (paymentResult.status === 'success') {
    this.eventBus.emit('PAYMENT_SUCCEEDED', event);
  } else {
    this.eventBus.emit('COMPENSATE_ORDER', event); // Rollback!
  }
}
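The compensating side is just another handler. A hypothetical sketch of what the COMPENSATE_ORDER step might do, with the store interface and method names invented for illustration:

```typescript
// order.compensation.ts — a hypothetical compensation step for the saga
// Undo work in reverse order: release reserved inventory, then cancel the order.
interface OrderStores {
  releaseInventory(orderId: string): Promise<void>;
  markCancelled(orderId: string): Promise<void>;
}

export async function compensateOrder(orderId: string, stores: OrderStores): Promise<string> {
  await stores.releaseInventory(orderId); // give reserved stock back first
  await stores.markCancelled(orderId);    // then record the cancelled state
  return `order ${orderId} compensated`;
}
```

Keeping the compensation logic as a pure function over an injected interface makes the rollback path unit-testable without a broker or database.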
For resilience, we add dead letter exchanges in RabbitMQ. Messages failing all retries route to a separate queue for inspection:
// Error handling in RabbitMQ setup
queueOptions: {
  deadLetterExchange: 'failed_events',
  deadLetterRoutingKey: 'orders.failed'
}
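When you consume from that failure queue, RabbitMQ's x-death header records which queue dead-lettered the message, why, and how many times. A small helper to summarize it for inspection (the header fields follow RabbitMQ's documented format; the helper itself is ours):

```typescript
// dlq.inspect.ts — summarize a dead-lettered message's x-death header
// RabbitMQ appends one x-death entry per (queue, reason) with a running count.
interface XDeathEntry {
  queue: string;
  reason: string; // 'rejected' | 'expired' | 'maxlen' | 'delivery_limit'
  count: number;
}

export function summarizeDeath(headers: { 'x-death'?: XDeathEntry[] }): string {
  const deaths = headers['x-death'] ?? [];
  if (deaths.length === 0) return 'message was never dead-lettered';
  return deaths.map((d) => `${d.queue}: ${d.reason} x${d.count}`).join(', ');
}
```

Logging this summary alongside the failed payload usually tells you immediately whether a message expired, hit its delivery limit, or was explicitly rejected by a handler.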
Monitoring is non-negotiable. We instrument services with Winston and expose health checks:
// health.controller.ts
@Get('health')
@HealthCheck()
check() {
  return this.health.check([
    () => this.mongoose.pingCheck('mongo'),
    // MicroserviceHealthIndicator.pingCheck requires the transport options
    () => this.rabbitmq.pingCheck('rabbitmq', {
      transport: Transport.RMQ,
      options: { urls: ['amqp://localhost:5672'] },
    }),
  ]);
}
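The indicators come from @nestjs/terminus: HealthCheckService, MongooseHealthIndicator, and MicroserviceHealthIndicator are injected through the controller's constructor once TerminusModule is imported. A minimal module sketch (file layout assumed):

```typescript
// health.module.ts — minimal wiring for the health endpoint (sketch)
import { Module } from '@nestjs/common';
import { TerminusModule } from '@nestjs/terminus';
import { HealthController } from './health.controller';

@Module({
  // TerminusModule provides HealthCheckService and the built-in indicators;
  // MongooseHealthIndicator additionally needs an active Mongoose connection
  // registered elsewhere in the app (e.g. MongooseModule.forRoot in AppModule).
  imports: [TerminusModule],
  controllers: [HealthController],
})
export class HealthModule {}
```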
Testing strategies include contract tests for events. We validate that payloads match expectations across services:
// shared/events/order.events.spec.ts
it('OrderCreatedEvent should have required fields', () => {
  const event = new OrderCreatedEvent('order_123', 'user_456', [], 99.99);
  expect(event).toHaveProperty('orderId');
  expect(event).toHaveProperty('totalAmount');
});
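A class shape that satisfies this contract test, consistent with the saga reading event.orderId and event.totalAmount earlier; the items element type is an assumption:

```typescript
// shared/events/order.events.ts — one shape that satisfies the contract test
export class OrderCreatedEvent {
  constructor(
    public readonly orderId: string,
    public readonly userId: string,
    public readonly items: Array<{ sku: string; quantity: number }>,
    public readonly totalAmount: number,
  ) {}
}
```

Because the test lives in the shared library, any service that changes this constructor breaks the contract test before it breaks a consumer in production.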
Deployment uses Docker, with /health doubling as the probe endpoint. One caveat: Kubernetes ignores the Dockerfile HEALTHCHECK instruction, so there you declare a readinessProbe against /health in the pod spec before traffic is routed; the HEALTHCHECK below covers plain Docker and Compose:
# payment-service/Dockerfile
FROM node:18-alpine
WORKDIR /app
COPY . .
EXPOSE 3000
# node:18-alpine ships busybox wget, not curl
HEALTHCHECK --interval=30s CMD wget -qO- http://localhost:3000/health || exit 1
CMD ["node", "dist/main.js"]
I’ve seen systems fail from ignoring transaction rollbacks and monitoring. What’s your plan for when a database partition happens during checkout? Build with failure as an expectation. If this approach saves you one production incident, share it with a teammate who’s facing similar challenges. Your thoughts? Leave a comment about your toughest microservices hurdle.