I’ve been designing distributed systems for years, and one problem keeps resurfacing: how to maintain data consistency across services during complex transactions. Just last month, I saw an e-commerce platform lose orders because their inventory system went down during a sale. That’s when I decided to document a robust solution using event-driven architecture. Let me show you how to build resilient microservices with NestJS, RabbitMQ, and MongoDB.
When an order flows through your system, multiple actions must happen reliably: inventory reservation, payment processing, and customer notifications. If any step fails, the entire transaction should recover gracefully. This is where event-driven patterns shine. We’ll create three independent services communicating through events:
- Order Service (HTTP interface + event emission)
- Inventory Service (event listener + business logic)
- Notification Service (event listener + external comms)
First, our workspace setup:
mkdir event-driven-microservices
cd event-driven-microservices
mkdir order-service inventory-service notification-service shared
Each service needs core dependencies:
# Inside each service directory
npm init -y
npm install @nestjs/{core,common,microservices,mongoose,platform-express}
npm install amqplib amqp-connection-manager mongoose reflect-metadata rxjs
npm install -D typescript @types/node
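You'll also need a RabbitMQ broker listening on localhost:5672. If you have Docker on hand, a disposable instance with the management UI is one command:

docker run -d --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:3-management

The management UI at http://localhost:15672 (default login guest/guest) makes it easy to watch the queues we create later.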
Our RabbitMQ configuration establishes the messaging backbone:
// shared/config/rabbitmq.config.ts
import { RmqOptions, Transport } from '@nestjs/microservices';

export const RABBITMQ_CONFIG: RmqOptions = {
  transport: Transport.RMQ,
  options: {
    urls: ['amqp://localhost:5672'],
    queue: 'events_queue',
    noAck: false, // manual acknowledgement, so failed messages can be routed to the DLX
    queueOptions: {
      durable: true,
      arguments: {
        'x-message-ttl': 60000,
        'x-dead-letter-exchange': 'dlx.exchange'
      }
    }
  }
};

// Dead Letter Exchange for failed messages
export const DLX_CONFIG = {
  exchange: 'dlx.exchange',
  routingKey: '#'
};
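To see how the shared config is consumed, here is a minimal bootstrap sketch for one of the listener services (the file path and AppModule are assumed, not code from the repo):

// inventory-service/src/main.ts (illustrative sketch)
import { NestFactory } from '@nestjs/core';
import { MicroserviceOptions } from '@nestjs/microservices';
import { RABBITMQ_CONFIG } from '../../shared/config/rabbitmq.config';
import { AppModule } from './app.module';

async function bootstrap() {
  // The service consumes events from events_queue using the shared RabbitMQ options
  const app = await NestFactory.createMicroservice<MicroserviceOptions>(
    AppModule,
    RABBITMQ_CONFIG,
  );
  await app.listen();
}
bootstrap();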
Now, the Order Service creates orders and emits events:
// order-service/src/order/order.service.ts
import { Inject, Injectable } from '@nestjs/common';
import { ClientProxy } from '@nestjs/microservices';
import { InjectModel } from '@nestjs/mongoose';
import { Model } from 'mongoose';
import { Order, OrderDocument } from './order.schema'; // adjust the path to wherever the schema lives

@Injectable()
export class OrderService {
  constructor(
    @InjectModel(Order.name) private orderModel: Model<OrderDocument>,
    // 'ORDER_EVENTS' must match the name the RMQ client is registered under in ClientsModule
    @Inject('ORDER_EVENTS') private client: ClientProxy
  ) {}

  async createOrder(orderData: Record<string, any>) {
    const order = await this.orderModel.create({
      ...orderData,
      status: 'PENDING'
    });

    // Fire-and-forget event; downstream services react asynchronously
    this.client.emit('order_created', {
      orderId: order.id,
      items: order.items
    });

    return order;
  }
}
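The Order Service also needs its HTTP interface and a registered RMQ client to back the injected ClientProxy. A sketch of both pieces follows; the ORDER_EVENTS token, file paths, and schema names are illustrative:

// order-service/src/order/order.controller.ts (illustrative sketch)
import { Body, Controller, Post } from '@nestjs/common';
import { OrderService } from './order.service';

@Controller('orders')
export class OrderController {
  constructor(private readonly orderService: OrderService) {}

  // POST /orders creates the order and triggers the order_created event
  @Post()
  createOrder(@Body() orderData: Record<string, any>) {
    return this.orderService.createOrder(orderData);
  }
}

// order-service/src/order/order.module.ts (illustrative sketch)
import { Module } from '@nestjs/common';
import { ClientsModule } from '@nestjs/microservices';
import { MongooseModule } from '@nestjs/mongoose';
import { RABBITMQ_CONFIG } from '../../../shared/config/rabbitmq.config';
import { Order, OrderSchema } from './order.schema';

@Module({
  imports: [
    MongooseModule.forFeature([{ name: Order.name, schema: OrderSchema }]),
    // Registers the RMQ client under the token injected in OrderService
    ClientsModule.register([{ name: 'ORDER_EVENTS', ...RABBITMQ_CONFIG }]),
  ],
  controllers: [OrderController],
  providers: [OrderService],
})
export class OrderModule {}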
What happens when the inventory service receives this event? It attempts to reserve items:
// inventory-service/src/inventory/inventory.controller.ts
import { randomUUID } from 'node:crypto';
import { Controller, Inject } from '@nestjs/common';
import { ClientProxy, EventPattern, Payload } from '@nestjs/microservices';
import { InventoryService } from './inventory.service';

@Controller()
export class InventoryController {
  constructor(
    private readonly inventoryService: InventoryService,
    // Same convention as the Order Service: the token must match the ClientsModule registration
    @Inject('INVENTORY_EVENTS') private readonly client: ClientProxy
  ) {}

  @EventPattern('order_created')
  async handleOrderCreated(@Payload() data: { orderId: string; items: any[] }) {
    try {
      const reserved = await this.inventoryService.reserveItems(data.items);
      if (reserved) {
        this.client.emit('inventory_reserved', {
          orderId: data.orderId,
          reservationId: randomUUID()
        });
      } else {
        this.client.emit('inventory_failed', {
          orderId: data.orderId,
          reason: 'Insufficient stock'
        });
      }
    } catch (error) {
      // Retry with exponential backoff here (see the helper sketch below),
      // then rethrow so the global filter routes the message to the DLX
      throw error;
    }
  }
}
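The catch block above is where that retry logic lives. Here is one possible shape for it, a hand-rolled helper rather than anything NestJS provides:

// shared/utils/retry.ts (illustrative sketch)
export async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Wait 500ms, 1s, 2s, ... before the next attempt
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}

In the handler, the inventory call would be wrapped as await retryWithBackoff(() => this.inventoryService.reserveItems(data.items)), and only after the final attempt fails does the exception reach the DLX filter below.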
But what about messages that still fail after every retry? A dead letter exchange (DLX) catches them so nothing is silently lost:
// Global exception filter for DLX routing
import { ArgumentsHost, Catch, ExceptionFilter } from '@nestjs/common';
import { RmqContext } from '@nestjs/microservices';

@Catch()
export class RabbitMQErrorFilter implements ExceptionFilter {
  catch(exception: Error, host: ArgumentsHost) {
    const ctx = host.switchToRpc().getContext<RmqContext>();
    const channel = ctx.getChannelRef();
    const originalMsg = ctx.getMessage();

    // Forward the raw message to the dead letter exchange, keeping its routing key for later replay
    channel.publish('dlx.exchange', '', originalMsg.content, {
      headers: { 'x-original-routing-key': originalMsg.fields.routingKey }
    });

    // Acknowledge the original so it is not redelivered in an endless loop
    channel.ack(originalMsg);
  }
}
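The filter is registered once at bootstrap, for example with app.useGlobalFilters(new RabbitMQErrorFilter()). For the forwarded messages to be kept, a queue must also be bound to dlx.exchange; a one-off setup script using the amqplib package installed earlier could look like this (the dead-letter queue name is illustrative):

// shared/setup/setup-dlx.ts (illustrative sketch, run once before the services start)
import * as amqp from 'amqplib';

async function setupDeadLettering() {
  const connection = await amqp.connect('amqp://localhost:5672');
  const channel = await connection.createChannel();

  // A topic exchange with a '#' binding catches every dead-lettered routing key
  await channel.assertExchange('dlx.exchange', 'topic', { durable: true });
  await channel.assertQueue('events_queue.dlq', { durable: true });
  await channel.bindQueue('events_queue.dlq', 'dlx.exchange', '#');

  await channel.close();
  await connection.close();
}

setupDeadLettering().catch(console.error);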
An append-only event log (a lightweight form of event sourcing) gives us an audit trail for recovery:
// Shared event schema
import { Prop, Schema, SchemaFactory } from '@nestjs/mongoose';

@Schema({ timestamps: true })
export class EventLog {
  @Prop({ required: true })
  eventType: string;

  @Prop({ type: Map })
  payload: Record<string, any>;
}

export const EventLogSchema = SchemaFactory.createForClass(EventLog);
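A small service can append to this log right before each emit, so the trail exists even if the broker drops the message. The wiring below is a sketch; EventLogService is not something defined elsewhere in this article:

// shared/event-log.service.ts (illustrative sketch)
import { Injectable } from '@nestjs/common';
import { InjectModel } from '@nestjs/mongoose';
import { Model } from 'mongoose';
import { EventLog } from './event-log.schema';

@Injectable()
export class EventLogService {
  constructor(@InjectModel(EventLog.name) private eventLogModel: Model<EventLog>) {}

  // Persist the event first; publish to RabbitMQ only after the record is durable
  async record(eventType: string, payload: Record<string, any>) {
    return this.eventLogModel.create({ eventType, payload });
  }
}

OrderService.createOrder would then call await this.eventLogService.record('order_created', event) immediately before this.client.emit(...).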
For testing, we simulate network partitions:
// Test case: Inventory service offline
// mockRabbitDown/mockRabbitUp stub the broker connection; wait is a simple sleep helper
it('should retry failed inventory checks', async () => {
  mockRabbitDown(); // simulate the network partition
  const order = await orderService.createOrder(testOrder);
  await wait(5000); // allow retries to fire
  mockRabbitUp(); // broker comes back
  const updatedOrder = await orderService.getOrder(order.id);
  expect(updatedOrder.status).toEqual('CONFIRMED');
}, 15000); // raise the default Jest timeout to cover the waits
Monitoring is non-negotiable. We track:
- Message throughput per queue
- Event processing latency
- Dead letter queue growth (see the CLI check after this list)
- Service health metrics
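For a quick look at queue depth and dead letter growth during development, the broker's own CLI is enough (assuming rabbitmqctl is available on the RabbitMQ host):

rabbitmqctl list_queues name messages messages_ready messages_unacknowledged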
Common pitfalls I’ve encountered:
- Forgetting idempotency in event handlers (see the sketch after this list)
- Underestimating queue memory requirements
- Missing distributed tracing correlations
- Ignoring versioning in event schemas
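On the first of those pitfalls: the cheapest guard is to record which event IDs a handler has already processed and skip duplicates. In the sketch below, the eventId field and the in-memory set are assumptions; the emits shown earlier would need to add an eventId, and a real deployment would persist the set in MongoDB or Redis:

// inventory-service: idempotency sketch (eventId and the in-memory set are illustrative)
import { Controller } from '@nestjs/common';
import { EventPattern, Payload } from '@nestjs/microservices';

@Controller()
export class IdempotentInventoryController {
  // In production, back this with MongoDB or Redis so it survives restarts
  private readonly processedEvents = new Set<string>();

  @EventPattern('order_created')
  async handleOrderCreated(@Payload() data: { eventId: string; orderId: string; items: any[] }) {
    if (this.processedEvents.has(data.eventId)) {
      return; // duplicate delivery after a retry or requeue; safe to skip
    }
    this.processedEvents.add(data.eventId);
    // ...reserve inventory exactly once, as in the handler shown earlier
  }
}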
What separates robust systems from fragile ones? Anticipating failure at every integration point. By implementing retries with exponential backoff, dead letter queues, and comprehensive monitoring, we create systems that withstand real-world chaos.
This pattern has saved countless transactions in my production systems. If you implement just one technique from this article, make it the dead letter queue configuration - it’s your safety net when things go wrong. Have you considered how you’d extend this pattern to payment processing?
What challenges have you faced with distributed transactions? Share your experiences in the comments below. If this approach resonates with you, like and share this article so others can build more resilient systems too.