Lately I’ve been thinking about how modern applications handle complexity. As systems grow from monoliths to distributed architectures, communication between services becomes both crucial and challenging. That’s what led me to explore event-driven microservices with TypeScript, NestJS, and Apache Kafka. The combination offers a powerful way to build scalable, resilient systems that can evolve without breaking.
Traditional request-response patterns often create tight coupling between services. When one service needs to call another directly, they become interdependent. This dependency chain can cause cascading failures and makes systems harder to change. Event-driven architecture flips this model on its head. Instead of services calling each other, they emit events that other services can react to independently.
Why does this matter for your applications? Imagine an e-commerce platform where placing an order triggers multiple actions: reserving inventory, processing payment, and sending confirmation emails. In a synchronous system, if the email service is slow, the entire order process stalls. With events, each step happens independently, making the system more resilient and responsive.
TypeScript brings type safety to this equation. When events are just plain objects, it’s easy for services to misinterpret data or break when schemas change. But with TypeScript, we can define event contracts that are checked at compile time. This catches errors before they reach production.
NestJS provides a solid foundation for building these services. Its modular structure and dependency injection make it easy to organize code and manage dependencies. The framework’s built-in support for microservices and event patterns means we’re not starting from scratch. We get battle-tested patterns right out of the box.
Apache Kafka acts as the central nervous system for our events. It’s not just a message queue; it’s a distributed event streaming platform that guarantees ordering within each partition and provides durability. Events in Kafka are stored and can be replayed, which is invaluable for debugging and for building new features that need historical data.
Have you ever wondered how to ensure that events between services remain compatible over time? This is where schema validation becomes critical. Using libraries like Zod, we can define strict contracts for our events. These schemas validate data at runtime and can be shared between services to maintain consistency.
Let me show you how this comes together in practice. Here’s a simple event schema definition:
// shared-lib/src/events/order-events.ts
import { z } from 'zod';

export const OrderCreatedSchema = z.object({
  eventType: z.literal('ORDER_CREATED'),
  eventVersion: z.literal('1.0'),
  orderId: z.string().uuid(),
  customerId: z.string().uuid(),
  items: z.array(
    z.object({
      productId: z.string().uuid(),
      quantity: z.number().int().positive(),
      unitPrice: z.number().positive(),
    }),
  ),
  totalAmount: z.number().positive(),
  // JSON-serialized events arrive with ISO-string timestamps, so coerce to Date
  timestamp: z.coerce.date(),
});

export type OrderCreatedEvent = z.infer<typeof OrderCreatedSchema>;
This schema ensures that every ORDER_CREATED event has the required fields with proper validation. The eventVersion field allows us to evolve the schema while maintaining backward compatibility.
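When the contract eventually has to change, one option (a sketch, not the only approach) is to publish a new literal version and let consumers accept both via a discriminated union. Continuing the same file, with a hypothetical currency field as the v2 addition:

// Continuing shared-lib/src/events/order-events.ts: a hypothetical '2.0' revision
export const OrderCreatedSchemaV2 = OrderCreatedSchema.extend({
  eventVersion: z.literal('2.0'),
  currency: z.string().length(3), // hypothetical new field, e.g. 'USD'
});

// Consumers accept either version, discriminated by eventVersion
export const AnyOrderCreatedSchema = z.discriminatedUnion('eventVersion', [
  OrderCreatedSchema,
  OrderCreatedSchemaV2,
]);

export type AnyOrderCreatedEvent = z.infer<typeof AnyOrderCreatedSchema>;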
In our NestJS service, we can use this schema to validate incoming events:
// order-service/src/events/order-consumer.service.ts
import { Injectable, Logger } from '@nestjs/common';
import { EventPattern, Payload } from '@nestjs/microservices';
import { OrderCreatedSchema } from 'shared-lib';
import { OrderService } from '../order.service';

@Injectable()
export class OrderConsumerService {
  private readonly logger = new Logger(OrderConsumerService.name);

  constructor(private readonly orderService: OrderService) {}

  @EventPattern('ORDER_CREATED')
  async handleOrderCreated(@Payload() event: unknown) {
    // Validate the raw payload before it touches any business logic
    const result = OrderCreatedSchema.safeParse(event);
    if (!result.success) {
      this.logger.error('Invalid event received', result.error);
      // Send to dead letter queue for investigation
      return;
    }
    await this.orderService.processOrder(result.data);
  }
}
This approach gives us confidence that we’re working with valid data. The safeParse method returns a result object that tells us whether validation succeeded without throwing exceptions. Failed validations can be routed to a dead letter queue for analysis rather than crashing the service.
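The dead letter queue itself is left as a comment above, so here is a minimal sketch of one way to wire it, assuming a ClientKafka provider registered under a hypothetical KAFKA_CLIENT token and a DLQ topic that mirrors the source topic’s name:

// order-service/src/events/dead-letter.service.ts (illustrative sketch)
import { Inject, Injectable } from '@nestjs/common';
import { ClientKafka } from '@nestjs/microservices';

@Injectable()
export class DeadLetterService {
  constructor(@Inject('KAFKA_CLIENT') private readonly client: ClientKafka) {}

  sendToDeadLetter(sourceTopic: string, payload: unknown, reason: string): void {
    // emit() dispatches the event immediately; no reply is expected
    this.client.emit(`${sourceTopic}.DLQ`, {
      payload,
      reason,
      failedAt: new Date().toISOString(),
    });
  }
}

The consumer above could then call sendToDeadLetter('ORDER_CREATED', event, result.error.message) instead of silently returning.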
What happens when services need to coordinate across multiple events? This is where patterns like Saga come into play. Instead of using distributed transactions, which are complex and often problematic, we can use a series of events to manage multi-step processes.
Consider an order fulfillment process. When an order is created, we need to reserve inventory, process payment, and then update the order status. Each step emits events that trigger the next action. If any step fails, compensating events can reverse previous actions.
Here’s how we might handle payment processing:
// payment-service/src/events/payment-saga.service.ts
import { Injectable } from '@nestjs/common';
import { EventPattern, Payload } from '@nestjs/microservices';
import { InventoryReservedEvent } from 'shared-lib';
import { PaymentService } from '../payment.service';
import { EventProducerService } from './event-producer.service';

@Injectable()
export class PaymentSagaService {
  constructor(
    private readonly paymentService: PaymentService,
    private readonly eventProducer: EventProducerService,
  ) {}

  @EventPattern('INVENTORY_RESERVED')
  async handleInventoryReserved(@Payload() event: InventoryReservedEvent) {
    try {
      const payment = await this.paymentService.processPayment({
        orderId: event.orderId,
        amount: event.totalAmount,
        customerId: event.customerId,
      });
      await this.eventProducer.publishPaymentProcessed({
        orderId: event.orderId,
        paymentId: payment.id,
        status: payment.status,
      });
    } catch (error) {
      // Emit a compensating event so upstream services can undo their work
      await this.eventProducer.publishPaymentFailed({
        orderId: event.orderId,
        reason: error instanceof Error ? error.message : String(error),
      });
    }
  }
}
This pattern keeps services loosely coupled. The payment service doesn’t need to know about inventory management, and vice versa. Each service focuses on its specific domain while reacting to events from other parts of the system.
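The compensating side of the saga would live in the inventory service. Here’s a sketch, assuming a hypothetical releaseReservation method and a matching PaymentFailedEvent contract:

// inventory-service/src/events/inventory-compensation.service.ts (illustrative sketch)
import { Injectable } from '@nestjs/common';
import { EventPattern, Payload } from '@nestjs/microservices';
import { InventoryService } from '../inventory.service';

interface PaymentFailedEvent {
  orderId: string;
  reason: string;
}

@Injectable()
export class InventoryCompensationService {
  constructor(private readonly inventoryService: InventoryService) {}

  @EventPattern('PAYMENT_FAILED')
  async handlePaymentFailed(@Payload() event: PaymentFailedEvent) {
    // Compensating action: undo the reservation made earlier in the saga
    await this.inventoryService.releaseReservation(event.orderId);
  }
}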
Monitoring becomes essential in distributed systems. How do you know if events are being processed correctly? NestJS integrates well with observability tools like OpenTelemetry. We can add tracing to track events as they flow through the system:
// shared-lib/src/telemetry/tracing.ts
import { Span, SpanStatusCode, trace } from '@opentelemetry/api';

export function withTrace<T>(
  name: string,
  fn: (span: Span) => Promise<T>,
): Promise<T> {
  const tracer = trace.getTracer('ecommerce');
  return tracer.startActiveSpan(name, async (span) => {
    try {
      return await fn(span);
    } catch (error) {
      // Record the failure on the span so traces show where things broke
      span.recordException(error as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw error;
    } finally {
      span.end();
    }
  });
}
This tracing wrapper helps us understand the flow of events and identify bottlenecks. When an order takes too long to process, we can trace the events through each service to see where the delay occurs.
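Using the wrapper inside the consumer’s handler is straightforward; the span attribute keys here are illustrative:

// Inside handleOrderCreated, after validation succeeds
await withTrace('order.process', async (span) => {
  span.setAttribute('order.id', result.data.orderId);
  span.setAttribute('order.item_count', result.data.items.length);
  await this.orderService.processOrder(result.data);
});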
Testing event-driven systems requires a different approach. Instead of testing direct method calls, we need to verify that services emit the right events and react appropriately to incoming events. NestJS provides excellent testing utilities for this:
// order-service/test/order.service.spec.ts
import { Test } from '@nestjs/testing';
import { OrderService } from '../src/order.service';
import { EventProducerService } from '../src/events/event-producer.service';

describe('OrderService', () => {
  let service: OrderService;
  let eventProducer: EventProducerService;

  beforeEach(async () => {
    const module = await Test.createTestingModule({
      providers: [
        OrderService,
        // Stub the producer so the test needs no running Kafka broker
        {
          provide: EventProducerService,
          useValue: { publishOrderCreated: jest.fn() },
        },
      ],
    }).compile();

    service = module.get<OrderService>(OrderService);
    eventProducer = module.get<EventProducerService>(EventProducerService);
  });

  it('should publish ORDER_CREATED event when order is created', async () => {
    await service.createOrder('customer-123', [
      { productId: 'prod-123', quantity: 2, unitPrice: 25 },
    ]);

    expect(eventProducer.publishOrderCreated).toHaveBeenCalledWith(
      expect.objectContaining({
        eventType: 'ORDER_CREATED',
        customerId: 'customer-123',
      }),
    );
  });
});
This test verifies that our service emits the expected event when an order is created. We’re testing the behavior rather than the implementation details.
Deployment considerations are different for event-driven systems. Since services are independent, they can be deployed separately. Kafka’s consumer groups allow us to scale horizontally by adding more instances of a service. Each instance processes a subset of the partitions, distributing the load.
Configuration management becomes important. Each service needs access to Kafka brokers, schema registry URLs, and other infrastructure details. Using environment variables and configuration modules in NestJS helps keep these settings organized and secure.
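Both concerns meet in the service bootstrap. A minimal sketch, assuming broker addresses arrive via a KAFKA_BROKERS environment variable:

// order-service/src/main.ts (minimal sketch)
import { NestFactory } from '@nestjs/core';
import { MicroserviceOptions, Transport } from '@nestjs/microservices';
import { AppModule } from './app.module';

async function bootstrap() {
  const app = await NestFactory.createMicroservice<MicroserviceOptions>(AppModule, {
    transport: Transport.KAFKA,
    options: {
      client: {
        brokers: (process.env.KAFKA_BROKERS ?? 'localhost:9092').split(','),
      },
      consumer: {
        // Instances sharing this groupId divide the topic's partitions among themselves
        groupId: 'order-service',
      },
    },
  });
  await app.listen();
}
bootstrap();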
Error handling requires special attention. What happens when a service fails to process an event? Kafka’s commit mechanism allows us to control when an event is marked as processed. We can configure retry policies and dead letter queues for events that consistently fail.
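With NestJS’s Kafka transport, taking control of commits means setting run: { autoCommit: false } in the transport options and acknowledging through the KafkaContext. A sketch, with the retry-counting logic omitted for brevity:

// Inside a consumer class; requires `run: { autoCommit: false }` in the transport options
import { Ctx, EventPattern, KafkaContext, Payload } from '@nestjs/microservices';

@EventPattern('ORDER_CREATED')
async handleOrderCreated(@Payload() event: unknown, @Ctx() context: KafkaContext) {
  const topic = context.getTopic();
  const partition = context.getPartition();
  const { offset } = context.getMessage();
  try {
    await this.processEvent(event); // hypothetical processing step
    // Commit the *next* offset only after the event is fully processed
    await context.getConsumer().commitOffsets([
      { topic, partition, offset: (Number(offset) + 1).toString() },
    ]);
  } catch (error) {
    // An uncommitted offset means the event is redelivered on restart or rebalance;
    // after repeated failures, route it to the dead letter queue instead
  }
}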
The beauty of this architecture is its flexibility. New services can be added without modifying existing ones. If we want to add a recommendation service that analyzes purchasing patterns, it can simply consume the ORDER_CREATED events without affecting the order processing flow.
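For instance, a hypothetical recommendation service needs nothing more than another consumer on the same topic:

// recommendation-service/src/events/order-listener.service.ts (illustrative sketch)
import { Injectable } from '@nestjs/common';
import { EventPattern, Payload } from '@nestjs/microservices';
import { OrderCreatedEvent } from 'shared-lib';

@Injectable()
export class OrderListenerService {
  @EventPattern('ORDER_CREATED')
  async handleOrderCreated(@Payload() event: OrderCreatedEvent) {
    // Record purchase signals for the recommendation model;
    // the order-processing flow never knows this service exists
  }
}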
As you build your own event-driven systems, remember that the goal is not just technical excellence but business agility. Systems that can evolve quickly and handle failure gracefully provide real competitive advantages.
What challenges have you faced with microservice communication? I’d love to hear about your experiences and solutions. If this approach resonates with you, please share your thoughts in the comments below. Your feedback helps shape future discussions and deepens our collective understanding of these patterns.