I’ve been thinking about a problem that keeps many developers awake at night. When you break a large application into smaller, independent services, you gain flexibility and scalability. But you lose something crucial—the ability to manage transactions across different services. What happens when one service succeeds and another fails? How do you ensure data consistency without a shared database? This question led me to explore a solution that has changed how I build reliable systems.
Let me show you what I mean. Imagine you’re building an online store. A customer places an order. Your system needs to create an order, check inventory, process payment, and schedule shipping. In a traditional application, this happens within a single database transaction. If anything fails, everything rolls back. But in a microservices world, each step lives in a different service with its own database.
What happens if the payment service is down after inventory has been reserved? The customer’s items are stuck in limbo, and your inventory counts are wrong. This is where the Saga pattern comes in. Instead of trying to make everything succeed at once, it breaks the process into steps. Each step has a way to undo itself if something goes wrong later.
I’ll walk you through building this pattern in Node.js with TypeScript. We’ll create a system that handles failures gracefully and keeps your data consistent. Ready to see how it works?
First, let’s set up our project. We’ll create separate services for orders, inventory, payments, and shipping. Each service will manage its own data and communicate through events. This separation is key—it means services can fail independently without bringing down the whole system.
// Our project structure
saga-pattern-demo/
├── order-service/
├── inventory-service/
├── payment-service/
├── shipping-service/
└── shared-types/
Each service needs its own database connection. This might seem like extra work, but it’s what makes microservices resilient. If the payment database has issues, orders can still be created and inventory can still be checked.
Now, let’s define what a Saga actually is. Think of it as a story with chapters. Each chapter represents a step in your business process. If we can’t finish the story, we need to go back and undo the chapters we’ve already written. In technical terms, a Saga is a sequence of local transactions. Each transaction updates data in one service and publishes an event.
Here’s a simple example of what a Saga step looks like:
interface SagaStep {
  stepId: string;
  service: string;
  action: string;
  compensation: string;
  status: 'pending' | 'completed' | 'failed';
}
The compensation field is crucial. It tells us how to undo this step if needed. For inventory reservation, the compensation would be releasing the reserved items. For payment processing, it would be issuing a refund.
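To make the compensation field concrete, here is a minimal sketch of the four steps of the order workflow. The action and compensation names (`reserveItems`, `releaseItems`, and so on) and the `compensationPlan` helper are hypothetical, chosen only for illustration. The interface is repeated so the snippet stands alone.

```typescript
interface SagaStep {
  stepId: string;
  service: string;
  action: string;
  compensation: string;
  status: 'pending' | 'completed' | 'failed';
}

// Hypothetical action/compensation names, for illustration only.
const orderSagaSteps: SagaStep[] = [
  { stepId: '1', service: 'order', action: 'createOrder', compensation: 'cancelOrder', status: 'pending' },
  { stepId: '2', service: 'inventory', action: 'reserveItems', compensation: 'releaseItems', status: 'pending' },
  { stepId: '3', service: 'payment', action: 'chargeCustomer', compensation: 'refundPayment', status: 'pending' },
  { stepId: '4', service: 'shipping', action: 'scheduleDelivery', compensation: 'cancelDelivery', status: 'pending' },
];

// List the compensations to run, newest first, for the steps that completed.
function compensationPlan(steps: SagaStep[]): string[] {
  return steps
    .filter((s) => s.status === 'completed')
    .reverse()
    .map((s) => `${s.service}.${s.compensation}`);
}
```

If the first two steps have completed when something fails, the plan is to release the inventory first and then cancel the order.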
There are two main ways to implement Sagas. The first is choreography, where services listen for events and react to them. It’s like a dance where each dancer knows their moves. The second is orchestration, where a central conductor tells everyone what to do. Which approach do you think works better for complex workflows?
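For contrast, here is a rough sketch of choreography. The `EventBus` class is a tiny in-memory stand-in for a real broker, and the event names are assumptions made for this example. Each service only knows which event it reacts to and which one it emits next.

```typescript
type SagaEventName = 'order-created' | 'inventory-reserved' | 'payment-completed';
type SagaHandler = (orderId: string) => void;

// A tiny in-memory event bus standing in for a real message broker.
class EventBus {
  private handlers = new Map<SagaEventName, SagaHandler[]>();

  on(eventName: SagaEventName, handler: SagaHandler): void {
    const list = this.handlers.get(eventName) ?? [];
    list.push(handler);
    this.handlers.set(eventName, list);
  }

  emit(eventName: SagaEventName, orderId: string): void {
    for (const handler of this.handlers.get(eventName) ?? []) {
      handler(orderId);
    }
  }
}

const bus = new EventBus();
const trace: string[] = [];

// Each service subscribes to the event it cares about and emits the next one.
bus.on('order-created', (id) => {
  trace.push(`inventory reserved for ${id}`);
  bus.emit('inventory-reserved', id);
});
bus.on('inventory-reserved', (id) => {
  trace.push(`payment charged for ${id}`);
  bus.emit('payment-completed', id);
});
bus.on('payment-completed', (id) => {
  trace.push(`shipping scheduled for ${id}`);
});

bus.emit('order-created', 'order-42');
```

No single component sees the whole workflow here. That keeps services decoupled, but it makes the flow harder to follow once there are many steps.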
I prefer orchestration for most cases. It gives you a clear view of what’s happening and makes debugging easier. Let’s build an orchestrator that manages our order process.
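class SagaOrchestrator {
  private currentStep = 0;

  constructor(private readonly steps: SagaStep[] = []) {}

  async execute() {
    while (this.currentStep < this.steps.length) {
      const step = this.steps[this.currentStep];
      try {
        await this.executeStep(step);
        step.status = 'completed';
        this.currentStep++;
      } catch (error) {
        step.status = 'failed';
        await this.compensate();
        break;
      }
    }
  }

  // Undo the completed steps, newest first
  private async compensate() {
    for (let i = this.currentStep - 1; i >= 0; i--) {
      await this.executeCompensation(this.steps[i]);
    }
  }

  // Placeholder: send step.action to step.service and wait for the result
  protected async executeStep(step: SagaStep): Promise<void> {}

  // Placeholder: send step.compensation to step.service
  protected async executeCompensation(step: SagaStep): Promise<void> {}
}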
Notice how compensation happens in reverse order. If step three fails, we compensate step two, then step one. This ensures we clean up properly.
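Here is a small self-contained sketch you can run to see the reverse-order behavior. It assumes steps carry `run` and `undo` functions instead of string names, which is a simplification for this demo. `RunnableStep`, `runSaga`, and `makeStep` are hypothetical names.

```typescript
// A step here carries functions instead of string names (an assumption for this sketch).
interface RunnableStep {
  label: string;
  run: () => Promise<void>;
  undo: () => Promise<void>;
}

async function runSaga(steps: RunnableStep[]): Promise<'completed' | 'compensated'> {
  const done: RunnableStep[] = [];
  for (const step of steps) {
    try {
      await step.run();
      done.push(step);
    } catch {
      // Undo only the steps that finished, newest first.
      for (const finished of done.reverse()) {
        await finished.undo();
      }
      return 'compensated';
    }
  }
  return 'completed';
}

// Demo: the third step fails, so steps two and one are undone in that order.
const sagaLog: string[] = [];
const makeStep = (label: string, fails = false): RunnableStep => ({
  label,
  run: async () => {
    if (fails) throw new Error(`${label} failed`);
    sagaLog.push(`run ${label}`);
  },
  undo: async () => {
    sagaLog.push(`undo ${label}`);
  },
});

const outcome = runSaga([
  makeStep('order'),
  makeStep('inventory'),
  makeStep('payment', true),
  makeStep('shipping'),
]);
```

Once `outcome` resolves, `sagaLog` holds `run order`, `run inventory`, `undo inventory`, `undo order`. The shipping step never runs, and the failed payment step is not compensated because it never completed.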
Now let’s look at a real example. When a customer places an order, here’s what happens:
- Order service creates an order record
- Inventory service reserves the items
- Payment service charges the customer
- Shipping service schedules delivery
Each step depends on the previous one succeeding. But what if the payment fails after inventory is reserved? The Saga orchestrator would:
- Notice the payment failure
- Tell inventory service to release the reserved items
- Mark the order as cancelled
This approach gives us eventual consistency. The system might be in an intermediate state for a short time, but it will eventually reach a consistent state.
Let’s implement the order service. It needs to start the Saga when a new order arrives:
class OrderService {
  async createOrder(orderData: OrderData) {
    // Create order in local database
    const order = await this.saveOrder(orderData);

    // Start the Saga
    const sagaId = await this.orchestrator.startSaga({
      type: 'order-creation',
      payload: { orderId: order.id, ...orderData }
    });

    return { order, sagaId };
  }
}
The order service doesn’t know about inventory or payments. It just starts the process and lets the orchestrator handle the rest. This separation of concerns makes each service easier to maintain.
Now, how do services communicate? We’ll use a message queue. I like RabbitMQ for this, but you could use Kafka or AWS SQS. The important thing is that messages are durable—they won’t be lost if a service restarts.
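class EventPublisher {
  async publish(event: SagaEvent) {
    // The queue must be durable too: persistent messages on a
    // non-durable queue are still lost when the broker restarts.
    await this.channel.assertQueue('saga-events', { durable: true });
    this.channel.sendToQueue(
      'saga-events',
      Buffer.from(JSON.stringify(event)),
      { persistent: true }
    );
  }
}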
Each service listens for events relevant to its role. The inventory service listens for “order-created” events. When it receives one, it tries to reserve inventory. If successful, it publishes an “inventory-reserved” event. If failed, it publishes an “inventory-reservation-failed” event.
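As a rough sketch, the inventory side of that exchange might look like the handler below. The `SagaMessage` shape, the `stock` map, and the `publish` callback are all assumptions standing in for the real event type, the inventory database, and the message broker.

```typescript
interface SagaMessage {
  type: 'order-created' | 'inventory-reserved' | 'inventory-reservation-failed';
  orderId: string;
  items: { productId: string; quantity: number }[];
}

// Hypothetical handler: `stock` and `publish` stand in for the inventory
// database and the message broker. Assumes each product appears once per order.
async function onOrderCreated(
  message: SagaMessage,
  stock: Map<string, number>,
  publish: (message: SagaMessage) => Promise<void>,
): Promise<void> {
  const canReserve = message.items.every(
    (item) => (stock.get(item.productId) ?? 0) >= item.quantity,
  );
  if (!canReserve) {
    // Nothing was reserved, so there is nothing to undo on this service.
    await publish({ ...message, type: 'inventory-reservation-failed' });
    return;
  }
  for (const item of message.items) {
    stock.set(item.productId, (stock.get(item.productId) ?? 0) - item.quantity);
  }
  await publish({ ...message, type: 'inventory-reserved' });
}
```

The reservation is all-or-nothing: if any item is short, no stock is touched and only the failure event goes out.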
But here’s an important question: what happens if we publish the same event twice? Network issues can cause duplicate messages. We need to make our operations idempotent—meaning doing them twice has the same effect as doing them once.
class InventoryService {
  private processedEvents = new Set<string>();

  async handleReservation(event: ReservationEvent) {
    // Check if we've already processed this event
    if (this.processedEvents.has(event.id)) {
      return; // Already processed
    }

    // Process the reservation
    await this.reserveItems(event.payload);

    // Remember we processed it
    this.processedEvents.add(event.id);
  }
}
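The in-memory Set keeps the example short, but it is lost on restart and isn't shared between service instances. In production, store processed event IDs in a database or cache, ideally written in the same transaction as the reservation itself. This prevents double-charging customers or double-reserving inventory.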
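One way to structure this is to put the tracking behind a small store interface, so the handler doesn't care where the IDs live. This is only a sketch: `ProcessedEventStore`, `InMemoryEventStore`, and `handleOnce` are hypothetical names, and the in-memory class is a stand-in for a real table with a unique index on the event ID.

```typescript
// Hypothetical store interface. With a real database, `claim` would be a
// single INSERT guarded by a unique index on the event ID.
interface ProcessedEventStore {
  // Resolves to true only for the first caller that claims this ID.
  claim(eventId: string): Promise<boolean>;
  release(eventId: string): Promise<void>;
}

class InMemoryEventStore implements ProcessedEventStore {
  private ids = new Set<string>();

  async claim(eventId: string): Promise<boolean> {
    if (this.ids.has(eventId)) return false;
    this.ids.add(eventId);
    return true;
  }

  async release(eventId: string): Promise<void> {
    this.ids.delete(eventId);
  }
}

// Runs `work` at most once per event ID; resolves to false for duplicates.
async function handleOnce(
  store: ProcessedEventStore,
  eventId: string,
  work: () => Promise<void>,
): Promise<boolean> {
  if (!(await store.claim(eventId))) return false; // duplicate delivery
  try {
    await work();
    return true;
  } catch (error) {
    await store.release(eventId); // let a redelivery try again
    throw error;
  }
}
```

Claiming the ID before doing the work means a duplicate that arrives mid-processing is also rejected. If the work fails, the claim is released so that a redelivered message gets another attempt.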
Now let’s talk about failure handling. Services can fail in different ways. They might crash, they might be slow, or they might return errors. Our Saga needs to handle all these cases.
One approach is to use timeouts. If a service doesn’t respond within a certain time, we consider the step failed and start compensation. But we need to be careful—what if the service was just slow and completes after we’ve already compensated?
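This is where idempotent compensation helps. If we try to release inventory that's already been released, nothing happens, so the operation is safe to retry. For the late-completion case, the service also needs to remember that the Saga was cancelled, so that a reservation arriving after compensation is rejected or released right away.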
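Here is a minimal sketch of both ideas: a timeout wrapper for a step and a release operation that is safe to call twice. `withTimeout`, `reservations`, and `releaseReservation` are hypothetical names, and the map stands in for a reservations table keyed by Saga ID.

```typescript
// Reject if `work` has not settled within `ms` milliseconds.
// Note: this does not cancel `work`; the remote call may still complete later.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`step timed out after ${ms}ms`)),
      ms,
    );
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Stand-in for a reservations table: sagaId -> reserved quantity.
const reservations = new Map<string, number>();

// Idempotent compensation: releasing a reservation that is already gone is a no-op.
// Returns false when there was nothing to release.
function releaseReservation(sagaId: string): boolean {
  return reservations.delete(sagaId);
}
```

The wrapper only stops waiting. It can't stop the remote service, which is exactly why the compensation on the other side has to tolerate being called at odd times.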
Let’s look at the payment service. Processing payments is critical—we need to be extra careful here:
class PaymentService {
  async processPayment(paymentData: PaymentData) {
    // Check if payment was already processed
    const existing = await this.findPayment(paymentData.reference);
    if (existing) {
      return existing;
    }

    // Call payment gateway
    const result = await this.gateway.charge(paymentData);

    // Save result
    await this.savePayment(result);
    return result;
  }

  async refundPayment(paymentId: string) {
    const payment = await this.findPaymentById(paymentId);
    if (!payment) {
      throw new Error(`Payment ${paymentId} not found`);
    }

    // Check if already refunded
    if (payment.status === 'refunded') {
      return payment;
    }

    // Process refund
    const result = await this.gateway.refund(paymentId);

    // Update status
    payment.status = 'refunded';
    await payment.save();
    return result;
  }
}
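Notice how both charge and refund operations check their current state before acting. This makes them safe to retry one after another. To stay safe when two retries run at the same time, also put a unique constraint on the payment reference, or pass it to the gateway as an idempotency key if your gateway supports one.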
What about monitoring? We need to know when Sagas fail or get stuck. I add logging at each step:
class LoggingOrchestrator extends SagaOrchestrator {
  async executeStep(step: SagaStep) {
    console.log(`Starting step: ${step.action}`);
    const startTime = Date.now();

    try {
      await super.executeStep(step);
      const duration = Date.now() - startTime;
      console.log(`Completed step: ${step.action} in ${duration}ms`);
    } catch (error) {
      console.error(`Failed step: ${step.action}`, error);
      throw error;
    }
  }
}
We can also store Saga state in a database. This lets us see which Sagas are in progress, which have completed, and which have failed. It’s invaluable for debugging production issues.
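A rough sketch of what that stored state could look like follows. The `SagaRecord` fields and the `SagaStateStore` class are assumptions for illustration, and the in-memory map stands in for a real table or collection.

```typescript
type SagaStatus = 'started' | 'in_progress' | 'completed' | 'failed';

interface SagaRecord {
  id: string;
  type: string;
  status: SagaStatus;
  currentStep: number; // index of the step being executed
  updatedAt: Date;
  lastError?: string; // filled in when a step fails
}

// In-memory stand-in for a sagas table or collection.
class SagaStateStore {
  private records = new Map<string, SagaRecord>();

  // Insert or overwrite a record, stamping the time of the write.
  save(record: SagaRecord): void {
    this.records.set(record.id, { ...record, updatedAt: new Date() });
  }

  // Return every saga whose status is one of the given values.
  findByStatus(...statuses: SagaStatus[]): SagaRecord[] {
    return Array.from(this.records.values()).filter(
      (record) => statuses.indexOf(record.status) !== -1,
    );
  }
}
```

With something like this in place, "which Sagas failed in the last hour?" becomes a query instead of a log search.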
Now, let me share something I learned the hard way. When I first implemented Sagas, I didn’t think about concurrent updates. What if two Sagas try to reserve the same inventory item at the same time?
We need to handle this at the database level. MongoDB has atomic single-document updates such as findOneAndUpdate (built on the findAndModify command), and PostgreSQL has SELECT ... FOR UPDATE row locks. Use these features to prevent race conditions.
async reserveItem(productId: string, quantity: number) {
  const result = await Inventory.findOneAndUpdate(
    { productId, quantity: { $gte: quantity } },
    { $inc: { quantity: -quantity } },
    { new: true }
  );

  if (!result) {
    throw new Error('Insufficient inventory');
  }

  return result;
}
This operation is atomic: the stock check and the decrement happen in a single document update, so it either succeeds or fails completely. Two Sagas can't both take the last item.
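To see why the check and the decrement have to be one step, here is a small in-memory simulation. It doesn't touch a real database. `stockLevels`, `pause`, `reserveUnsafe`, and `reserveAtomic` are hypothetical names, and the `pause` call imitates a network round trip.

```typescript
const stockLevels = new Map<string, number>([['sku-1', 10]]);
const pause = () => new Promise<void>((resolve) => setTimeout(resolve, 1));

// Unsafe: the read and the write are separated by an await, so two sagas can
// both pass the check before either one writes.
async function reserveUnsafe(productId: string, quantity: number): Promise<boolean> {
  const available = stockLevels.get(productId) ?? 0;
  if (available < quantity) return false;
  await pause(); // simulates a round trip between the read and the write
  stockLevels.set(productId, available - quantity);
  return true;
}

// Safe: check and decrement happen in one uninterrupted step, which is what
// a conditional update gives you on the database side.
async function reserveAtomic(productId: string, quantity: number): Promise<boolean> {
  await pause(); // the round trip happens before the combined check-and-write
  const available = stockLevels.get(productId) ?? 0;
  if (available < quantity) return false;
  stockLevels.set(productId, available - quantity);
  return true;
}
```

With a stock of 10 and three concurrent requests for 6 units each, `reserveUnsafe` accepts all three, while `reserveAtomic` accepts exactly one.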
Another challenge: what happens when a service is down for maintenance? Our Saga might fail, but we want to retry when the service comes back. We need a way to resume interrupted Sagas.
I store Saga state in a database with all the steps and their status. When the orchestrator starts, it checks for incomplete Sagas and tries to continue them.
async recoverIncompleteSagas() {
  const incomplete = await Saga.find({
    status: { $in: ['started', 'in_progress'] },
    updatedAt: { $lt: new Date(Date.now() - 5 * 60 * 1000) }
  });

  for (const saga of incomplete) {
    await this.continueSaga(saga);
  }
}
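This runs periodically to pick up stuck Sagas. The five-minute threshold skips Sagas that were updated recently and are probably still running, so we don't resume work that another process is in the middle of.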
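What "continue" means is worth spelling out. A minimal sketch, assuming each stored Saga keeps a per-step status, is to skip the steps already marked completed and run the rest. `StoredSaga`, `findStaleSagas`, and `resumeSaga` are hypothetical names, and the array stands in for the database query.

```typescript
interface StoredStep {
  action: string;
  status: 'pending' | 'completed' | 'failed';
}

interface StoredSaga {
  id: string;
  status: 'started' | 'in_progress' | 'completed' | 'failed';
  updatedAt: Date;
  steps: StoredStep[];
}

const STALE_AFTER_MS = 5 * 60 * 1000;

// Pick sagas that are unfinished and have not been touched recently.
function findStaleSagas(sagas: StoredSaga[], now: Date): StoredSaga[] {
  return sagas.filter(
    (saga) =>
      (saga.status === 'started' || saga.status === 'in_progress') &&
      now.getTime() - saga.updatedAt.getTime() > STALE_AFTER_MS,
  );
}

// Resume from the first step that is not completed; earlier steps are skipped.
async function resumeSaga(
  saga: StoredSaga,
  executeStep: (step: StoredStep) => Promise<void>,
): Promise<void> {
  for (const step of saga.steps) {
    if (step.status === 'completed') continue;
    await executeStep(step);
    step.status = 'completed';
  }
  saga.status = 'completed';
  saga.updatedAt = new Date();
}
```

The step that was interrupted may have partly run before the crash, so resuming like this is only safe when every step is idempotent.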
Let’s put everything together. Here’s a complete example of an order creation Saga:
// A method on the orchestrator class, so `this` refers to the orchestrator
async createOrderSaga(orderData: OrderData) {
  const saga = {
    id: generateId(),
    type: 'order-creation',
    payload: orderData,
    steps: [
      { service: 'order', action: 'create' },
      { service: 'inventory', action: 'reserve' },
      { service: 'payment', action: 'process' },
      { service: 'shipping', action: 'schedule' }
    ],
    status: 'started'
  };

  await this.saveSaga(saga);

  try {
    // Execute each step
    for (const step of saga.steps) {
      await this.executeStep(saga.id, step);
    }
    saga.status = 'completed';
  } catch (error) {
    saga.status = 'failed';
    await this.compensateSaga(saga.id);
  }

  await this.updateSaga(saga);
}
This pattern has saved me countless hours of debugging. It turns messy distributed transactions into manageable sequences of steps. Each step is simple and testable. The overall flow is clear and maintainable.
But here’s something to think about: when should you use Sagas versus other patterns? Sagas work well for business processes that have clear compensation actions. If you can’t undo a step (like sending an email), you might need a different approach.
I’ve found Sagas most useful for order processing, user registration, and data migration. Anywhere you need to coordinate multiple services while maintaining data consistency.
Remember, the goal isn’t perfection. It’s building systems that fail gracefully and recover automatically. With Sagas, you can handle failures without manual intervention. Your system becomes more resilient and easier to operate.
What challenges have you faced with distributed transactions? Have you tried other patterns that worked well? I’d love to hear about your experiences.
If this approach makes sense for your projects, give it a try. Start with a simple two-step Saga and expand from there. The pattern scales well as your system grows.
I hope this gives you a practical way to handle distributed transactions. It’s made my systems more reliable and my life easier. Share this with your team if you think it would help them too. Let me know in the comments what you build with it!