Mastering Distributed Data Consistency: Transactions, Two-Phase Commit, and Sagas Explained

Learn how to manage data consistency across multiple databases using transactions, 2PC, and the Saga pattern in real-world systems.

I’ve been thinking about a problem that keeps many developers up at night. How do you keep data consistent when it lives in different places? I recently worked on an e-commerce system where an order needed to touch three separate databases. The order record, the inventory count, and the payment details all had to be updated together, or not at all. If one update failed, the whole operation needed to be undone. This is the challenge of distributed data consistency. Let’s talk about how to solve it.

Have you ever had a user’s order go through but their payment get stuck? That’s the kind of mess we’re trying to prevent.

The traditional way databases handle this is with transactions. You start a transaction, make your changes, and then commit them. If anything goes wrong, you roll everything back. This works perfectly inside a single database. But what happens when your data is spread across multiple databases, perhaps even on different servers? That’s where things get interesting.

We need a way to coordinate. One established method is called the Two-Phase Commit protocol. Think of it like a group vote. First, we ask every database, “Can you commit these changes?” This is the prepare phase. Each database checks if it can proceed and says yes or no. Only if everyone says yes do we move to the second phase: the actual commit. If anyone says no, we tell everyone to abort.
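To make the group vote concrete, here is a minimal in-memory sketch of the coordinator's logic. The `Participant` interface, `runTwoPhaseCommit`, and `InMemoryParticipant` are names I made up for illustration; they are not part of any library, and a real coordinator would also write its decision to durable storage before starting phase two.

```typescript
// Hypothetical participant interface: a wrapper around one database that can
// vote on a transaction and later commit or abort what it prepared.
interface Participant {
  name: string;
  prepare(txId: string): Promise<boolean>; // vote: true means "yes, I can commit"
  commit(txId: string): Promise<void>;
  abort(txId: string): Promise<void>;
}

type Outcome = 'COMMITTED' | 'ABORTED';

async function runTwoPhaseCommit(txId: string, participants: Participant[]): Promise<Outcome> {
  // Phase 1: ask every participant for a vote, stopping at the first "no".
  const prepared: Participant[] = [];
  let allYes = true;
  for (const p of participants) {
    let vote = false;
    try {
      vote = await p.prepare(txId);
    } catch {
      vote = false; // a participant we cannot reach counts as a "no"
    }
    if (!vote) {
      allYes = false;
      break;
    }
    prepared.push(p);
  }

  // Phase 2: commit only on a unanimous yes; otherwise abort whoever prepared.
  if (allYes) {
    for (const p of participants) await p.commit(txId);
    return 'COMMITTED';
  }
  for (const p of prepared) await p.abort(txId);
  return 'ABORTED';
}

// Toy participant that records what it was asked to do.
class InMemoryParticipant implements Participant {
  log: string[] = [];
  constructor(public name: string, private canCommit: boolean) {}
  async prepare(txId: string): Promise<boolean> {
    this.log.push(`prepare ${txId}`);
    return this.canCommit;
  }
  async commit(txId: string): Promise<void> {
    this.log.push(`commit ${txId}`);
  }
  async abort(txId: string): Promise<void> {
    this.log.push(`abort ${txId}`);
  }
}
```

Calling `runTwoPhaseCommit('tx-1', [orders, inventory])` with two willing participants commits both; if either votes no, only the participants that already prepared are told to abort.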

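Here’s a basic idea of how you might structure that with TypeORM, connecting to two different PostgreSQL databases. Keep in mind that this is a simplified approximation: it opens a local transaction on each database and commits them one after the other, rather than asking PostgreSQL for a real prepare vote.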

// Assumes Order and Product are TypeORM entities, and that getOrderDataSource()
// and getInventoryDataSource() return initialized DataSource instances.
async function twoPhaseCommit(orderData, inventoryData) {
  const orderDb = getOrderDataSource();
  const inventoryDb = getInventoryDataSource();
  let orderQueryRunner, inventoryQueryRunner;

  try {
    // "Phase 1": open a local transaction on each database.
    // This only approximates a prepare vote; neither database has promised anything yet.
    orderQueryRunner = orderDb.createQueryRunner();
    await orderQueryRunner.connect();
    await orderQueryRunner.startTransaction();

    inventoryQueryRunner = inventoryDb.createQueryRunner();
    await inventoryQueryRunner.connect();
    await inventoryQueryRunner.startTransaction();

    // Try the operations inside their local transactions
    await orderQueryRunner.manager.save(Order, orderData);
    await inventoryQueryRunner.manager.update(Product, inventoryData.id, { quantity: inventoryData.newQty });

    // "Phase 2": commit one database after the other.
    // If the second commit fails, the first one is already permanent.
    await orderQueryRunner.commitTransaction();
    await inventoryQueryRunner.commitTransaction();
    console.log('Distributed transaction committed.');

  } catch (error) {
    // Roll back whatever is still open. A failed rollback must not hide
    // the original error or prevent the other rollback from running.
    for (const runner of [orderQueryRunner, inventoryQueryRunner]) {
      if (runner && runner.isTransactionActive) {
        await runner.rollbackTransaction().catch((rollbackError) =>
          console.error('Rollback failed:', rollbackError));
      }
    }
    console.error('Transaction failed:', error);
    throw error;
  } finally {
    // Always release the connections
    if (orderQueryRunner) await orderQueryRunner.release();
    if (inventoryQueryRunner) await inventoryQueryRunner.release();
  }
}

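This seems straightforward, right? But it has a major flaw. The sketch has no real prepare phase: each database only finds out whether it can commit when we actually call commit. What if the network fails after the first database commits but before the second one does? The first change is permanent, but the second is not, and our data is now inconsistent. A true Two-Phase Commit (in PostgreSQL, PREPARE TRANSACTION followed by COMMIT PREPARED) narrows that window by making every participant durably promise to commit before anyone does. It brings its own problem, though. If the coordinator crashes after the prepare phase, participants are left “in doubt,” holding locks until it recovers. When an operator resolves a stuck participant by hand, that is called a “heuristic decision,” and if it disagrees with what the coordinator eventually decides, cleaning up is a nightmare.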

So, if Two-Phase Commit is so fragile, what’s the alternative? This is where patterns like Saga come into play. Instead of trying to do everything at once, a Saga breaks the transaction into a series of smaller, local steps. Each step has a compensating action—a way to undo its effect if a later step fails.

Imagine booking a trip. You reserve a flight, then a hotel, then a car. If the car rental fails, you don’t just leave the flight and hotel booked. You execute the compensation: cancel the hotel, then cancel the flight. The process moves forward step-by-step, but can also unwind backward.
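The trip analogy maps onto code quite directly. Below is a rough sketch of a generic runner that executes steps in order and unwinds the finished ones in reverse when a step fails. `SagaStep`, `runSaga`, and `bookTrip` are illustrative names of my own, and the sketch assumes compensations always succeed, which a production system could not take for granted.

```typescript
// One saga step: a forward action plus the action that undoes it.
interface SagaStep {
  name: string;
  action: () => Promise<void>;
  compensate: () => Promise<void>;
}

async function runSaga(steps: SagaStep[]): Promise<'COMPLETED' | 'COMPENSATED'> {
  const done: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.action();
      done.push(step);
    } catch {
      // Unwind the steps that did finish, most recent first.
      for (const finished of done.reverse()) {
        await finished.compensate();
      }
      return 'COMPENSATED';
    }
  }
  return 'COMPLETED';
}

// Toy trip booking that records what happened instead of calling real services.
function bookTrip(carAvailable: boolean): { events: string[]; steps: SagaStep[] } {
  const events: string[] = [];
  const step = (name: string, ok: boolean): SagaStep => ({
    name,
    action: async () => {
      if (!ok) throw new Error(`${name} failed`);
      events.push(`book ${name}`);
    },
    compensate: async () => {
      events.push(`cancel ${name}`);
    },
  });
  return {
    events,
    steps: [step('flight', true), step('hotel', true), step('car', carAvailable)],
  };
}
```

With `bookTrip(false)`, the runner books the flight and hotel, hits the car failure, then cancels the hotel and the flight in that order.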

Let’s model a simple order Saga. We won’t use a transaction spanning databases. Instead, we’ll use a series of steps, each updating one database, and a state machine to track progress.

class OrderSaga {
  // Initialized so the field is never undefined before execute() runs
  private state: 'STARTED' | 'INVENTORY_RESERVED' | 'PAYMENT_TAKEN' | 'COMPLETED' | 'FAILED' = 'STARTED';

  async execute(orderId: string) {
    this.state = 'STARTED';
    try {
      // Step 1: Reserve inventory in its own database
      await this.reserveInventory(orderId);
      this.state = 'INVENTORY_RESERVED';

      // Step 2: Process payment in its own database
      await this.processPayment(orderId);
      this.state = 'PAYMENT_TAKEN';

      // Step 3: Finalize the order
      await this.confirmOrder(orderId);
      this.state = 'COMPLETED';

    } catch (error) {
      await this.compensate(orderId);
      this.state = 'FAILED';
    }
  }

  private async compensate(orderId: string) {
    // Undo completed steps in reverse order. A saga that reached
    // PAYMENT_TAKEN has also reserved inventory, so both must be undone.
    if (this.state === 'PAYMENT_TAKEN') {
      await this.refundPayment(orderId); // Compensating action
    }
    if (this.state === 'PAYMENT_TAKEN' || this.state === 'INVENTORY_RESERVED') {
      await this.releaseInventory(orderId); // Compensating action
    }
    // If state was 'STARTED', nothing was done, so nothing to undo.
  }

  // ... implementations for reserveInventory, processPayment, etc.
}

Do you see the difference? The Saga pattern accepts that temporary inconsistency is okay as long as there’s a clear path to fix it. The system is designed to recover from failure. This is often more practical in distributed systems where networks are unreliable.

But how do you keep track of all these steps and states, especially if your service crashes in the middle? You need to persist the Saga’s state. Often, this is done by storing an event or a record in a durable log. Each step and its compensation become their own small, idempotent operations. Idempotent means you can safely retry them if you’re not sure they succeeded.
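As a rough sketch of that idea, the code below records each finished step and skips it when the saga is resumed after a crash. `InMemorySagaStore` is a stand-in I invented for a durable table or log, and `runOrResume` and the step names are hypothetical too. Because a crash can land between running a step and recording it, a step may run twice, which is exactly why each handler has to be idempotent.

```typescript
type StepName = 'reserveInventory' | 'processPayment' | 'confirmOrder';

const STEP_ORDER: StepName[] = ['reserveInventory', 'processPayment', 'confirmOrder'];

interface SagaRecord {
  sagaId: string;
  completed: StepName[]; // steps known to have finished
}

// Stand-in for durable storage; a real system would write to a database.
class InMemorySagaStore {
  private records = new Map<string, SagaRecord>();

  load(sagaId: string): SagaRecord {
    let record = this.records.get(sagaId);
    if (!record) {
      record = { sagaId, completed: [] };
      this.records.set(sagaId, record);
    }
    return record;
  }

  markCompleted(sagaId: string, step: StepName): void {
    const record = this.load(sagaId);
    if (record.completed.indexOf(step) === -1) record.completed.push(step);
  }
}

type StepHandlers = Record<StepName, (sagaId: string) => Promise<void>>;

// Runs the saga from wherever the stored record says it left off.
async function runOrResume(
  sagaId: string,
  store: InMemorySagaStore,
  handlers: StepHandlers
): Promise<void> {
  const record = store.load(sagaId);
  for (const step of STEP_ORDER) {
    if (record.completed.indexOf(step) !== -1) continue; // finished before the crash
    // A crash after this call but before markCompleted means the step
    // runs again on resume, so the handler must be safe to repeat.
    await handlers[step](sagaId);
    store.markCompleted(sagaId, step);
  }
}
```

On restart, a worker would call `runOrResume` again with the same `sagaId`, and only the unfinished steps would run.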

Here’s a practical tip: use a unique identifier for the entire operation, like a transactionId or correlationId. Pass this ID through every step—creating the order, reserving inventory, and charging the payment. This way, you can link all the logs and events together. If something fails, you can query your system for all actions related to transactionId: 'abc-123' and see exactly what happened.
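Here is one small way this could look, assuming a simple structured log. `LogEntry`, `createStepLogger`, and `historyFor` are hypothetical helpers of mine, and the array is a placeholder for whatever log store you actually use.

```typescript
interface LogEntry {
  transactionId: string;
  step: string;
  status: 'ok' | 'error';
  detail?: string;
  at: string; // ISO timestamp
}

// Returns a logger that stamps every entry with the same transactionId.
function createStepLogger(transactionId: string, sink: LogEntry[]) {
  return (step: string, status: 'ok' | 'error', detail?: string): void => {
    sink.push({ transactionId, step, status, detail, at: new Date().toISOString() });
  };
}

// Everything that happened for one operation, in the order it was logged.
function historyFor(sink: LogEntry[], transactionId: string): LogEntry[] {
  return sink.filter((entry) => entry.transactionId === transactionId);
}
```

Each service would create its logger from the incoming ID, so `historyFor(sink, 'abc-123')` returns the order, inventory, and payment entries for that one operation together.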

What about performance? Coordinating across databases is slow. A commit in one database might take 10 milliseconds. Committing to two databases one after the other might double that, and a coordinated commit adds extra network round trips for the prepare and commit messages while locks stay held the whole time. Add network latency between servers, and the time adds up. For some operations, like finalizing a financial trade, this cost is acceptable. For others, like adding an item to a shopping cart, it is not.

You need to choose the right tool for the job. Use a coordinated transaction only when absolute, immediate consistency is required. For most user-facing workflows, a Saga or an eventual consistency model is better. The user might see “Order Processing” for a few seconds while the steps complete in the background. That’s usually fine.

A common pitfall is deadlock. This happens when two operations are waiting for each other to release locks. In a single database, the database engine can detect this. Across multiple databases, detection is much harder. To avoid it, always access resources (like database records) in a consistent global order. For example, always update the inventory database before the orders database. This prevents circular waits.
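A minimal sketch of that ordering rule follows. It sorts the resources an operation needs by database name and then record ID before locking anything, so every caller acquires them in the same sequence. `ResourceKey`, `inGlobalOrder`, and `withLocks` are made-up names, and the `acquire` callback is a placeholder for whatever locking mechanism you use (for example, a row lock taken inside a transaction).

```typescript
// Hypothetical identifier for a lockable resource: which database, which record.
interface ResourceKey {
  database: string;
  recordId: string;
}

// Sort by database name, then record ID, without mutating the input.
function inGlobalOrder(keys: ResourceKey[]): ResourceKey[] {
  return keys.slice().sort((a, b) => {
    if (a.database !== b.database) return a.database < b.database ? -1 : 1;
    if (a.recordId !== b.recordId) return a.recordId < b.recordId ? -1 : 1;
    return 0;
  });
}

type Release = () => Promise<void>;

// Acquire every lock in global order, run the work, then release in reverse.
async function withLocks<T>(
  keys: ResourceKey[],
  acquire: (key: ResourceKey) => Promise<Release>,
  work: () => Promise<T>
): Promise<T> {
  const releases: Release[] = [];
  try {
    for (const key of inGlobalOrder(keys)) {
      releases.push(await acquire(key));
    }
    return await work();
  } finally {
    for (const release of releases.reverse()) {
      await release();
    }
  }
}
```

Because "inventory" sorts before "orders", two callers that list the same resources in different orders still lock inventory first, which removes the circular wait.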

Let’s look at a more complete example, combining these ideas. We’ll handle an order placement with a reservation pattern, which is common in e-commerce.

async function placeOrder(customerId: string, items: CartItem[]) {
  const transactionId = generateUUID(); // Our global ID

  // 1. Create an order in PENDING state
  const orderRepo = ordersDataSource.getRepository(Order);
  const newOrder = orderRepo.create({
    id: transactionId,
    customerId,
    items,
    status: 'PENDING',
    totalAmount: calculateTotal(items)
  });
  await orderRepo.save(newOrder);

  // 2. Try to reserve inventory for each item
  const inventoryRepo = inventoryDataSource.getRepository(Product);
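  for (const item of items) {
    // The quantity is interpolated into SQL below, so reject anything but a positive integer
    if (!Number.isInteger(item.quantity) || item.quantity <= 0) {
      throw new Error(`Invalid quantity for product ${item.productId}`);
    }
    // Use a conditional update to check stock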
    const result = await inventoryRepo
      .createQueryBuilder()
      .update(Product)
      .set({
        reserved: () => `reserved + ${item.quantity}`, // Increment reserved count
        quantity: () => `quantity - ${item.quantity}`  // Decrement available count
      })
      .where("id = :id AND quantity >= :quantity", { id: item.productId, quantity: item.quantity })
      .execute();

    if (result.affected === 0) {
      throw new Error(`Insufficient stock for product ${item.productId}`);
    }
  }

  // 3. Process payment (simplified)
  const paymentRepo = paymentsDataSource.getRepository(Payment);
  const payment = paymentRepo.create({
    orderId: transactionId,
    amount: newOrder.totalAmount,
    status: 'PENDING'
  });
  await paymentRepo.save(payment);

  // Simulate payment gateway call
  const paymentSuccess = await chargePayment(payment.id, newOrder.totalAmount);
  if (!paymentSuccess) {
    throw new Error('Payment failed');
  }
  await paymentRepo.update(payment.id, { status: 'CAPTURED' });

  // 4. Finalize the order
  await orderRepo.update(transactionId, { status: 'CONFIRMED' });
  console.log(`Order ${transactionId} completed successfully.`);
}

Notice this isn’t a single atomic transaction. If the payment fails at step 3, the order is left as PENDING and the inventory is still reserved. The same is true if stock runs out partway through step 2: items reserved earlier in the loop stay reserved. We need a cleanup process, either a Saga compensator or a periodic job, to find these stuck PENDING orders, release the reserved inventory, and mark the order as FAILED. This is the reality of building robust distributed systems.
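To give a feel for the periodic job, here is a simplified in-memory sketch. `releaseStuckOrders`, `PendingOrder`, and `Stock` are hypothetical, and the sketch assumes every item on a stuck order was actually reserved and that no payment was captured. A real job would need per-item reservation records and a payment status check before releasing anything.

```typescript
interface OrderItem {
  productId: string;
  quantity: number;
}

interface PendingOrder {
  id: string;
  status: 'PENDING' | 'CONFIRMED' | 'FAILED';
  createdAt: Date;
  items: OrderItem[];
}

interface Stock {
  quantity: number; // available to sell
  reserved: number; // held for in-flight orders
}

// Finds PENDING orders older than maxAgeMs, returns their stock, marks them FAILED.
// Returns the IDs of the orders it cleaned up.
function releaseStuckOrders(
  orders: PendingOrder[],
  stock: Map<string, Stock>,
  now: Date,
  maxAgeMs: number
): string[] {
  const released: string[] = [];
  for (const order of orders) {
    const ageMs = now.getTime() - order.createdAt.getTime();
    if (order.status !== 'PENDING' || ageMs < maxAgeMs) continue;

    for (const item of order.items) {
      const entry = stock.get(item.productId);
      if (!entry) continue; // unknown product: nothing to release
      // Assumption: this item was fully reserved when the order stalled.
      entry.reserved -= item.quantity;
      entry.quantity += item.quantity;
    }
    order.status = 'FAILED';
    released.push(order.id);
  }
  return released;
}
```

A scheduler could run this every few minutes with a timeout comfortably longer than a normal checkout, so orders that are merely slow are left alone.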

The key takeaway is this: you cannot have the same guarantees across multiple databases as you can within one. You must design for failure. Your system should assume that any step can fail and have a plan to recover. This shifts the focus from preventing inconsistency to managing it gracefully.

I find that starting with a simple, well-logged process is better than a complex, “perfect” one that is hard to debug. Log every step, every decision, and every error with that global transactionId. When a user calls support, you can find their saga in the logs and understand exactly where it stopped.

This topic is vast, and we’ve only scratched the surface. There are message queues, event sourcing, and change data capture patterns that all play a role. The goal is always the same: to keep your system’s view of the world as coherent as possible, even when the parts are far apart.

I hope this gives you a practical starting point. It’s a challenging but fascinating part of software architecture. What kind of consistency challenges are you facing in your projects?

If you found this walk-through helpful, please share it with a colleague who might be wrestling with similar issues. I’d love to hear about your experiences and solutions in the comments below. Let’s keep the conversation going.




