I was building a collaborative document editor for my team. We needed multiple people to edit the same text at the same time, without overwriting each other’s work. The challenge was clear: how do you make sure everyone sees the same thing when they’re all typing at once? This is where Operational Transform comes in. It’s the technology behind tools like Google Docs. Let’s build one from the ground up.
Think about two people typing in the same sentence. Person A adds a word at the beginning. Person B deletes a word from the end. How does the system decide what the final text should be? This is the puzzle we need to solve.
We’ll use Node.js for the server, WebSockets for instant communication, and MongoDB to keep track of everything. The goal is to create a system that feels seamless, as if you’re the only one editing, even when you’re not.
First, we need to understand what an “operation” is. In our world, an operation is a single change. It could be “insert the letter ‘a’ at position 5” or “delete 3 characters starting at position 10.” The server’s job is to take these operations from different users and merge them into one true history.
Why does the server need to be in charge? Couldn’t each computer just figure it out? Having a central authority simplifies things. It ensures there’s one single, correct version of the document. Everyone agrees to let the server have the final say.
Let’s set up our project. Create a new folder and run npm init. We’ll need a few key packages.
{
  "dependencies": {
    "express": "^4.18.2",
    "ws": "^8.14.2",
    "mongodb": "^6.3.0",
    "uuid": "^9.0.1"
  }
}
Express will handle basic web server duties. The ws library manages WebSocket connections for real-time updates. MongoDB will store our documents and their change history. UUID helps create unique IDs for users and documents.
Now, let’s define what an operation looks like in code. We need a clear structure.
// An operation can be one of three things
const operationTypes = {
  INSERT: 'insert',
  DELETE: 'delete',
  RETAIN: 'retain'
};

// Example: Insert "hello" at the start of the document
const insertOp = {
  type: operationTypes.INSERT,
  position: 0,
  text: 'hello'
};

// Example: Delete 5 characters from position 10
const deleteOp = {
  type: operationTypes.DELETE,
  position: 10,
  length: 5
};
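The retain operation is special. It doesn’t change text. It means “skip over this many characters.” It is useful in formats that describe a change as one pass over the whole document, such as “retain 5, insert ‘x’, retain the rest.” The position-based inserts and deletes in this walk-through don’t rely on it.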
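To make the insert and delete shapes concrete, here is a minimal sketch of applying one operation to a plain string. The `applyOperation` name is my own, chosen for illustration, and it assumes the position-based fields shown above; it is not part of any library.

```javascript
// Hypothetical helper: apply one position-based operation to a string.
function applyOperation(content, op) {
  if (op.position < 0 || op.position > content.length) {
    throw new RangeError(`position ${op.position} is outside the document`);
  }
  switch (op.type) {
    case 'insert':
      // Splice the new text in at the given position.
      return content.slice(0, op.position) + op.text + content.slice(op.position);
    case 'delete':
      // Drop `length` characters starting at the given position.
      return content.slice(0, op.position) + content.slice(op.position + op.length);
    case 'retain':
      // Retain skips characters and changes nothing.
      return content;
    default:
      throw new Error(`unknown operation type: ${op.type}`);
  }
}

const greeting = applyOperation('world', { type: 'insert', position: 0, text: 'hello ' });
// greeting is 'hello world'
const trimmed = applyOperation(greeting, { type: 'delete', position: 5, length: 6 });
// trimmed is 'hello'
```

Strings are never modified in place here, which makes it easy to keep the previous state around while testing.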
The core of our system is the transformation function. This function takes two operations that happened at the same time and adjusts them so they can be applied one after the other without causing a mess.
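Imagine the document says “cat.” User A changes it to “cats.” User B changes it to “bat.” Both start from “cat.” A inserts an ‘s’ at position 3. B replaces the ‘c’ with a ‘b’, which in our model is a delete at position 0 followed by an insert at position 0. The transformation function adjusts these changes so that both can be applied, and everyone ends up with “bats.”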
Here is a simplified version of that logic.
// Adjust op2 so it can be applied after op1 (op1 has already been applied).
function transformOperation(op1, op2) {
  // If op1 inserts at or before op2's position, op2 must move over.
  // Two inserts at the same position need a consistent tie-break
  // (for example, by client ID), or clients can end up with different text.
  if (op1.type === 'insert' && op1.position <= op2.position) {
    return {
      ...op2,
      position: op2.position + op1.text.length
    };
  }

  // If op1 deletes text entirely before op2, op2's position shifts back.
  if (op1.type === 'delete') {
    const deleteEnd = op1.position + op1.length;
    if (deleteEnd <= op2.position) {
      return {
        ...op2,
        position: op2.position - op1.length
      };
    }
  }

  // Otherwise, op2 stays the same.
  return op2;
}
This is a basic example. A full system handles many more edge cases, like overlapping deletions.
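As one illustration of such an edge case, here is a sketch of transforming a delete against another delete when the two ranges overlap. The helper names (`removedBefore`, `transformDeleteAgainstDelete`, `applyDelete`) are hypothetical and only cover the delete-versus-delete case; the idea is to count how many of the first delete's characters sit before each end of the second delete and shift both ends by that amount.

```javascript
// How many characters removed by `del` lie before position `p`?
function removedBefore(del, p) {
  const end = del.position + del.length;
  return Math.max(0, Math.min(p, end) - del.position);
}

// Hypothetical helper: adjust delete `op2` so it can run after delete `op1`.
function transformDeleteAgainstDelete(op1, op2) {
  const op2End = op2.position + op2.length;
  const start = op2.position - removedBefore(op1, op2.position);
  const end = op2End - removedBefore(op1, op2End);
  // A length of 0 means op1 already removed everything op2 wanted to delete.
  return { ...op2, position: start, length: end - start };
}

const applyDelete = (text, op) =>
  text.slice(0, op.position) + text.slice(op.position + op.length);

// Both users start from "abcdefgh". A deletes "cde", B deletes "def".
const a = { type: 'delete', position: 2, length: 3 };
const b = { type: 'delete', position: 3, length: 3 };

const aThenB = applyDelete(applyDelete('abcdefgh', a), transformDeleteAgainstDelete(a, b));
const bThenA = applyDelete(applyDelete('abcdefgh', b), transformDeleteAgainstDelete(b, a));
// Both orders give 'abgh'.
```

The shared characters “de” are removed once, not twice, and both orders converge on the same text.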
Next, we build the WebSocket server. This is the nervous system of our application. It listens for operations from clients, transforms them, and broadcasts the results.
const WebSocket = require('ws');
const { v4: uuidv4 } = require('uuid');

const wss = new WebSocket.Server({ port: 8080 });
const connectedClients = new Map();

wss.on('connection', (ws, request) => {
  // Give each connection a unique ID using the uuid package from our dependencies.
  const clientId = uuidv4();
  connectedClients.set(clientId, ws);

  ws.on('message', (message) => {
    let data;
    try {
      data = JSON.parse(message.toString());
    } catch (err) {
      return; // Ignore malformed messages instead of crashing the server.
    }
    // 1. Validate the operation.
    // 2. Fetch the current document state from MongoDB.
    // 3. Transform the new operation against any pending ones.
    // 4. Apply the transformed operation to the document.
    // 5. Save the new state and operation to MongoDB.
    // 6. Broadcast the transformed operation to all other clients.
  });

  ws.on('close', () => {
    connectedClients.delete(clientId);
  });
});
Every message from a client is an operation. The server must process it in the context of what has already happened. It keeps a revision number for the document. Each successful operation increases the revision.
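The revision bookkeeping described above can be sketched without a database. In this minimal example the document lives in memory, and `createDocument`, `receiveOperation`, `transformAgainst`, and `applyToText` are names I made up for illustration. The transform is the simplified one from earlier, so overlapping deletes and same-position ties are not handled.

```javascript
// Simplified transform: adjust `op` so it applies after `applied`.
function transformAgainst(applied, op) {
  if (applied.type === 'insert' && applied.position <= op.position) {
    return { ...op, position: op.position + applied.text.length };
  }
  if (applied.type === 'delete' && applied.position + applied.length <= op.position) {
    return { ...op, position: op.position - applied.length };
  }
  return op;
}

function applyToText(text, op) {
  return op.type === 'insert'
    ? text.slice(0, op.position) + op.text + text.slice(op.position)
    : text.slice(0, op.position) + text.slice(op.position + op.length);
}

// Hypothetical in-memory document; a real server would load this from MongoDB.
function createDocument(content) {
  return { content, revision: 0, log: [] };
}

// Handle one client message of the form { operation, revision }.
function receiveOperation(doc, { operation, revision }) {
  if (!Number.isInteger(revision) || revision < 0 || revision > doc.revision) {
    throw new RangeError(`unknown revision ${revision}`);
  }
  // Transform against every operation the client had not seen yet.
  let op = operation;
  for (const logged of doc.log.slice(revision)) {
    op = transformAgainst(logged, op);
  }
  doc.content = applyToText(doc.content, op);
  doc.log.push(op);
  doc.revision += 1;
  // This is what the server would broadcast to the other clients.
  return { operation: op, revision: doc.revision };
}

const doc = createDocument('cat');
receiveOperation(doc, { operation: { type: 'insert', position: 0, text: 'my ' }, revision: 0 });
// A second client, still at revision 0, appends "s" to what it sees as "cat".
const result = receiveOperation(doc, {
  operation: { type: 'insert', position: 3, text: 's' },
  revision: 0
});
// doc.content is 'my cats'; the broadcast operation now has position 6.
```

Because the log index doubles as the revision number, `doc.log.slice(revision)` is exactly the list of operations the sender missed.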
How do we store this in MongoDB? We need two main collections: one for documents and one for the operation log.
// Document collection schema
{
  _id: 'doc_123',
  content: 'The current text of the document.',
  revision: 42, // The number of operations applied
  createdAt: ISODate("2024-01-01"),
  updatedAt: ISODate("2024-01-15")
}

// Operations log schema
{
  _id: 'op_456',
  documentId: 'doc_123',
  revision: 42,
  operation: { type: 'insert', position: 5, text: 'new ' },
  clientId: 'user_789',
  timestamp: ISODate("2024-01-15T10:30:00Z")
}
The operation log is crucial. It lets us replay all changes from the beginning to rebuild the document. It also lets us undo changes or see who wrote what.
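Replaying the log can be sketched as a sort followed by a fold. This example works on a plain array instead of a MongoDB cursor, and it assumes each entry's `revision` is the document revision reached after applying that entry. `replayLog` and `applyLogged` are hypothetical names.

```javascript
function applyLogged(text, op) {
  return op.type === 'insert'
    ? text.slice(0, op.position) + op.text + text.slice(op.position)
    : text.slice(0, op.position) + text.slice(op.position + op.length);
}

// Hypothetical helper: rebuild the content from log entries, optionally
// stopping at an earlier revision to see what the document looked like then.
function replayLog(entries, upToRevision = Infinity, initialContent = '') {
  return entries
    .filter((entry) => entry.revision <= upToRevision)
    .sort((x, y) => x.revision - y.revision)
    .reduce((text, entry) => applyLogged(text, entry.operation), initialContent);
}

// Entries may arrive out of order, so replayLog sorts them by revision first.
const entries = [
  { revision: 2, operation: { type: 'insert', position: 5, text: ' world' }, clientId: 'user_2' },
  { revision: 1, operation: { type: 'insert', position: 0, text: 'hello' }, clientId: 'user_1' },
  { revision: 3, operation: { type: 'delete', position: 0, length: 6 }, clientId: 'user_1' }
];

const latest = replayLog(entries);     // 'world'
const earlier = replayLog(entries, 2); // 'hello world'
```

Stopping the replay at an earlier revision is also the simplest form of history view: the same function answers “what did the document say at revision 2?”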
Now, let’s look at the client side. A user opens our editor in their browser. The client connects via WebSocket and downloads the current document. Then it sets up a local text area.
Every time the user types or deletes, the client doesn’t send the whole document. That would be wasteful. Instead, it calculates the minimal operation and sends that to the server.
// Simple client-side logic for handling a keypress
let lastDocumentState = 'Start text';
let localRevision = 0; // Updated whenever the server confirms or sends an operation

textArea.addEventListener('input', (event) => {
  const newState = event.target.value;

  // A real library would calculate the diff intelligently.
  // calculateDiffOperation is a helper we still have to write;
  // a naive version finds the first difference.
  const op = calculateDiffOperation(lastDocumentState, newState);

  // Send the operation along with the revision it was based on
  websocket.send(JSON.stringify({
    type: 'operation',
    operation: op,
    revision: localRevision
  }));

  // Update local state
  lastDocumentState = newState;
});
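The `calculateDiffOperation` helper is left undefined above. One naive way to fill it in, sketched here under my own assumptions, is to strip the common prefix and suffix and treat whatever remains as the change. This variant is named `calculateDiffOperations` because it returns an array: a replacement such as “cat” to “bat” needs a delete followed by an insert.

```javascript
// Hypothetical helper: describe the change from oldText to newText as
// at most one delete followed by one insert at the same position.
function calculateDiffOperations(oldText, newText) {
  // Length of the common prefix.
  let start = 0;
  const maxPrefix = Math.min(oldText.length, newText.length);
  while (start < maxPrefix && oldText[start] === newText[start]) start++;

  // Walk back over the common suffix without crossing into the prefix.
  let oldEnd = oldText.length;
  let newEnd = newText.length;
  while (oldEnd > start && newEnd > start && oldText[oldEnd - 1] === newText[newEnd - 1]) {
    oldEnd--;
    newEnd--;
  }

  const ops = [];
  if (oldEnd > start) {
    ops.push({ type: 'delete', position: start, length: oldEnd - start });
  }
  if (newEnd > start) {
    ops.push({ type: 'insert', position: start, text: newText.slice(start, newEnd) });
  }
  return ops;
}

const typed = calculateDiffOperations('Start text', 'Start new text');
// typed holds one insert: position 6, text 'new '
const replaced = calculateDiffOperations('cat', 'bat');
// replaced holds a delete of 1 character at 0, then an insert of 'b' at 0
```

With repeated characters the diff is ambiguous (typing a third “a” into “aa” could be an insert at any position), so this sketch may report a different position than the one the user actually typed at. The resulting text is still correct.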
The client also listens for messages from the server. When it receives a transformed operation from another user, it applies it to its own local text area and updates lastDocumentState and localRevision to match. This keeps everyone in sync.
What about showing where other people are typing? This is called “presence.” We can send cursor position updates.
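// When a user moves their cursor, send an update.
// In current browsers, 'selectionchange' fires on the document
// whenever the caret or selection moves.
document.addEventListener('selectionchange', () => {
  if (document.activeElement !== textArea) return;
  websocket.send(JSON.stringify({
    type: 'cursor',
    position: textArea.selectionStart,
    clientId: myClientId
  }));
});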
The server broadcasts these cursor positions to all other clients. Each client can then draw a little colored caret or selection highlight for each remote user.
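A cursor position received from another user goes stale as soon as the next edit lands, so a client will likely want to shift stored cursors whenever it applies an operation. Here is a minimal sketch; `transformCursor` is a hypothetical name, and leaving the cursor in place when text is inserted exactly at it is just one possible convention.

```javascript
// Hypothetical helper: shift a cursor index so it stays next to the same
// character after an operation has been applied to the document.
function transformCursor(cursor, op) {
  if (op.type === 'insert') {
    // Text inserted strictly before the cursor pushes it to the right.
    return op.position < cursor ? cursor + op.text.length : cursor;
  }
  if (op.type === 'delete') {
    // A deletion that starts at or after the cursor does not move it.
    if (op.position >= cursor) return cursor;
    // Otherwise pull the cursor left by the deleted characters before it.
    // A cursor inside the deleted range collapses to the start of the range.
    const removedBeforeCursor = Math.min(op.length, cursor - op.position);
    return cursor - removedBeforeCursor;
  }
  return cursor;
}

const afterInsert = transformCursor(5, { type: 'insert', position: 2, text: 'abc' });
// afterInsert is 8
const afterDelete = transformCursor(5, { type: 'delete', position: 3, length: 10 });
// afterDelete is 3
```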
Testing this system is vital. We need to simulate many users typing at once. We can write a script that acts like multiple clients.
// A simple test script (run with Node.js)
const WebSocket = require('ws');

const simulateUser = (userId) => {
  const ws = new WebSocket('ws://localhost:8080');
  ws.onopen = () => {
    // Join a document
    ws.send(JSON.stringify({ type: 'join', docId: 'test' }));

    // Send random operations every second.
    // generateRandomOperation is a helper you supply.
    setInterval(() => {
      const op = generateRandomOperation();
      ws.send(JSON.stringify({ type: 'operation', operation: op }));
    }, 1000);
  };
};

// Run 10 simulated users
for (let i = 0; i < 10; i++) simulateUser(i);
Watch the document change. Does it stay consistent? Do characters get lost? This kind of stress test helps find bugs in our transformation logic.
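The test script relies on a `generateRandomOperation` helper that is not defined above. Here is one possible sketch. Unlike the call in the script, this version takes the current document length so that every generated operation stays inside the document, and it accepts a seeded random source (a small mulberry32-style generator) so a failing run can be reproduced. Both choices are my assumptions.

```javascript
// Small seeded pseudo-random generator returning numbers in [0, 1).
function createRng(seed) {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: produce a valid insert or delete for a document
// that currently has `docLength` characters.
function generateRandomOperation(docLength, rng = Math.random) {
  if (docLength > 0 && rng() < 0.5) {
    const position = Math.floor(rng() * docLength); // 0 .. docLength - 1
    // Delete at least one character and never run past the end.
    const length = 1 + Math.floor(rng() * (docLength - position));
    return { type: 'delete', position, length };
  }
  const position = Math.floor(rng() * (docLength + 1)); // 0 .. docLength
  const text = String.fromCharCode(97 + Math.floor(rng() * 26)); // one letter a-z
  return { type: 'insert', position, text };
}

const rng = createRng(42);
const sampleOp = generateRandomOperation(10, rng);
```

Logging the seed at the start of each stress run makes it possible to replay the exact sequence of operations that exposed a bug.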
As more users join, our server can become a bottleneck. How can we scale? We might need to shard documents across different server instances or use a faster in-memory database like Redis for the most recent operations, while keeping the full history in MongoDB.
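Sharding by document can be as simple as hashing the document ID, so that every instance agrees on which server owns which document. The sketch below uses an FNV-1a-style hash over the string's character codes; `pickShard` and the shard URLs are hypothetical.

```javascript
// FNV-1a-style hash of a string, returned as an unsigned 32-bit integer.
function hashString(value) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < value.length; i++) {
    hash ^= value.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Hypothetical routing rule: the same document ID always maps to the same shard.
function pickShard(documentId, shardUrls) {
  return shardUrls[hashString(documentId) % shardUrls.length];
}

// Placeholder addresses, not real servers.
const shards = ['ws://ot-1.internal:8080', 'ws://ot-2.internal:8080', 'ws://ot-3.internal:8080'];
const owner = pickShard('doc_123', shards);
```

Plain modulo routing moves most documents to a different shard whenever the number of shards changes. Consistent hashing is the usual remedy if instances are added or removed often.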
One common problem is handling slow clients. What if a user’s network is bad and their operations arrive at the server very late? The server can transform a late operation against everything logged since the client’s revision, but past some point it is simpler to reject the operation. The server then tells the client to catch up by requesting the latest document state before trying again.
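One way to sketch that decision is a small function that compares the client's revision with the server's. The lag threshold, the `classifyIncoming` name, and the option of transforming slightly stale operations instead of rejecting them outright are all my own assumptions.

```javascript
// Hypothetical policy: how far behind a client may be before it must resync.
const MAX_REVISION_LAG = 100;

function classifyIncoming(clientRevision, serverRevision, maxLag = MAX_REVISION_LAG) {
  // A revision the server has never issued cannot be trusted at all.
  if (!Number.isInteger(clientRevision) || clientRevision < 0 || clientRevision > serverRevision) {
    return { action: 'reject', reason: 'invalid revision' };
  }
  const lag = serverRevision - clientRevision;
  if (lag === 0) {
    return { action: 'apply' }; // client is up to date
  }
  if (lag <= maxLag) {
    return { action: 'transform', missed: lag }; // catch up on the server side
  }
  // Too far behind: ask the client to reload the latest state and retry.
  return { action: 'resync', latestRevision: serverRevision };
}

const upToDate = classifyIncoming(42, 42);   // { action: 'apply' }
const slightlyBehind = classifyIncoming(40, 42); // { action: 'transform', missed: 2 }
const farBehind = classifyIncoming(0, 500);  // { action: 'resync', latestRevision: 500 }
```

Setting `maxLag` to 0 gives the strict behavior where every stale operation is sent back for a resync.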
Building this has been a fascinating journey into the mechanics of collaboration. It turns a complex problem of timing and conflict into something that feels simple and magical for the end user.
The principles here don’t just apply to text. You could use similar ideas for collaborative drawing, spreadsheet formulas, or even code editing. The core concept is the same: define clear operations, transform them centrally, and broadcast the results.
I encourage you to try building this yourself. Start with a simple server that handles one document and two clients. Get the basic transformation working. Then add persistence, presence, and scaling. What edge cases can you find?
If you found this walk-through helpful, please share it with other developers who might be curious about how real-time collaboration works. Have you tried building something similar? What challenges did you face? Let me know in the comments.