I was debugging a production issue last week. A user reported that their checkout was taking over 30 seconds. I had logs from the API gateway, the order service, and the database. Each log file said everything was fine. Yet, the user was stuck waiting. I couldn’t see the forest for the trees. That moment made me realize: in a world of microservices, logs are not enough. We need a map. We need to see the entire journey of a single request as it hops between services. This is why I want to talk about building a map for your distributed system using distributed tracing. Let’s build that map together.
Think of your application as a busy city. Logs are like individual security camera feeds from different streets. They show activity, but they don’t show you how a single car got from the highway to a specific parking garage. Distributed tracing gives you that car’s entire GPS route. It connects the dots. For anyone building with microservices, serverless functions, or any system where work is split up, this visibility is not a luxury; it’s a necessity for sanity and performance.
So, how do we start drawing this map? We need a standard way to instrument our code, a place to store all the route data, and a way to visualize it. This is where a stack like OpenTelemetry, Grafana Tempo, and Grafana comes in. And I’m choosing Bun as our runtime because its speed means adding this observability layer has a minimal impact on our application’s performance. Less overhead for more insight is always a good trade.
Let’s set up our observability backend first. We’ll use Docker Compose to run Grafana Tempo for storing traces and Grafana for viewing them. Why start here? Because it’s easier to send data to a system that’s already waiting. Create a docker-compose.yml file. We’ll define services for Tempo and Grafana. Tempo will listen for trace data over OTLP/HTTP on port 4318, and Grafana will be exposed on port 3100 (we remap its default port 3000 so it doesn’t clash with the Bun app we’ll run there shortly).
# docker-compose.yml
version: '3.8'
services:
  tempo:
    image: grafana/tempo:latest
    command: ["-config.file=/etc/tempo.yaml"]
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml
    ports:
      - "4318:4318" # For receiving trace data (OTLP over HTTP)
      - "3200:3200" # For querying traces
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3100:3000"
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
Next, we need a simple configuration file for Tempo to tell it how to store the data. Create a file called tempo.yaml. This basic setup uses local storage, which is fine for our tutorial.
# tempo.yaml
server:
  http_listen_port: 3200
distributor:
  receivers:
    otlp:
      protocols:
        http:
storage:
  trace:
    backend: local
    local:
      path: /tmp/tempo/traces
Run docker-compose up -d in your terminal. In a moment, you’ll have Tempo and Grafana running. You can open Grafana at http://localhost:3100. Since we enabled anonymous access, you won’t be forced to log in; if Grafana does prompt you, the default login is admin for both username and password. We’ll configure it to talk to Tempo later. Now we have a destination for our traces.
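If you want to confirm Tempo itself is up before wiring anything else, recent Tempo releases expose a readiness endpoint on the query port we mapped above; you can check it with a quick curl:
# Should respond once Tempo has finished starting up
curl http://localhost:3200/ready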
With our storage ready, let’s instrument a Bun application. Create a new project with bun init. We’ll need the OpenTelemetry packages. OpenTelemetry is the standard for generating and collecting telemetry data like traces and metrics. Install the necessary packages.
bun add @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node @opentelemetry/exporter-trace-otlp-http
Now, let’s create the core setup. The key file will initialize the OpenTelemetry Node SDK. This SDK will automatically instrument common modules like http and fetch. Create a file named tracing.js.
// tracing.js
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
const sdk = new NodeSDK({
  // Identify this service in every trace it emits
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'my-bun-service',
  }),
  // Ship finished spans to Tempo's OTLP/HTTP receiver
  traceExporter: new OTLPTraceExporter({
    url: 'http://localhost:4318/v1/traces',
  }),
  // Auto-instrument common modules such as http and fetch
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
console.log('Tracing initialized');
How do we use this? We need to run this setup code before our application starts. The easiest way is to use the --require flag with Bun. Create a simple HTTP server in index.js.
// index.js
import { serve } from 'bun';
const server = serve({
  port: 3000,
  async fetch(request) {
    // Simulate some local work
    await Bun.sleep(50);
    // An outgoing call the auto-instrumentation will pick up
    const response = await fetch('https://api.github.com');
    await Bun.sleep(30);
    return new Response('Hello, traced world!');
  },
});
console.log(`Server running on ${server.url}`);
Now, run your application with the tracing initialization required first.
bun --require ./tracing.js index.js
Make a request to http://localhost:3000. What just happened? The OpenTelemetry instrumentation automatically created spans for the incoming HTTP request and the outgoing fetch call to GitHub’s API. It packaged those spans into a trace and sent it to our Tempo backend. You didn’t have to write any manual tracing code yet. Isn’t it powerful when the tools do the heavy lifting for you?
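If you want to sanity-check what the SDK is producing before trusting the Tempo pipeline, one option (a debugging sketch, not something you’d ship; the file name is just for illustration) is a variant of tracing.js that prints finished spans to the terminal using the ConsoleSpanExporter bundled with the trace SDK:
// tracing-debug.js — prints each finished span to stdout instead of sending it to Tempo
import { NodeSDK } from '@opentelemetry/sdk-node';
import { ConsoleSpanExporter } from '@opentelemetry/sdk-trace-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
const sdk = new NodeSDK({
  traceExporter: new ConsoleSpanExporter(),
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
console.log('Debug tracing initialized');
Run it with bun --require ./tracing-debug.js index.js and you’ll see span objects scroll by as you make requests.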
But automatic instrumentation only gets you so far. To truly understand your business logic, you need custom spans. Let’s say we have a function that processes an order. We want to see how long it takes and what happens inside. We need to use the OpenTelemetry API. First, install it: bun add @opentelemetry/api. Now, let’s modify our server.
// index.js with custom tracing
import { serve } from 'bun';
import { trace } from '@opentelemetry/api';
const tracer = trace.getTracer('order-processor');
async function processOrder(orderId) {
  // Start a custom span for this function
  return tracer.startActiveSpan('processOrder', async (span) => {
    try {
      span.setAttribute('order.id', orderId);
      // Simulate checking inventory
      await Bun.sleep(100);
      span.addEvent('inventory_checked');
      // Simulate charging a payment method
      await Bun.sleep(200);
      span.addEvent('payment_processed');
      return { success: true, orderId };
    } finally {
      span.end(); // Remember to end the span!
    }
  });
}
serve({
  port: 3000,
  async fetch(request) {
    const url = new URL(request.url);
    if (url.pathname === '/order') {
      const result = await processOrder('12345');
      return Response.json(result);
    }
    return new Response('Not found', { status: 404 });
  },
});
Now, if you hit http://localhost:3000/order, you’ll get a trace that includes your custom processOrder span with its attributes and events. This span will be a child of the automatic HTTP span. You can see the exact timing of the inventory check and payment processing. Can you see how this turns a vague “order endpoint is slow” into a precise “the payment processing step inside the order function is taking 200ms”?
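One thing the example above skips is failure handling. If the processing throws, the span should say so rather than silently ending. Here is a minimal sketch using the recordException and setStatus calls from @opentelemetry/api; processOrderSafely and the simulated random failure are mine, not part of the original example:
// A variant of processOrder that records failures on its span
import { trace, SpanStatusCode } from '@opentelemetry/api';
const tracer = trace.getTracer('order-processor');
async function processOrderSafely(orderId) {
  return tracer.startActiveSpan('processOrder', async (span) => {
    try {
      span.setAttribute('order.id', orderId);
      await Bun.sleep(200); // Simulate the payment call
      if (Math.random() < 0.1) throw new Error('payment declined'); // Simulated failure
      return { success: true, orderId };
    } catch (err) {
      span.recordException(err); // Attach the error as a span event
      span.setStatus({ code: SpanStatusCode.ERROR, message: err.message }); // Mark the span as failed
      throw err; // Let the HTTP handler decide how to respond
    } finally {
      span.end();
    }
  });
}
In Grafana, spans marked this way show up flagged as errors, which makes the failing hop easy to spot in the waterfall.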
The real magic of distributed tracing happens when a trace flows across service boundaries. We need to pass the trace context from one service to another. OpenTelemetry handles this through context propagation. When service A calls service B, it injects trace headers into the HTTP request. Service B extracts those headers and continues the same trace. Let’s create a second, separate service that our first one calls.
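Concretely, what travels between services is just a small HTTP header defined by the W3C Trace Context specification. If you’re curious what it looks like, you can log the carrier yourself from inside an active span (the values below are illustrative):
// Somewhere inside an active span, with tracing.js loaded
import { context, propagation } from '@opentelemetry/api';
const carrier = {};
propagation.inject(context.active(), carrier); // Uses the globally registered W3C propagator
console.log(carrier);
// => { traceparent: '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01' }
//    Format: version - trace id (32 hex) - parent span id (16 hex) - flags (01 = sampled)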
Create a new file for a “payment service,” payment.js. Run it on port 3001.
// payment.js
import { serve } from 'bun';
import { trace, context, propagation } from '@opentelemetry/api';
const tracer = trace.getTracer('payment-service');
serve({
  port: 3001,
  async fetch(request) {
    // Copy the incoming headers into a plain object the propagator can read
    const carrier = {};
    for (const [key, value] of request.headers) {
      carrier[key] = value;
    }
    // Extract the caller's trace context; the NodeSDK in tracing.js registers
    // the W3C Trace Context propagator globally, so this just works
    const ctx = propagation.extract(context.active(), carrier);
    // Run the handler within the extracted context so our span joins the caller's trace
    return context.with(ctx, async () => {
      return tracer.startActiveSpan('charge-payment', async (span) => {
        await Bun.sleep(150); // Simulate work
        span.end();
        return new Response(JSON.stringify({ charged: true }), {
          headers: { 'Content-Type': 'application/json' },
        });
      });
    });
  },
});
Now, update the main service to call this payment service and propagate the context.
// In the main index.js, alongside processOrder
// (extend the existing import to pull in context and propagation)
import { trace, context, propagation } from '@opentelemetry/api';
async function chargePayment() {
  const carrier = {};
  // Inject the current trace context (which already holds the active span)
  // into a plain object we can send as HTTP headers
  propagation.inject(context.active(), carrier);
  const response = await fetch('http://localhost:3001', {
    headers: carrier, // Sends the traceparent header to the payment service
  });
  return response.json();
}
// Then, inside processOrder, call it in place of the simulated payment delay:
// const payment = await chargePayment();
Start both services (don’t forget to run each with --require ./tracing.js). When you call the order endpoint, the trace will now include spans from both services, linked together as a single journey. You’ve just connected two independent processes on the map. One caveat: because both processes load the same tracing.js, they will both report themselves as my-bun-service; give each its own service name, as sketched below, so the hop between services is obvious. How much easier does debugging become when you can follow a request from the gateway all the way to the database?
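A minimal way to do that is to read the name from an environment variable. OTEL_SERVICE_NAME is the standard OpenTelemetry variable; we read it explicitly here so it takes effect even with our hand-built resource. This is a sketch of the updated tracing.js, not the only way to wire it:
// tracing.js — let the environment pick the service name, with a fallback
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
const serviceName = process.env.OTEL_SERVICE_NAME ?? 'my-bun-service';
const sdk = new NodeSDK({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: serviceName,
  }),
  traceExporter: new OTLPTraceExporter({ url: 'http://localhost:4318/v1/traces' }),
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
Then start each process with its own identity:
bun --require ./tracing.js index.js
OTEL_SERVICE_NAME=payment-service bun --require ./tracing.js payment.js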
Finally, let’s look at the map. Go to Grafana at http://localhost:3100. You need to add Tempo as a data source. Click the gear icon (Configuration) -> Data Sources -> Add data source. Choose “Tempo”. For the URL, enter http://tempo:3200. Save and test. Now, go to the “Explore” page (the compass icon). Select the Tempo data source. You should see a query field. Since we’ve been sending traces, you can search for them. Try searching for the service name my-bun-service. Click on a trace result.
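If your Tempo version supports TraceQL (recent releases enable it by default), you can also type a query directly instead of using the search form, for example to find only the slower traces from our service:
{ resource.service.name = "my-bun-service" && duration > 300ms }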
You will see a timeline view, often called a “trace waterfall.” It shows all the spans in a hierarchical view. You can see how long each span took, which one is the parent, and which are children. Click on a span to see its attributes, like order.id. This is your map. This visualization turns complex timing data into an intuitive story. Can you spot the bottleneck?
This is just the beginning. You can add metrics, logs, and correlate them all with traces. In production you can sample, keeping only a fraction of traces, say 10%, to manage cost, while making sure traces with errors are still retained; in practice, keeping every error trace means tail-based sampling (for example in an OpenTelemetry Collector), because a head-based sampler decides before it knows whether the request failed. The goal is to build a system that is observable, not just monitored. You move from asking “is it broken?” to “why is it slow?” and finally to “how can I make it faster?”
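As a starting point, here is a minimal sketch of head-based sampling in our tracing.js, assuming the samplers re-exported by @opentelemetry/sdk-trace-node: a ParentBasedSampler wrapping a TraceIdRatioBasedSampler keeps roughly 10% of new traces and follows the upstream decision for the rest.
// tracing.js — sample ~10% of root traces, honoring the caller's decision otherwise
import { ParentBasedSampler, TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-node';
const sdk = new NodeSDK({
  // ...resource, exporter, and instrumentations as before...
  sampler: new ParentBasedSampler({
    root: new TraceIdRatioBasedSampler(0.1), // keep ~10% of new traces
  }),
});
Because the sampler is parent-based, downstream services like payment.js will follow whatever decision the order service made, so a trace is either kept end to end or dropped end to end.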
I built this guide because I spent hours in the dark with disconnected logs. I don’t want you to do the same. Start with a simple setup like this. Instrument one service. See the trace. Feel the clarity it brings. Then expand. The complexity of our systems will only grow, but with the right observability pipeline, our understanding can grow with it. If this guide helped you see the path forward, please share it with a teammate who’s also navigating a microservice maze. Have you tried setting up tracing before? What was your biggest challenge? Let me know in the comments.