Over the past month, I’ve been wrestling with performance bottlenecks in our company’s GraphQL API. As our user base grew, we started seeing increased latency in nested queries across multiple services. That’s when I decided to explore Apollo Federation with Redis caching to create a unified, high-performance gateway. Let me share what I’ve learned.
First, why Apollo Federation? It solves a critical problem in distributed systems. Instead of running one monolithic GraphQL server, we split the architecture into independent subgraphs, each owning a specific data domain. The gateway composes these subgraphs into a single schema that clients query as if it were one API. Think about it: how often have you seen APIs become tangled when multiple teams work on the same codebase? Federation reduces that risk by giving each team clear ownership of its part of the schema.
Setting up requires careful planning. Here’s how I structured the environment:
# Create core directories
mkdir -p gateway/{src,config} users-service/{src,models} products-service/src
For dependencies, focus on these essentials:
// gateway/package.json
"dependencies": {
  "@apollo/gateway": "^2.5.1",
  "apollo-server": "^3.12.1",
  "redis": "^4.6.7",
  "apollo-server-plugin-response-cache": "^3.8.2"
}
Now, let’s build a Users subgraph. Notice how we extend types and reference external entities:
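// users-service/src/schema.ts
import { gql } from 'apollo-server'; // assumption: the subgraph also runs apollo-server v3

const typeDefs = gql`
  type User @key(fields: "id") {
    id: ID!
    email: String!
    favoriteProducts: [Product!]!
  }

  # Product is owned by the products subgraph; only its key is declared here
  type Product @extends @key(fields: "id") {
    id: ID! @external
  }

  type Query {
    getUser(id: ID!): User
  }
`;

const resolvers = {
  User: {
    // Return entity references; the gateway asks the products subgraph for the rest
    favoriteProducts(user) {
      return user.favoriteIds.map(id => ({ __typename: "Product", id }));
    }
  }
};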
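The `{ __typename: "Product", id }` objects returned above are entity references. The gateway hands them to the subgraph that owns `Product`, which turns each one into a full object through a `__resolveReference` resolver. The original post does not show that side, so the following is only a minimal sketch of what it might look like: `productCatalog`, the `name` and `price` fields, and the in-memory Map standing in for a database are all assumptions.

```typescript
// Hypothetical Products subgraph resolver map. An in-memory Map stands in
// for the real data store.
interface Product {
  id: string;
  name: string;
  price: number;
}

interface ProductReference {
  __typename: "Product";
  id: string;
}

const productCatalog = new Map<string, Product>([
  ["p1", { id: "p1", name: "Keyboard", price: 49 }],
  ["p2", { id: "p2", name: "Monitor", price: 199 }],
]);

const productResolvers = {
  Product: {
    // Receives one reference forwarded by the gateway and returns the
    // matching product, or null when the id is unknown.
    __resolveReference(ref: ProductReference): Product | null {
      return productCatalog.get(ref.id) ?? null;
    },
  },
};

// The kind of stub the Users subgraph returns for a favorite product
const favoriteStub: ProductReference = { __typename: "Product", id: "p1" };
const resolvedFavorite = productResolvers.Product.__resolveReference(favoriteStub);
```

Because the Users subgraph only ever returns the key, it never needs to know which fields a product has or where they are stored.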
The magic happens at the gateway level. Here’s how I configured it to compose schemas from multiple services:
// gateway/src/index.ts
import { ApolloGateway, IntrospectAndCompose } from '@apollo/gateway';
import { ApolloServer } from 'apollo-server';

// Compose the supergraph by introspecting each running subgraph at startup
const gateway = new ApolloGateway({
  supergraphSdl: new IntrospectAndCompose({
    subgraphs: [
      { name: 'users', url: 'http://localhost:4001/graphql' },
      { name: 'products', url: 'http://localhost:4002/graphql' }
    ]
  })
});

const server = new ApolloServer({ gateway });

server.listen(4000).then(({ url }) => {
  console.log(`Gateway ready at ${url}`);
});
But what about performance? That’s where Redis enters the picture. I implemented two caching layers:
- Query caching: Stores entire GraphQL responses
- Entity caching: Caches individual domain objects
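// gateway/src/redis-cache.ts
import { createClient } from 'redis';

const redisClient = createClient(); // defaults to redis://localhost:6379
// node-redis v4 requires connect() before commands are issued
redisClient.connect().catch(console.error);

async function cacheResponse(key: string, data: unknown, ttl = 60) {
  // setEx(key, seconds, value) is the node-redis v4 name for SETEX
  await redisClient.setEx(key, ttl, JSON.stringify(data));
}

// Usage in resolver (cache-aside: check the cache first, then fall back to the source)
const userResolver = async (parent, args, context) => {
  const cacheKey = `user:${args.id}`;
  const cached = await redisClient.get(cacheKey);
  if (cached) return JSON.parse(cached);
  const data = await fetchUser(args.id);
  await cacheResponse(cacheKey, data);
  return data;
};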
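The resolver above covers the entity layer. For the query layer, the cache key has to capture everything that can change the response: the query text, the variables, and (for personalized data) the user. The response cache plugin from the dependency list derives its own keys, so the sketch below only illustrates the idea. `buildQueryCacheKey`, `stableStringify`, and the `gql:` prefix are hypothetical names, and SHA-256 is an assumed choice of hash.

```typescript
import { createHash } from "node:crypto";

// Serialize with sorted object keys so {a, b} and {b, a} produce the same text.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  // JSON.stringify returns undefined for undefined input at runtime
  return JSON.stringify(value) ?? "null";
}

function buildQueryCacheKey(
  query: string,
  variables: Record<string, unknown> = {},
  userId?: string
): string {
  const digest = createHash("sha256")
    .update(query)
    .update("\u0000") // separator so query and variables cannot run together
    .update(stableStringify(variables))
    .digest("hex");
  // Scope the key per user so personalized responses are never shared.
  return `gql:${userId ?? "public"}:${digest}`;
}

// Example: a key for an anonymous request
const exampleKey = buildQueryCacheKey("query($id: ID!) { getUser(id: $id) { email } }", { id: "42" });
```

Sorting the variable keys matters more than it looks: clients often serialize the same variables in different orders, and without normalization each ordering would occupy its own cache entry.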
Authentication presented an interesting challenge. How do you handle stateless auth across multiple services? I used JWT with a shared secret. The gateway validates tokens and attaches user context to requests:
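// gateway/src/auth.ts
import { ApolloGateway, RemoteGraphQLDataSource } from '@apollo/gateway';

const gateway = new ApolloGateway({
  // ...config,
  buildService({ url }) {
    return new RemoteGraphQLDataSource({
      url,
      willSendRequest({ request, context }) {
        // Forward the user only when the gateway authenticated one;
        // a header value must be a string, never null
        if (context.user) {
          request.http?.headers.set('user', JSON.stringify(context.user));
        }
      }
    });
  }
});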
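The snippet above assumes `context.user` has already been populated. The post does not show that validation step, so here is a rough sketch of how a gateway could verify a shared-secret (HS256) token using only Node's built-in `crypto` module. The function names `issueToken`, `verifyToken`, and `buildContext` are hypothetical, and in practice a maintained JWT library is likely the safer choice.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

interface UserClaims {
  sub: string;
  exp?: number; // expiry in seconds since the Unix epoch
}

function sign(data: string, secret: string): Buffer {
  return createHmac("sha256", secret).update(data).digest();
}

// Helper for demos and tests: issues a token the way a login service might.
function issueToken(claims: UserClaims, secret: string): string {
  const encode = (obj: object): string =>
    Buffer.from(JSON.stringify(obj)).toString("base64url");
  const data = `${encode({ alg: "HS256", typ: "JWT" })}.${encode(claims)}`;
  return `${data}.${sign(data, secret).toString("base64url")}`;
}

// Returns the claims of a valid, unexpired HS256 token, otherwise null.
function verifyToken(
  token: string,
  secret: string,
  nowSeconds: number = Math.floor(Date.now() / 1000)
): UserClaims | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, payload, signature] = parts;
  try {
    const { alg } = JSON.parse(Buffer.from(header, "base64url").toString("utf8"));
    if (alg !== "HS256") return null; // refuse "none" and any other algorithm
    const expected = sign(`${header}.${payload}`, secret);
    const actual = Buffer.from(signature, "base64url");
    // Constant-time comparison; lengths must match before timingSafeEqual
    if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) {
      return null;
    }
    const claims = JSON.parse(
      Buffer.from(payload, "base64url").toString("utf8")
    ) as UserClaims;
    if (claims.exp !== undefined && claims.exp <= nowSeconds) return null;
    return claims;
  } catch {
    return null; // malformed base64 or JSON
  }
}

// What a gateway context function could do with the Authorization header
function buildContext(authorization: string | undefined, secret: string) {
  const token = authorization?.startsWith("Bearer ")
    ? authorization.slice("Bearer ".length)
    : "";
  return { user: verifyToken(token, secret) };
}
```

With this shape, an unauthenticated request simply ends up with `user: null`, and each subgraph decides which fields require a user.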
For performance tuning, I recommend these strategies:
- Use DataLoader to batch requests
- Set cache TTL based on data volatility
- Monitor query complexity
- Enable persisted queries
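Setting TTLs by volatility can be as simple as a lookup table keyed by entity type. The sketch below shows one way to organize it; the type names and durations are made up for illustration and are not the values from our system.

```typescript
// Hypothetical volatility classes with TTLs in seconds; tune these to how
// often each kind of data really changes.
const TTL_SECONDS = {
  static: 24 * 60 * 60, // e.g. product categories
  slow: 10 * 60, // e.g. product details
  fast: 30, // e.g. stock levels
} as const;

type Volatility = keyof typeof TTL_SECONDS;

// Hypothetical mapping from GraphQL type name to volatility class
const ENTITY_VOLATILITY: Record<string, Volatility> = {
  Category: "static",
  Product: "slow",
  Inventory: "fast",
};

function ttlFor(typename: string): number {
  // Unknown types get the shortest TTL so stale data expires quickly.
  const volatility = ENTITY_VOLATILITY[typename] ?? "fast";
  return TTL_SECONDS[volatility];
}

const productTtl = ttlFor("Product");
```

The result could be passed as the `ttl` argument of `cacheResponse`, which keeps the TTL policy in one place instead of scattered across resolvers.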
Here’s a DataLoader implementation I used:
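// products-service/src/dataloaders.ts
import DataLoader from 'dataloader';

const productLoader = new DataLoader(async (ids: readonly string[]) => {
  // One query for the whole batch instead of one query per id
  const products = await ProductModel.find({ _id: { $in: ids } });
  // Index by id once, then answer in the same order as the requested ids
  const byId = new Map(products.map(p => [p.id, p] as const));
  return ids.map(id => byId.get(id) ?? new Error(`Product ${id} not found`));
});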
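To see why batching helps, it can be useful to look at the mechanics without the library. The following is not DataLoader's actual implementation, only a stripped-down sketch of the same idea: every `load` call made in the same tick is queued, and one batch call serves all of them. `TinyLoader` and its types are hypothetical.

```typescript
type BatchFn<K, V> = (keys: readonly K[]) => Promise<(V | Error)[]>;

interface PendingLoad<K, V> {
  key: K;
  resolve: (value: V) => void;
  reject: (error: Error) => void;
}

class TinyLoader<K, V> {
  private queue: PendingLoad<K, V>[] = [];
  private readonly batchFn: BatchFn<K, V>;

  constructor(batchFn: BatchFn<K, V>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V> {
    return new Promise<V>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first queued key schedules a single flush for the whole tick.
      if (this.queue.length === 1) {
        Promise.resolve().then(() => this.flush());
      }
    });
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    try {
      // One call to the backing store for every key collected so far
      const results = await this.batchFn(batch.map((item) => item.key));
      batch.forEach((item, index) => {
        const result = results[index];
        if (result instanceof Error) item.reject(result);
        else item.resolve(result);
      });
    } catch (err) {
      const error = err instanceof Error ? err : new Error(String(err));
      batch.forEach((item) => item.reject(error));
    }
  }
}
```

Three `load` calls issued from sibling resolvers therefore reach the database as one query, which is where most of the savings on nested queries come from. The real library adds per-key caching and more careful scheduling on top of this.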
In testing, I discovered several gotchas:
- Always mock external services in integration tests
- Test schema composition edge cases
- Validate cache invalidation logic
- Simulate network failures
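Simulating a network failure does not need real infrastructure. One option is to wrap a mocked subgraph fetcher so that chosen calls reject. The sketch below assumes a simplified fetcher signature and a retry-once helper as the code under test; `failOnCalls`, `fetchWithRetry`, and `mockUsersSubgraph` are all hypothetical names.

```typescript
type SubgraphFetcher = (query: string) => Promise<{ data: unknown }>;

// Wraps a fetcher so that the listed call numbers (1-based) reject,
// imitating a dropped connection.
function failOnCalls(fetcher: SubgraphFetcher, failingCalls: Set<number>): SubgraphFetcher {
  let callCount = 0;
  return async (query) => {
    callCount += 1;
    if (failingCalls.has(callCount)) {
      throw new Error("ECONNRESET (simulated)");
    }
    return fetcher(query);
  };
}

// Example code under test: retries a failed call a fixed number of times.
async function fetchWithRetry(
  fetcher: SubgraphFetcher,
  query: string,
  retries = 1
): Promise<{ data: unknown }> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fetcher(query);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Mocked external service: always answers with a fixed user
const mockUsersSubgraph: SubgraphFetcher = async () => ({
  data: { getUser: { id: "1", email: "a@example.com" } },
});
```

An integration test can then assert both outcomes: one failure followed by a success returns data, while two failures in a row surface the error to the caller.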
When deploying to production:
- Use Redis Cluster for high availability
- Implement circuit breakers for subgraphs
- Set up Apollo Studio for monitoring
- Enable query logging
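A circuit breaker stops the gateway from hammering a subgraph that is already failing. The version below is a minimal sketch, not what we run in production: the thresholds, the class name `CircuitBreaker`, and the injectable clock (there to make it testable) are all assumptions.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = "closed";
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly now: () => number;

  constructor(failureThreshold = 3, resetAfterMs = 10_000, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
  }

  getState(): BreakerState {
    return this.state;
  }

  async call<T>(operation: () => Promise<T>): Promise<T> {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.resetAfterMs) {
        // Fail fast without touching the subgraph
        throw new Error("Circuit open: subgraph call skipped");
      }
      this.state = "half-open"; // cool-down elapsed, allow one trial request
    }
    try {
      const result = await operation();
      this.failures = 0;
      this.state = "closed";
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial request, or too many consecutive failures, opens the circuit.
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}

// One breaker per subgraph keeps a products outage from blocking users queries.
const productsBreaker = new CircuitBreaker(3, 10_000);
```

Each subgraph request would go through `productsBreaker.call(() => ...)`, so that after three consecutive failures the gateway answers immediately with an error for ten seconds instead of waiting on timeouts.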
One question I often get: How do you handle cache invalidation when data changes? My solution combines Redis pub/sub with cache key versioning. When a user updates their profile, we publish an invalidation event:
// users-service/src/updateProfile.ts
redis.publisher.publish('cache-invalidate', `user:${userId}`);
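The publish call is only half of the picture; a subscriber has to react to the event. One way to read "cache key versioning" is that each logical key carries a version number, and invalidation bumps the version instead of deleting entries. The sketch below follows that reading with in-memory Maps standing in for Redis, so every name in it (`versionedKey`, `cacheSet`, `cacheGet`, `onInvalidate`) is hypothetical rather than taken from our code.

```typescript
// In-memory stand-ins for Redis. A real setup would use GET/SET for the
// entries, INCR for the version counters, and SUBSCRIBE for the events.
const entryStore = new Map<string, string>();
const keyVersions = new Map<string, number>();

function versionedKey(baseKey: string): string {
  return `${baseKey}:v${keyVersions.get(baseKey) ?? 0}`;
}

function cacheSet(baseKey: string, value: unknown): void {
  entryStore.set(versionedKey(baseKey), JSON.stringify(value));
}

function cacheGet<T>(baseKey: string): T | null {
  const raw = entryStore.get(versionedKey(baseKey));
  return raw === undefined ? null : (JSON.parse(raw) as T);
}

// What a handler for messages on the 'cache-invalidate' channel could do.
// The message payload is the base key, e.g. "user:42".
function onInvalidate(baseKey: string): void {
  // Bumping the version orphans the old entry; in Redis its TTL reclaims it.
  keyVersions.set(baseKey, (keyVersions.get(baseKey) ?? 0) + 1);
}

// Example flow: cache a profile, then receive an invalidation for it
cacheSet("user:42", { email: "old@example.com" });
onInvalidate("user:42");
const afterInvalidation = cacheGet<{ email: string }>("user:42");
```

Readers never see the stale entry after the bump because they look up the new versioned key, and no process has to enumerate and delete old keys.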
In our case, this setup reduced average query latency from 450 ms to 89 ms. The real benefit comes from combining the pieces: federation keeps each service's schema manageable, while Redis serves frequently requested data without another trip to the backing services.
If you’ve faced similar challenges with distributed GraphQL systems, I’d love to hear your solutions. What caching strategies have worked for you? Share your experiences in the comments below, and if this helped you, consider sharing it with your network.