How to Build Production-Ready PDFs with Puppeteer, PDFKit, and pdf-lib

Learn how to generate fast, reliable PDFs in Node.js using Puppeteer, PDFKit, and pdf-lib with real-world, production-ready tips.

I was building a reporting feature for a client last month when I hit a wall. The PDFs looked terrible. Tables broke across pages, fonts were wrong, and the whole process was slow. That’s when I realized most tutorials only scratch the surface. They don’t show you how to build something that actually works in production. So I spent weeks researching, testing, and building. What follows is everything I wish I knew before starting.

Let’s talk about making PDFs that don’t break when you need them most.

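First, you need to pick the right tool. There are two main ways to generate a PDF, rendering HTML or drawing the document in code, plus a third tool for editing PDFs that already exist. Choosing wrong will cost you time and headaches.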

Puppeteer is your best friend when you have complex layouts. It’s essentially a headless browser that converts HTML to PDF. This means you can use all your CSS skills. Want a fancy invoice with gradients, shadows, and perfect alignment? Puppeteer handles it.

But what if you’re generating thousands of simple receipts? Launching a browser for each one is overkill. That’s where PDFKit shines. It creates PDFs programmatically. You tell it exactly where to put each line of text, each image. It’s fast and uses little memory.

Then there’s pdf-lib. This isn’t for creating PDFs from scratch. It’s for working with existing ones. Need to merge ten reports into one document? Add a watermark to a contract? Fill out a form? That’s pdf-lib’s job.

Think about your use case. Are you building a dashboard that exports beautiful charts? Use Puppeteer. Are you printing shipping labels at high volume? Use PDFKit. Are you processing uploaded documents? Use pdf-lib.

Here’s a practical setup to get you started. I prefer organizing by function, not by technology.

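mkdir pdf-service && cd pdf-service
npm init -y
# bullmq (which needs a Redis server), zod, and sharp are used in later sections
npm install express puppeteer pdfkit pdf-lib bullmq zod sharp
npm install -D typescript @types/node @types/express @types/pdfkit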

Create a simple structure from the beginning. It saves refactoring later.

src/
├── generators/    # Creates new PDFs
├── manipulators/  # Edits existing PDFs
├── storage/       # Saves and retrieves files
└── api/           # Handles web requests

Let’s build a real Puppeteer generator. Most examples launch a new browser for every PDF. Under production load, that burns memory and CPU and can take your server down. Launch the browser once, reuse it, and limit how many pages render at the same time.

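// src/generators/puppeteer.ts
import puppeteer from 'puppeteer';
import type { Browser, PDFOptions } from 'puppeteer';

export class PDFGenerator {
  private browserPromise: Promise<Browser> | null = null;
  private maxConcurrent = 2;
  private activeCount = 0;
  private queue: Array<() => void> = [];

  // Launches the shared browser once; concurrent callers reuse the same launch
  initialize(): Promise<Browser> {
    if (!this.browserPromise) {
      this.browserPromise = puppeteer
        .launch({
          headless: true,
          // Disabling the sandbox is only reasonable inside a locked-down
          // container that renders HTML you trust
          args: ['--no-sandbox', '--disable-setuid-sandbox']
        })
        .catch((error) => {
          this.browserPromise = null; // allow a retry after a failed launch
          throw error;
        });
    }
    return this.browserPromise;
  }

  private async waitForSlot() {
    if (this.activeCount < this.maxConcurrent) {
      this.activeCount++;
      return;
    }

    // Wait until releaseSlot() hands this caller a slot
    await new Promise<void>((resolve) => {
      this.queue.push(resolve);
    });
  }

  private releaseSlot() {
    const next = this.queue.shift();
    if (next) {
      // Pass the slot straight to the next waiter; activeCount stays the same
      next();
    } else {
      this.activeCount--;
    }
  }

  async generate(html: string, options: PDFOptions = {}) {
    await this.waitForSlot();

    try {
      const browser = await this.initialize();
      const page = await browser.newPage();
      try {
        await page.setContent(html, { waitUntil: 'networkidle0' });
        return await page.pdf({
          format: 'A4',
          printBackground: true,
          margin: { top: '1cm', right: '1cm', bottom: '1cm', left: '1cm' },
          ...options // caller-supplied options override the defaults
        });
      } finally {
        await page.close(); // close the page even if rendering fails
      }
    } finally {
      this.releaseSlot();
    }
  }

  // Call on shutdown so the browser process does not linger
  async close() {
    const pending = this.browserPromise;
    this.browserPromise = null;
    if (pending) await (await pending).close();
  }
}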

Notice the queue system? It prevents memory overload by limiting how many PDFs generate at once. The finally block ensures slots always get released, even if something fails.
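
One detail worth getting right in a queue like this is the hand-off: when a finished job wakes a waiting one, the slot should pass directly to the waiter instead of being freed and then counted again. Here is a minimal, library-free sketch of that idea. The `Semaphore` class and the fake render job are illustrative names of mine, not part of Puppeteer.

```typescript
// Minimal counting semaphore. Names are illustrative, not from any library.
class Semaphore {
  private active = 0;
  private readonly limit: number;
  private readonly waiters: Array<() => void> = [];

  constructor(limit: number) {
    this.limit = limit;
  }

  async acquire(): Promise<void> {
    if (this.active < this.limit) {
      this.active++;
      return;
    }
    // Park until release() hands this caller a slot.
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      // Hand the slot to the next waiter; `active` stays unchanged.
      next();
    } else {
      this.active--;
    }
  }

  // Runs `task` inside a slot and always gives the slot back.
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}

// Demo: five fake render jobs with a limit of two; returns the peak concurrency.
async function demo(): Promise<number> {
  const semaphore = new Semaphore(2);
  let running = 0;
  let peak = 0;
  const fakeRender = async (): Promise<void> => {
    running++;
    peak = Math.max(peak, running);
    await new Promise<void>((resolve) => setTimeout(resolve, 10));
    running--;
  };
  await Promise.all(Array.from({ length: 5 }, () => semaphore.run(fakeRender)));
  return peak;
}
```

With the hand-off in `release()`, the counter never drifts below the number of jobs that are actually running, so a burst of new requests cannot sneak past the limit.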

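Have you ever wondered why some PDF downloads start instantly while others take forever? Often the difference is streaming: sending bytes as they are produced instead of waiting for the whole file. The route below takes the simpler path. It waits for the finished PDF and sends it in one piece, which is fine for small documents. For large ones, Puppeteer also offers page.createPDFStream(), which lets you pipe the output to the response instead (check the return type for your Puppeteer version).

// src/api/pdf-routes.ts
import express from 'express';
import { PDFGenerator } from '../generators/puppeteer';

const router = express.Router();
const generator = new PDFGenerator();

router.get('/report', async (req, res) => {
  try {
    // Safe to call on every request: the browser is only launched once
    await generator.initialize();

    // page.pdf() resolves only when the whole document has been rendered
    const pdf = await generator.generate('<h1>Your Report</h1>');

    // Set headers after generation succeeds so a failure can still return a 500
    res.setHeader('Content-Type', 'application/pdf');
    res.setHeader('Content-Disposition', 'attachment; filename="report.pdf"');

    // Newer Puppeteer versions return a Uint8Array; wrap it so Express sends raw bytes
    res.send(Buffer.from(pdf));
  } catch (error) {
    console.error('PDF generation failed:', error);
    res.status(500).json({ error: 'Could not generate PDF' });
  }
});

export default router;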
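
When a document is large enough that buffering hurts, the bytes can be piped to the response as they arrive. This is a minimal sketch using only Node’s stream utilities. `streamPdf` and the in-memory stand-ins are illustrative names; in a real route the source would be whatever readable your PDF library provides, for example a PDFKit document or the stream from Puppeteer’s `page.createPDFStream()` (whose return type varies by version, so treat that part as an assumption to verify).

```typescript
import { Readable, Writable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Pipes PDF bytes to a destination chunk by chunk instead of buffering them all.
// `source` stands in for any Node readable that yields PDF bytes;
// `destination` stands in for an HTTP response, which is a Writable.
async function streamPdf(source: Readable, destination: Writable): Promise<void> {
  await pipeline(source, destination);
}

// Demo with in-memory stand-ins for the PDF source and the response.
async function demo(): Promise<string> {
  const received: Buffer[] = [];
  const sink = new Writable({
    write(chunk, _encoding, callback) {
      received.push(Buffer.from(chunk));
      callback();
    },
  });
  const fakePdf = Readable.from([Buffer.from('%PDF-1.7\n'), Buffer.from('%%EOF')]);
  await streamPdf(fakePdf, sink);
  return Buffer.concat(received).toString('latin1');
}
```

In an Express handler the destination is simply `res`. `pipeline` also destroys both streams if either side fails, which a bare `.pipe()` does not do.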

But what about dynamic data? Hard-coded HTML strings won’t cut it. You need templates, and you need to escape whatever data goes into them.

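// src/generators/template-engine.ts
// Field types are assumed from how the template uses them
interface InvoiceItem {
  description: string;
  amount: number;
}

interface InvoiceData {
  number: string | number;
  items: InvoiceItem[];
  total: number;
}

// Escape interpolated values so data cannot inject markup into the page
function escapeHtml(value: unknown): string {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

export function renderInvoice(invoiceData: InvoiceData): string {
  return `
    <!DOCTYPE html>
    <html>
    <head>
      <meta charset="utf-8">
      <style>
        body { font-family: Arial, sans-serif; }
        .total { font-weight: bold; color: #2c3e50; }
        .page-break { page-break-after: always; }
      </style>
    </head>
    <body>
      <h1>Invoice #${escapeHtml(invoiceData.number)}</h1>
      ${invoiceData.items.map(item => `
        <div class="item">
          ${escapeHtml(item.description)}: $${escapeHtml(item.amount)}
        </div>
      `).join('')}
      <div class="total">
        Total: $${escapeHtml(invoiceData.total)}
      </div>
    </body>
    </html>
  `;
}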
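
Templates like this print amounts exactly as they arrive, so a value such as 1234.5 shows up without thousands separators or trailing zeros. One hedged way to handle that is to keep money in integer cents and format it at the last moment. The helper names and the `amountCents` field below are my own assumptions, not part of the invoice data shown above.

```typescript
// Formatter for US dollars; swap the locale and currency for your market.
const usd = new Intl.NumberFormat('en-US', { style: 'currency', currency: 'USD' });

// Formats an amount held in integer cents, e.g. 123450 becomes "$1,234.50".
function formatCents(cents: number): string {
  return usd.format(cents / 100);
}

// Sums line items in cents so the total does not pick up floating-point drift.
function totalCents(items: Array<{ amountCents: number }>): number {
  return items.reduce((sum, item) => sum + item.amountCents, 0);
}
```

Since the formatter adds the currency symbol itself, a template using `formatCents` would drop the literal `$` in front of the value.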

Now let’s look at PDFKit for simpler documents. The syntax is different but powerful.

// src/generators/pdfkit-generator.ts
import PDFDocument from 'pdfkit';

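// Resolves with the finished PDF; rejects if PDFKit reports a stream error
export function createReceipt(order: any): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];
    const doc = new PDFDocument({ size: 'A7' }); // Small receipt size
    
    doc.on('data', (chunk: Buffer) => chunks.push(chunk));
    doc.on('end', () => resolve(Buffer.concat(chunks)));
    doc.on('error', reject);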
    
    // Add content
    doc.fontSize(20).text('Receipt', { align: 'center' });
    doc.moveDown();
    doc.fontSize(12).text(`Order: ${order.id}`);
    doc.text(`Date: ${new Date().toLocaleDateString()}`);
    doc.moveDown();
    
    order.items.forEach((item: any) => {
      doc.text(`${item.name} x${item.quantity}: $${item.price}`);
    });
    
    doc.moveDown();
    doc.fontSize(14).text(`Total: $${order.total}`, { align: 'right' });
    
    doc.end();
  });
}

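See how we work with buffers and streams? A PDFKit document is a readable stream that emits the PDF in chunks as you add content. Here we collect those chunks into one buffer. For large documents you can skip the buffer and pipe the document straight to a file or HTTP response with doc.pipe().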
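
The collect-into-a-buffer pattern shows up often enough that it can live in one small helper. This is a sketch that relies only on Node’s stream support. `streamToBuffer` is an illustrative name, and using it with a PDFKit document assumes that document behaves like any other Node readable.

```typescript
import { Readable } from 'node:stream';

// Collects every chunk from a readable stream into a single Buffer.
// The for-await loop rejects if the stream emits an error, so failures
// are not silently swallowed.
async function streamToBuffer(stream: Readable): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    chunks.push(Buffer.from(chunk));
  }
  return Buffer.concat(chunks);
}

// Demo with an in-memory stream standing in for a PDF document.
async function demo(): Promise<string> {
  const source = Readable.from([Buffer.from('ab'), Buffer.from('cd')]);
  return (await streamToBuffer(source)).toString('utf8');
}
```

Buffering is the right call when you need the whole file anyway, for example to merge it or upload it in one request. When the bytes only pass through, piping keeps memory flat.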

What happens when you need to combine approaches? Maybe you have a cover page made with PDFKit and a report body from Puppeteer. That’s where pdf-lib comes in.

// src/manipulators/merger.ts
import { PDFDocument } from 'pdf-lib';

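// Accepts Buffers or plain Uint8Arrays (a Buffer is a Uint8Array)
export async function mergePDFs(pdfBuffers: Uint8Array[]): Promise<Buffer> {
  const mergedPdf = await PDFDocument.create();
  
  for (const buffer of pdfBuffers) {
    const pdf = await PDFDocument.load(buffer);
    // Pages must be copied into the target document before they can be added
    const pages = await mergedPdf.copyPages(pdf, pdf.getPageIndices());
    pages.forEach(page => mergedPdf.addPage(page));
  }
  
  // save() returns a Uint8Array; wrap it for Node APIs that expect a Buffer
  return Buffer.from(await mergedPdf.save());
}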

Memory management is where most projects fail. Generating PDFs uses a lot of it. You need to clean up properly.

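// src/generators/safe-generator.ts
import puppeteer from 'puppeteer';
import type { Browser, Page } from 'puppeteer';

// One-off variant that owns its browser. For steady traffic, reuse a shared
// browser as in the PDFGenerator class above.
export async function generateWithCleanup(html: string) {
  // Declared outside try so the finally block can reach them
  let browser: Browser | null = null;
  let page: Page | null = null;
  
  try {
    browser = await puppeteer.launch();
    page = await browser.newPage();
    
    // Skip the HTTP cache; this trims memory use but is not a hard limit
    await page.setCacheEnabled(false);
    
    await page.setContent(html, { waitUntil: 'networkidle0' });
    return await page.pdf({ /* options */ });
    
  } catch (error) {
    console.error('Generation failed:', error);
    throw error;
    
  } finally {
    // Always clean up, even on error. Close failures are ignored so they
    // do not mask the original error.
    if (page) await page.close().catch(() => {});
    if (browser) await browser.close().catch(() => {});
  }
}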
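
Cleanup in a finally block only runs once the work settles, so a render that hangs never reaches it. One hedge is to put a deadline on the work. The sketch below uses only built-in timers; `withTimeout` and its parameters are illustrative names, not a Puppeteer API.

```typescript
// Rejects if `work` has not settled within `ms` milliseconds.
// The timer is always cleared so it cannot keep the process alive.
async function withTimeout<T>(work: Promise<T>, ms: number, label = 'operation'): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([work, deadline]);
  } finally {
    clearTimeout(timer);
  }
}
```

Wrapping the render call, as in `await withTimeout(page.pdf(), 30_000, 'render')`, turns a hang into an ordinary error that the existing catch and finally blocks can handle. The timeout does not cancel the underlying work, so closing the page afterwards is still what actually frees the memory.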

For high-volume systems, consider a job queue. Don’t make users wait for PDF generation. Generate in the background and notify when ready.

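// src/queue/pdf-worker.ts
import { Worker } from 'bullmq';
import { PDFGenerator } from '../generators/puppeteer';

// One generator (and one browser) shared by every job this worker handles
const generator = new PDFGenerator();

const worker = new Worker('pdf-generation', async (job) => {
  const { html, userId, documentId } = job.data;
  
  await generator.initialize(); // only launches the browser the first time
  const pdf = await generator.generate(html);
  
  // saveToStorage and sendNotification are your own helpers (not shown here)
  // Save to cloud storage
  await saveToStorage(documentId, pdf);
  
  // Notify user
  await sendNotification(userId, `Your PDF ${documentId} is ready`);
}, {
  // BullMQ needs a Redis connection; this host and port are placeholders
  connection: { host: 'localhost', port: 6379 },
  concurrency: 2
});

// Log failed jobs so they do not disappear silently
worker.on('failed', (job, error) => {
  console.error(`PDF job ${job?.id} failed:`, error);
});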

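Security matters too. Always validate input, even for internal systems. If callers can supply HTML, remember that Puppeteer will load whatever that HTML references, so a size limit alone does not make untrusted markup safe.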

// src/api/validators.ts
import { z } from 'zod';

const pdfRequestSchema = z.object({
  html: z.string().max(100000), // Limit size
  format: z.enum(['A4', 'Letter', 'Legal']).default('A4'),
  watermark: z.string().optional()
});

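function validateRequest(data: unknown) {
  const result = pdfRequestSchema.safeParse(data);
  if (!result.success) {
    // Keep the field-level messages so callers can see what was wrong
    const details = result.error.issues
      .map((issue) => `${issue.path.join('.')}: ${issue.message}`)
      .join('; ');
    throw new Error(`Invalid PDF request: ${details}`);
  }
  return result.data;
}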
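
Schema validation checks the shape of a request, not what the HTML will fetch once a browser renders it. One common mitigation is to intercept the page’s network requests and only let through URLs on an allowlist. The helper below is a sketch of the decision function alone, using the standard `URL` class. The function name and the allowlist are assumptions of mine. Wiring it into Puppeteer would go through `page.setRequestInterception(true)` and a `request` event handler that calls `request.continue()` or `request.abort()`.

```typescript
// Returns true only for http(s) URLs whose host is on the allowlist.
// Unparsable input and every other scheme (file:, data:, ftp:, ...) is refused.
function isAllowedResourceUrl(rawUrl: string, allowedHosts: ReadonlySet<string>): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not an absolute URL
  }
  if (url.protocol !== 'https:' && url.protocol !== 'http:') {
    return false;
  }
  return allowedHosts.has(url.hostname);
}
```

Refusing everything that is not explicitly allowed blocks the classic cases: `file:///etc/passwd`, cloud metadata endpoints, and internal hosts. If your templates inline images as `data:` URLs, you may want to relax that one branch.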

Testing PDFs is tricky. You can’t just check if a file exists. You need to verify the content.

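// tests/pdf.test.ts
// pdf-lib cannot extract text, so the content check uses pdf-parse instead
// (this assumes pdf-parse v1, whose default export resolves to { text, ... })
import pdfParse from 'pdf-parse';
import { PDFDocument } from 'pdf-lib';

test('generates correct invoice total', async () => {
  // generateInvoice and testData come from your own code and fixtures
  const pdfBuffer = await generateInvoice(testData);

  // Structural check: the file loads and has at least one page
  const pdf = await PDFDocument.load(pdfBuffer);
  expect(pdf.getPageCount()).toBeGreaterThan(0);

  // Content check: the extracted text contains the total
  const { text } = await pdfParse(Buffer.from(pdfBuffer));
  expect(text).toContain(`Total: $${testData.total}`);
});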
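
Before parsing anything, a cheap smoke check can catch the most common failure: the generator returned an error page or an empty buffer instead of a PDF. This sketch looks only at the file’s framing bytes. `looksLikePdf` is an illustrative helper, and the 1024-byte tail window is an arbitrary choice.

```typescript
// Cheap structural check: a PDF starts with "%PDF-" and carries an
// "%%EOF" marker near the end of the file.
function looksLikePdf(bytes: Uint8Array): boolean {
  const buffer = Buffer.from(bytes);
  if (buffer.length < 8) return false;
  const header = buffer.subarray(0, 5).toString('latin1');
  const tail = buffer.subarray(Math.max(0, buffer.length - 1024)).toString('latin1');
  return header === '%PDF-' && tail.includes('%%EOF');
}
```

This says nothing about whether the content is right, so it complements the content test above; it does not replace it.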

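Common problems? Fonts not embedding is a big one. With Puppeteer, make sure the fonts your HTML references are installed on the server or reachable from it, and that they have finished loading before you call page.pdf(). With PDFKit, only the standard PDF fonts are built in, so you need to register font files for anything else.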

// Registering fonts with PDFKit
const doc = new PDFDocument();
doc.registerFont('CustomFont', './fonts/MyFont.ttf');
doc.font('CustomFont').text('Styled text');

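Another issue: images making PDFs huge. Resize and compress them before embedding. The helper below converts to JPEG, which drops transparency, so keep PNG for logos that need it.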

// src/utils/image-helper.ts
import sharp from 'sharp';

async function prepareImageForPDF(buffer: Buffer): Promise<Buffer> {
  return sharp(buffer)
    .resize(1200, 1200, { fit: 'inside' }) // Limit size
    .jpeg({ quality: 80 }) // Compress
    .toBuffer();
}

What about pagination? Getting page breaks right is an art.

<!-- In your HTML templates -->
<style>
  .avoid-break {
    break-inside: avoid;
  }
  
  .force-break {
    page-break-before: always;
  }
  
  table {
    break-inside: auto;
  }
  
  tr {
    break-inside: avoid;
    break-after: auto;
  }
</style>

The most important lesson? Start simple. Build a basic generator first. Make it work perfectly. Then add features one by one. Don’t try to build the perfect system on day one.

I made that mistake. I spent days designing a complex architecture before writing a single line of PDF code. Then I discovered my main use case needed a completely different approach. Build, test, learn, adjust.

Remember that tools are just tools. The real skill is knowing when to use each one. Sometimes you need Puppeteer’s power. Sometimes PDFKit’s speed. Sometimes you need both in the same document.

Keep your code modular. Your Puppeteer generator shouldn’t know about your storage system. Your API shouldn’t know how PDFs are made. This lets you switch technologies later without rewriting everything.
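
To make that separation concrete, here is a small sketch where the service depends on two interfaces and nothing else. All names (`PdfRenderer`, `PdfStorage`, `ReportService`, and the in-memory stand-ins) are hypothetical and exist only for illustration.

```typescript
// Hypothetical boundaries; a real project would name and shape these to fit.
interface PdfRenderer {
  render(html: string): Promise<Uint8Array>;
}

interface PdfStorage {
  save(id: string, bytes: Uint8Array): Promise<void>;
  load(id: string): Promise<Uint8Array | undefined>;
}

// The service knows the interfaces only, so either side can be swapped.
class ReportService {
  private readonly renderer: PdfRenderer;
  private readonly storage: PdfStorage;

  constructor(renderer: PdfRenderer, storage: PdfStorage) {
    this.renderer = renderer;
    this.storage = storage;
  }

  async createReport(id: string, html: string): Promise<void> {
    const bytes = await this.renderer.render(html);
    await this.storage.save(id, bytes);
  }
}

// In-memory stand-ins, handy for unit tests.
class FakeRenderer implements PdfRenderer {
  async render(html: string): Promise<Uint8Array> {
    return new TextEncoder().encode(`%PDF-fake ${html}`);
  }
}

class MemoryStorage implements PdfStorage {
  private readonly files = new Map<string, Uint8Array>();

  async save(id: string, bytes: Uint8Array): Promise<void> {
    this.files.set(id, bytes);
  }

  async load(id: string): Promise<Uint8Array | undefined> {
    return this.files.get(id);
  }
}
```

A Puppeteer-backed renderer or a cloud-backed storage class would implement the same interfaces, and `ReportService` would not change when you switch between them.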

Monitor everything. Track how long PDFs take to generate. Watch memory usage. Log failures. In production, you’ll discover edge cases you never imagined. A particular HTML structure might crash Puppeteer. A specific font might break PDFKit. You need to know when these happen.
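
A thin wrapper is often enough to start collecting those numbers. The sketch below records duration, resident memory, and success for any async job, then hands them to a reporter callback. The `measured` function and the `RenderMetrics` shape are assumptions of mine; in practice the reporter would forward to your logger or metrics client.

```typescript
interface RenderMetrics {
  label: string;
  durationMs: number;
  rssBytes: number; // resident set size of the Node process after the job
  ok: boolean;
}

// Runs `work`, then reports how long it took, current memory, and whether
// it succeeded. Errors from `work` are rethrown unchanged.
async function measured<T>(
  label: string,
  work: () => Promise<T>,
  report: (metrics: RenderMetrics) => void,
): Promise<T> {
  const start = performance.now();
  let ok = false;
  try {
    const result = await work();
    ok = true;
    return result;
  } finally {
    report({
      label,
      durationMs: performance.now() - start,
      rssBytes: process.memoryUsage().rss,
      ok,
    });
  }
}
```

Note that `rss` covers the Node process only. A headless browser runs as separate processes, so its memory has to be watched at the container or operating-system level.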

Finally, think about the user. A PDF is usually the end of a process. Someone needs to print it, sign it, or submit it. Make that easy. Generate at the right size. Use web-friendly fonts. Keep file sizes reasonable. Add bookmarks to long documents.

This is what separates working code from production-ready systems. It’s not about fancy features. It’s about reliability, performance, and meeting real needs.

What challenges have you faced with PDF generation? Have you found solutions I haven’t mentioned here? Share your experiences in the comments below. If this guide helped you, please like and share it with other developers who might be struggling with the same problems. Let’s build better tools together.

