How to Merge Multiple PDFs into One Using Python


In today’s post, we’ll learn how to merge multiple PDFs into one using a popular language, Python. Python’s large ecosystem of libraries makes it very versatile. In this tutorial, we will use the third-party PyPDF2 library, which reads and writes PDF files, along with the os module from Python’s standard library.

Installing PyPDF2

Before we get started, we need to install the PyPDF2 library. We can do this with pip, the package manager for Python libraries. Recent Python installers usually include pip; if yours doesn’t, follow the installation guide in the official pip documentation.

Once you have pip installed, open up a Terminal window and type the following command:

$ pip install PyPDF2

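This will install the PyPDF2 library on your computer. Recent releases (3.x) use the PdfReader and PdfWriter classes with snake_case methods such as add_page(). Older releases named these classes PdfFileReader and PdfFileWriter and used camelCase methods such as getNumPages() and addPage(), which raise errors as of PyPDF2 3.0. PyPDF2 is also no longer actively developed; its maintainers continue the project as pypdf, which offers largely the same PdfReader/PdfWriter interface.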
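To confirm the installation and see which API generation you have, you can ask Python for the installed version. The sketch below uses only the standard library’s importlib.metadata (Python 3.8+); the helper name installed_pypdf2_version is hypothetical. A major version of 3 or later means only the newer PdfReader/PdfWriter names will work.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_pypdf2_version():
    """Return the installed PyPDF2 version string, or None if it is not installed."""
    try:
        return version("PyPDF2")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pypdf2_version()
    if found is None:
        print("PyPDF2 is not installed; run: pip install PyPDF2")
    else:
        # PyPDF2 3.x turned the old PdfFileReader/PdfFileWriter names into errors
        major = int(found.split(".")[0])
        print(f"PyPDF2 {found} installed (3.x API: {major >= 3})")
```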

Once the library is installed, you can use it to combine your PDFs into a single document. First, import PyPDF2’s reader and writer classes in your script:

from PyPDF2 import PdfReader, PdfWriter

Merging PDF Files with PyPDF2

Now we can start writing our script. Open a new file in your favorite code editor and put the import line above at the top.

Next, we’ll need to open the PDF files that we want to merge. We’ll store each file in its own variable so that we can refer back to it later:

# 'rb' opens each file in binary read mode, which PDF parsing needs
file1 = open('file1.pdf', 'rb')
file2 = open('file2.pdf', 'rb')

Now that we have our files open, we can create a reader object for each one using PyPDF2’s PdfReader class:

pdf1 = PdfReader(file1)
pdf2 = PdfReader(file2)

We can also check how many pages are in each document by calling len() on the reader’s pages attribute:

print(len(pdf1.pages))  # e.g. 3 for a three-page file
print(len(pdf2.pages))  # e.g. 2 for a two-page file

Great! Now that our reader objects are set up, let’s move on to creating the merged document.

To create our merged document, we first need to instantiate a new PdfWriter object:

writer = PdfWriter()

Next, we’ll use PdfWriter’s add_page method to copy each page from our input documents into the output document, one at a time:

# Copy every page of the first document, then every page of the second
for page in pdf1.pages:
    writer.add_page(page)
for page in pdf2.pages:
    writer.add_page(page)

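Finally, we’ll create the output document by opening a new file in binary write mode and passing it to PdfWriter’s write method. Close the input files only after writing, because the readers may still need to read page data from them:

with open('output_file.pdf', 'wb') as output:
    writer.write(output)
file1.close()
file2.close()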

And that’s it! We’ve now successfully merged two separate PDF files into one document using just a few lines of Python code.

Here is another way to merge multiple PDFs into one: a full Python script that wraps the same steps in a function and merges any number of PDFs whose paths you type at the prompt.

import os
from PyPDF2 import PdfReader, PdfWriter


def merge_pdfs():
    '''Merge multiple PDFs into one combined PDF.'''
    input_paths = input("Enter a comma-separated list of paths to the PDFs: ")
    # Strip spaces around each path so "a.pdf, b.pdf" also works
    paths = [path.strip() for path in input_paths.split(',') if path.strip()]
    pdf_file_writer = PdfWriter()
    # Pick each PDF one by one and append its pages to the combined PDF
    for path in paths:
        if not os.path.isfile(path):
            raise FileNotFoundError(f"No such file: {path}")
        pdf_file_reader = PdfReader(path)
        for page in pdf_file_reader.pages:
            pdf_file_writer.add_page(page)
    # Output the merged PDF
    with open('merged.pdf', 'wb') as out:
        pdf_file_writer.write(out)


if __name__ == '__main__':
    merge_pdfs()

Conclusion:

Whether you’re combining client documents into a single file or simply don’t want to clutter up your hard drive with lots of separate files, merging multiple PDFs into one document is a great way to keep your workflow tidy and organized. Fortunately, Python and the free PyPDF2 library make this relatively simple without a dedicated PDF editor, so if you find yourself needing to combine some PDFs, give this method a try!

Andy Avery

I really enjoy helping people with their tech problems to make life easier, and that’s what I’ve been doing professionally for the past decade.
