How to Merge PDFs and Split PDFs in Python? (Script, Source code)


PDFs are a widely used file format for sharing documents. But sometimes, you might find yourself needing to merge multiple PDFs, or split a single PDF into multiple parts. Luckily, there’s a great open-source tool that can help you do just that: Python. In this blog post, we’ll show you how to use the Python programming language to merge and split PDFs.

Merging PDFs with Python

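Let’s say you have three PDFs that you want to merge into a single document. The first thing you’ll need to do is install the PyPDF2 library; installation instructions are at http://pypdf2.readthedocs.io/en/latest/index.html#installation. The code in this post uses the PdfFileReader and PdfFileWriter class names, which were removed in PyPDF2 3.0, so install a release older than 3.0 (for example, pip install "PyPDF2<3.0"). Once you have PyPDF2 installed, you can start writing some code.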

from PyPDF2 import PdfFileReader, PdfFileWriter  # legacy names; require PyPDF2 < 3.0


def merge_pdfs():
    '''Merge multiple PDFs into one combined PDF.'''
    input_paths = input("Enter a comma-separated list of paths to the PDFs: ")
    # Strip whitespace around each path and skip empty entries
    paths = [path.strip() for path in input_paths.split(',') if path.strip()]
    pdf_file_writer = PdfFileWriter()

    # Read each PDF in turn and append all of its pages to the writer
    for path in paths:
        pdf_file_reader = PdfFileReader(path)
        for page in range(pdf_file_reader.getNumPages()):
            pdf_file_writer.addPage(pdf_file_reader.getPage(page))

    # Write the merged PDF to disk
    with open('merged.pdf', 'wb') as out:
        pdf_file_writer.write(out)
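If you would rather call the merge from another script than type paths at a prompt, one possible variation is sketched below. The names parse_paths and output_path are made up for this illustration and are not part of PyPDF2, and the sketch assumes a PyPDF2 release older than 3.0, where the PdfFileReader and PdfFileWriter names are still available.

```python
def parse_paths(raw):
    """Split a comma-separated string into clean, non-empty paths."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def merge_pdfs(paths, output_path="merged.pdf"):
    """Append every page of each input PDF, in order, to one output file."""
    # Imported here so parse_paths can be tried without PyPDF2 installed.
    # These are the legacy class names (assumes PyPDF2 < 3.0).
    from PyPDF2 import PdfFileReader, PdfFileWriter

    writer = PdfFileWriter()
    for path in paths:
        reader = PdfFileReader(path)
        for index in range(reader.getNumPages()):
            writer.addPage(reader.getPage(index))

    with open(output_path, "wb") as out:
        writer.write(out)


# Stray spaces and empty entries are dropped before merging
print(parse_paths("a.pdf, b.pdf,, c.pdf"))  # ['a.pdf', 'b.pdf', 'c.pdf']
```

With this in place, merge_pdfs(parse_paths("a.pdf, b.pdf"), "combined.pdf") would write both files' pages into combined.pdf, in the order the paths were given.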

How to Split a PDF file in Python?

  • The first step is to install PyPDF2. This can be done using pip.
  • Next, we need to open the PDF file that we want to split. We can do this with the open() function in binary mode, or simply pass the file path to PdfFileReader.
  • After that, we need to create an instance of the PdfFileReader class. This will allow us to read the PDF file.
  • Once we have done that, we need to get the number of pages in the PDF file using the getNumPages() method.
  • Now, we need to create a for loop that iterates through each page of the PDF file.
    Inside the for loop, we need to create a new instance of the PdfFileWriter class for every page. Because each writer starts out empty, each output PDF file ends up holding exactly one page.
  • Then, we need to add the current page of the input PDF file to the output PDF file using the addPage() method.
  • Finally, still inside the loop, we need to write the contents of the output PDF file to a file on disk using the write() method.

And that’s it! We have now successfully split our PDF file into multiple smaller files.

A Full Python script to split a PDF

from PyPDF2 import PdfFileReader, PdfFileWriter  # legacy names; require PyPDF2 < 3.0


def split_pdfs():
    '''Split a PDF into multiple PDFs of one page each.'''
    input_pdf = input("Enter the input PDF path: ").strip()
    pdf = PdfFileReader(input_pdf)
    for page in range(pdf.getNumPages()):
        # A fresh writer per page keeps each output file to a single page
        pdf_file_writer = PdfFileWriter()
        pdf_file_writer.addPage(pdf.getPage(page))
        # Append the zero-based page index to each new file name
        output = 'split{page}.pdf'.format(page=page)
        with open(output, 'wb') as output_pdf:
            pdf_file_writer.write(output_pdf)
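Splitting into single pages is not always what you want; sometimes a few larger parts are more practical. The sketch below shows one way that might look, cutting a PDF into parts of a fixed number of pages. The helper chunk_ranges, the function split_pdf_in_chunks, and the "part" file name prefix are all hypothetical names chosen for this example, and the code again assumes a PyPDF2 release older than 3.0.

```python
def chunk_ranges(total_pages, chunk_size):
    """Return (start, stop) page-index pairs that cover total_pages in order.

    Indexes are zero-based and stop is exclusive, like range().
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


def split_pdf_in_chunks(input_path, chunk_size, prefix="part"):
    """Write consecutive chunk_size-page slices of input_path to new files."""
    # Imported here so chunk_ranges can be tried without PyPDF2 installed.
    # These are the legacy class names (assumes PyPDF2 < 3.0).
    from PyPDF2 import PdfFileReader, PdfFileWriter

    reader = PdfFileReader(input_path)
    ranges = chunk_ranges(reader.getNumPages(), chunk_size)
    for number, (start, stop) in enumerate(ranges, start=1):
        # One writer per chunk, filled with that chunk's pages only
        writer = PdfFileWriter()
        for index in range(start, stop):
            writer.addPage(reader.getPage(index))
        with open("{}{}.pdf".format(prefix, number), "wb") as out:
            writer.write(out)


# A 10-page document in chunks of 4 gives two full parts and a 2-page remainder
print(chunk_ranges(10, 4))  # [(0, 4), (4, 8), (8, 10)]
```

Keeping the range arithmetic in its own small function makes it easy to check the page boundaries before any files are written. Calling split_pdf_in_chunks("report.pdf", 4) on a 10-page file would then produce part1.pdf, part2.pdf, and part3.pdf.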

Whether you are looking to split a PDF or merge multiple PDFs into one document, Python has you covered! By making use of the “PyPDF2” library, you can easily manipulate your PDFs right from within your Python console. Give it a try today!

Andy Avery

I really enjoy helping people with their tech problems to make life easier, and that’s what I’ve been doing professionally for the past decade.
