Python has a rich ecosystem of libraries that let you perform a wide range of tasks, including converting PDFs into audio. In this blog post, we’ll show you how to use the third-party PyPDF2 and gTTS (Google Text-to-Speech) libraries to convert a PDF into an audio file that you can play on your computer or mobile device. We’ll also explain why converting a PDF into audio can be useful.
Why Convert PDFs Into Audio Files?
There are several reasons why you might want to convert a PDF into an audio file. For instance, if you’re studying for an exam and need to review a lot of information contained in a PDF, listening to it as an audio file lets you multitask (e.g., listen while you’re doing the dishes or working out).
Similarly, if your commute is long and you like listening to books or other printed material while you drive, converting a PDF into an audio file lets you do the same with your documents.
Additionally, if you have a visual impairment, listening to a PDF as an audio file will make the material accessible to you.
How to Convert a PDF into Audio Using Python
Now that we’ve discussed some of the reasons why you might want to convert a PDF into an audio file, let’s walk through how to do this using Python. We’ll be using the PyPDF2 and gTTS libraries, both freely available from PyPI (the Python Package Index) and installable with pip.
To begin, open up a new Python file and import the following modules:
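gTTS: a Python library and CLI tool that interfaces with the Google Translate text-to-speech API.
PyPDF2: a library for reading PDF files and extracting their text.
easygui: a simple GUI library, used in the first script to open a file-picker dialog.
pyttsx3: an offline text-to-speech library that reads text aloud through your system’s speech engine, used in the second script.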
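Both scripts below follow the same two-step pipeline: pull the text out of the PDF, then hand it to a text-to-speech engine. The sketch here isolates the first step, assuming PyPDF2 3.x, where the reader class is `PdfReader` and each page exposes `extract_text()`. The helper names `join_page_texts` and `pdf_to_text` are hypothetical. It also records which pages came back empty, which often means the PDF is a scanned image with no text layer and would need OCR before any text-to-speech step.

```python
from typing import Iterable, List, Optional, Tuple


def join_page_texts(page_texts: Iterable[Optional[str]]) -> Tuple[str, List[int]]:
    """Join per-page text and report the 1-based numbers of pages with no text."""
    kept: List[str] = []
    empty_pages: List[int] = []
    for number, text in enumerate(page_texts, start=1):
        if text and text.strip():
            kept.append(text.strip())
        else:
            # Blank page, or an image-only (scanned) page with no text layer
            empty_pages.append(number)
    # Separate pages with a blank line so page boundaries stay visible
    return "\n\n".join(kept), empty_pages


def pdf_to_text(pdf_path: str) -> Tuple[str, List[int]]:
    """Extract the text of every page (assumes PyPDF2 >= 3.0 is installed)."""
    # Imported here so join_page_texts can be used without PyPDF2 installed
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    return join_page_texts(page.extract_text() for page in reader.pages)
```

For example, `text, empty = pdf_to_text("notes.pdf")` gives you the document text plus a list such as `[3, 4]` if pages 3 and 4 had nothing to extract, so you can warn the user before sending an empty or partial string to gTTS.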
Python script to convert a PDF to an MP3 audio file
```python
import os
import re
import sys

import easygui                # pip install easygui
from gtts import gTTS         # pip install gTTS
from PyPDF2 import PdfReader  # pip install PyPDF2 (3.x API; PdfFileReader was removed)

# Ask the user to pick a PDF; fileopenbox returns None if the dialog is cancelled
file_path = easygui.fileopenbox(
    msg="Select your PDF",
    title="Select a PDF File",
    default="*.pdf",
    filetypes=["*.pdf"],
)
if not file_path or not os.path.exists(file_path):
    print("File does not exist")
    sys.exit(1)

# Using regex to keep only words and numbers from each page
words = []
with open(file_path, "rb") as f:
    pdffile = PdfReader(f)
    for page in pdffile.pages:
        content = page.extract_text() or ""
        words.extend(re.findall(r"[a-zA-Z0-9]+", content))

if not words:
    print("No extractable text found (is this a scanned PDF?)")
    sys.exit(1)

# Convert the string of words to an MP3 file (gTTS needs an internet connection)
tts = gTTS(text=" ".join(words), lang="en")
tts.save("listen_pdf.mp3")
```
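The regular expression in this script keeps only ASCII letters and digits, so it also throws away punctuation, which speech engines generally rely on for pauses between sentences, and it cuts accented words short (`re.findall(r'[a-zA-Z0-9]+', 'café')` yields `['caf']`). If you’d rather keep the sentence structure, a gentler cleanup is sketched below. `normalize_for_speech` is a hypothetical helper, not part of PyPDF2 or gTTS.

```python
import re


def normalize_for_speech(raw: str) -> str:
    """Tidy text extracted from a PDF while keeping punctuation and accents."""
    # Re-join words split by a hyphen at a line break: "conver-\nsion" -> "conversion".
    # Caveat: this also merges genuine hyphenated compounds that break at that spot.
    text = re.sub(r"(\w)-\r?\n(\w)", r"\1\2", raw)
    # Collapse every run of whitespace (newlines, tabs, repeated spaces) into one space
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```

In the script above you would then pass `normalize_for_speech(content)` for each page to gTTS instead of the filtered word list.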
Another Python script to convert a PDF document into an MP3 audio file
```python
# -*- coding: utf-8 -*-
"""
Created on Sun Oct 11 19:50:06 2020

@author: quent
"""
import os
from tkinter import Tk
from tkinter.filedialog import askopenfilename

import pyttsx3                # pip install pyttsx3 (offline speech playback)
from gtts import gTTS         # pip install gTTS
from PyPDF2 import PdfReader  # pip install PyPDF2 (3.x API; PdfFileReader was removed)

Tk().withdraw()  # We could make our own GUI, but let's use the default file dialog
FILE_PATH = askopenfilename(filetypes=[("PDF files", "*.pdf")])  # open the dialog GUI
if not FILE_PATH:
    raise SystemExit("No file selected")

# Open the file in binary read ("rb") mode and collect the text of every page
txt_file = ""
with open(FILE_PATH, "rb") as f:
    pdf = PdfReader(f)
    for page in pdf.pages:
        txt_file += (page.extract_text() or "") + "\n"

if not txt_file.strip():
    raise SystemExit("No extractable text found (is this a scanned PDF?)")

## speaking part ##
# Read the whole document aloud (the original passed only the last page's text)
engine = pyttsx3.init()
engine.say(txt_file)
engine.runAndWait()

# gTTS takes a string, which is why the pages were collected into txt_file.
# Save the MP3 with the same name as the PDF, in the same directory.
audio_file = gTTS(text=txt_file, lang="en")
audio_file.save(os.path.splitext(FILE_PATH)[0] + ".mp3")
```
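For long documents, one MP3 per page can be easier to navigate, and a network error halfway through then costs you only one page rather than the whole file. Here is a minimal sketch, assuming PyPDF2 3.x and gTTS are installed; the helper names and the `_page001.mp3` naming scheme are our own choices for illustration. As with the scripts above, gTTS sends the text to Google’s servers, so it needs an internet connection.

```python
from pathlib import Path
from typing import List


def page_audio_paths(pdf_path: str, page_count: int) -> List[Path]:
    """Build one output path per page next to the PDF, e.g. notes_page001.mp3."""
    pdf = Path(pdf_path)
    # Path.stem drops only the final suffix, so "notes.v2.pdf" keeps "notes.v2"
    return [pdf.with_name(f"{pdf.stem}_page{n:03d}.mp3") for n in range(1, page_count + 1)]


def pdf_to_page_mp3s(pdf_path: str, lang: str = "en") -> List[Path]:
    """Write one MP3 per page that has extractable text; return the files written."""
    from gtts import gTTS          # third-party: pip install gTTS
    from PyPDF2 import PdfReader   # third-party: pip install PyPDF2 (3.x API)

    reader = PdfReader(pdf_path)
    written: List[Path] = []
    for page, out_path in zip(reader.pages, page_audio_paths(pdf_path, len(reader.pages))):
        text = (page.extract_text() or "").strip()
        if not text:
            continue  # skip blank or image-only pages; gTTS raises an error on empty text
        gTTS(text=text, lang=lang).save(str(out_path))
        written.append(out_path)
    return written
```

Calling `pdf_to_page_mp3s("lecture.pdf")` would write `lecture_page001.mp3`, `lecture_page002.mp3`, and so on next to the PDF, skipping pages with no text.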
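That’s all there is to it! In just a few lines of code, we converted a PDF document into an MP3 audio file using Python. You can use the same technique on your other PDFs and play the resulting MP3 in any media player, on your computer or your phone. Keep in mind that it only works for PDFs that contain a text layer: scanned documents are just images, so PyPDF2 finds little or no text to extract from them.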