There are various tools to convert PDF files into images, such as pdftoppm in Linux. This tutorial aims to develop a lightweight command-line tool in Python to convert PDF files into images.
We'll be using PyMuPDF, a highly versatile, customizable PDF, XPS, and eBook interpreter solution that can be used across a wide range of applications such as a PDF renderer, viewer, or toolkit.
First, let's install the required library:
$ pip install PyMuPDF==1.18.9
Importing the libraries:
import fitz
from typing import Tuple
import os
Let's define our main utility function:
def convert_pdf2img(input_file: str, pages: Tuple = None):
    """Converts pdf to image and generates a file by page"""
    # Open the document
    pdfIn = fitz.open(input_file)
    output_files = []
    # Iterate throughout the pages
    for pg in range(pdfIn.pageCount):
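        # pages, when given, holds the zero-based page indices to convert.
        # Compare integers directly: substring matching on str(pages) would
        # also select page 1 when only page 10 was requested.
        if pages is not None and pg not in pages:
            continue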
        # Select a page
        page = pdfIn[pg]
        rotate = 0  # rotation in degrees; 0 keeps the page orientation
        # Each page is rendered as a single image whose size scales with the zoom factor.
        # The sizes below are height*width in pixels and match a US Letter page (612*792 points):
        # zoom = 1.33333333 ---> image size = 1056*816
        # zoom = 2 ---> 2 * default resolution (text is clear, text inside images is hard to read), small file size, image size = 1584*1224
        # zoom = 4 ---> 4 * default resolution (text is clear, text inside images is barely readable), large file size
        # zoom = 8 ---> 8 * default resolution (text is clear, text inside images is readable), large file size
        zoom_x = 2
        zoom_y = 2
        # A zoom factor of 2 is used here to keep the text clear.
        # preRotate applies the rotation, if any.
        mat = fitz.Matrix(zoom_x, zoom_y).preRotate(rotate)
        pix = page.getPixmap(matrix=mat, alpha=False)
        output_file = f"{os.path.splitext(os.path.basename(input_file))[0]}_page{pg+1}.png"
        pix.writePNG(output_file)
        output_files.append(output_file)
    pdfIn.close()
    summary = {
        "File": input_file, "Pages": str(pages), "Output File(s)": str(output_files)
    }
    # Printing Summary
    print("## Summary ########################################################")
    print("\n".join("{}:{}".format(i, j) for i, j in summary.items()))
    print("###################################################################")
    return output_files
The above function converts a PDF file into a series of image files. It iterates through the selected pages (all of them by default), renders each page to a pixmap, and saves it as an image file using the writePNG() method.
You can change zoom_x and zoom_y to adjust the zoom factor. Feel free to tweak these parameters and the rotate variable to suit your needs.
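To get a feel for what a given zoom factor produces, here is a minimal sketch that estimates the output resolution and pixel size without opening a PDF. It relies on the fact that PDF page dimensions are expressed in points (1/72 inch by default), so a zoom of 1 corresponds to 72 DPI. The helper names zoom_to_dpi, dpi_to_zoom, and estimate_image_size are hypothetical and not part of PyMuPDF. Rotation is ignored, and the renderer's own rounding may differ by a pixel.

```python
from typing import Tuple

# Assumption: the page uses the default PDF user unit of 1/72 inch.
PDF_POINTS_PER_INCH = 72


def zoom_to_dpi(zoom: float) -> float:
    """Resolution in DPI that a given zoom factor corresponds to."""
    return zoom * PDF_POINTS_PER_INCH


def dpi_to_zoom(dpi: float) -> float:
    """Zoom factor needed to reach a target resolution in DPI."""
    return dpi / PDF_POINTS_PER_INCH


def estimate_image_size(width_pt: float, height_pt: float,
                        zoom_x: float, zoom_y: float) -> Tuple[int, int]:
    """Approximate (width, height) in pixels of the rendered page."""
    return round(width_pt * zoom_x), round(height_pt * zoom_y)


if __name__ == "__main__":
    letter_width, letter_height = 612, 792  # US Letter in points
    for zoom in (4 / 3, 2, 4, 8):
        width, height = estimate_image_size(letter_width, letter_height, zoom, zoom)
        print(f"zoom={zoom:.2f} -> {zoom_to_dpi(zoom):.0f} DPI, {width}x{height} px")
```

For instance, dpi_to_zoom(300) gives roughly 4.17, which could be used for both zoom_x and zoom_y when a print-quality image is wanted.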
Let's use this function now:
if __name__ == "__main__":
    import sys
    input_file = sys.argv[1]
    convert_pdf2img(input_file)
Let's test the script out on a multiple-page PDF file:
$ python convert_pdf2image.py bert-paper.pdf
The output will be as follows:
## Summary ########################################################
File:bert-paper.pdf
Pages:None
Output File(s):['bert-paper_page1.png', 'bert-paper_page2.png', 'bert-paper_page3.png', 'bert-paper_page4.png', 'bert-paper_page5.png', 'bert-paper_page6.png', 'bert-paper_page7.png', 'bert-paper_page8.png', 'bert-paper_page9.png', 'bert-paper_page10.png', 'bert-paper_page11.png', 'bert-paper_page12.png', 'bert-paper_page13.png', 'bert-paper_page14.png', 'bert-paper_page15.png', 'bert-paper_page16.png']
###################################################################
And indeed, the images were successfully generated.
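The summary reports Pages:None because the script never passes the pages argument, so every page is converted. As a rough sketch of how page selection could be wired to the command line, the hypothetical helper below (parse_pages is not part of the original script or of PyMuPDF) turns a human-friendly, 1-based spec such as "1-3,5" into the zero-based indices that the loop in convert_pdf2img iterates over.

```python
from typing import Optional, Tuple


def parse_pages(spec: Optional[str]) -> Optional[Tuple[int, ...]]:
    """Turn a 1-based page spec such as "1-3,5" into zero-based indices.

    Returns None for an empty spec, which is taken to mean "all pages".
    """
    if not spec:
        return None
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            first, last = part.split("-", 1)
            start, end = int(first), int(last)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        # Page numbers are 1-based for the user, zero-based for the loop.
        indices.update(range(start - 1, end))
    return tuple(sorted(indices))


if __name__ == "__main__":
    print(parse_pages("1-3,5"))
```

The resulting tuple could then be passed along, for example as convert_pdf2img(input_file, pages=parse_pages(sys.argv[2])) when a second command-line argument is present.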
Conclusion
We hope you find this tutorial helpful for your needs.
Check the full code here.
Happy coding ♥