Want to code faster? Our Python Code Generator lets you create Python scripts with just a few clicks. Try it now!
The primary goal of merging PDF files is for proper file management, for archiving, bulk printing, or combining datasheets, e-books, and reports. You definitely need an efficient tool to merge small PDF files into a single PDF.
This tutorial is intended to show you how to merge a list of PDF files into a single PDF using the Python programming language. The combined PDF may include bookmarks to improve the navigation where every bookmark is linked to the content of one of the inputted PDF files.
We'll be using the PyPDF4 library for this purpose. PyPDF4 is a pure-python PDF library capable of splitting, merging together, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files. It can retrieve text and metadata from PDFs as well as merge entire files together.
Download: Practical Python PDF Processing EBook.
Let's install it:
$ pip install PyPDF4==1.27.0Importing the libraries:
#Import Libraries
from PyPDF4 import PdfFileMerger
import os,argparseLet's define our core function:
def merge_pdfs(input_files: list, page_range: tuple, output_file: str, bookmark: bool = True):
    """
    Merge a list of PDF files and save the combined result into the `output_file`.
    `page_range` to select a range of pages (behaving like Python's range() function) from the input files
        e.g (0,2) -> First 2 pages 
        e.g (0,6,2) -> pages 1,3,5
    bookmark -> add bookmarks to the output file to navigate directly to the input file section within the output file.
    """
    # strict = False -> To ignore PdfReadError - Illegal Character error
    merger = PdfFileMerger(strict=False)
    for input_file in input_files:
        bookmark_name = os.path.splitext(os.path.basename(input_file))[0] if bookmark else None
        # pages To control which pages are appended from a particular file.
        merger.append(fileobj=open(input_file, 'rb'), pages=page_range, import_bookmarks=False, bookmark=bookmark_name)
    # Insert the pdf at specific page
    merger.write(fileobj=open(output_file, 'wb'))
    merger.close()So we first create a PDFFileMerger object and then iterates over input_files from the input. After that, for each input PDF file, we define a bookmark if required depending on the bookmark variable and add it to the merger object taking into account the page_range chosen.
Next, we use the append() method from the merger to add our PDF file.
Finally, we write the output PDF file and close the object.
Let's now add a function to parse command-line arguments:
def parse_args():
    """Get user command line parameters"""
    parser = argparse.ArgumentParser(description="Available Options")
    parser.add_argument('-i', '--input_files', dest='input_files', nargs='*',
                        type=str, required=True, help="Enter the path of the files to process")
    parser.add_argument('-p', '--page_range', dest='page_range', nargs='*',
                        help="Enter the pages to consider e.g.: (0,2) -> First 2 pages")
    parser.add_argument('-o', '--output_file', dest='output_file',
                        required=True, type=str, help="Enter a valid output file")
    parser.add_argument('-b', '--bookmark', dest='bookmark', default=True, type=lambda x: (
        str(x).lower() in ['true', '1', 'yes']), help="Bookmark resulting file")
    # To Porse The Command Line Arguments
    args = vars(parser.parse_args())
    # To Display The Command Line Arguments
    print("## Command Arguments #################################################")
    print("\n".join("{}:{}".format(i, j) for i, j in args.items()))
    print("######################################################################")
    return argsMaster PDF Manipulation with Python by building PDF tools from scratch. Get your copy now!
Download EBookNow let's use the previously defined functions in our main code:
if __name__ == "__main__":
    # Parsing command line arguments entered by user
    args = parse_args()
    page_range = None
    if args['page_range']:
        page_range = tuple(int(x) for x in args['page_range'][0].split(','))
    # call the main function
    merge_pdfs(
        input_files=args['input_files'], page_range=page_range, 
        output_file=args['output_file'], bookmark=args['bookmark']
    )Alright, we're done with coding, let's test it out:
$ python pdf_merger.py --helpOutput:
usage: pdf_merger.py [-h] -i [INPUT_FILES [INPUT_FILES ...]] [-p [PAGE_RANGE [PAGE_RANGE ...]]] -o OUTPUT_FILE [-b BOOKMARK]
Available Options
optional arguments:
  -h, --help            show this help message and exit
  -i [INPUT_FILES [INPUT_FILES ...]], --input_files [INPUT_FILES [INPUT_FILES ...]]
                        Enter the path of the files to process
  -p [PAGE_RANGE [PAGE_RANGE ...]], --page_range [PAGE_RANGE [PAGE_RANGE ...]]
                        Enter the pages to consider e.g.: (0,2) -> First 2 pages
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
                        Enter a valid output file
  -b BOOKMARK, --bookmark BOOKMARK
                        Bookmark resulting fileHere is an example of merging two PDF files into one:
$ python pdf_merger.py -i bert-paper.pdf letter.pdf -o combined.pdfYou need to separate the input PDF files with a comma (,) in the -i argument, and you must not add any space.
A new combined.pdf appeared in the current directory that contains both of the input PDF files, the output is:
## Command Arguments #################################################
input_files:['bert-paper.pdf', 'letter.pdf']
page_range:None
output_file:combined.pdf
bookmark:True
######################################################################Make sure you use the right order of the input files when passing the -i argument.
I hope this code helped you out in merging PDF files easily and without 3rd party or online tools, as using Python to perform such tasks is more convenient.
If you want to split PDF documents instead, this tutorial will certainly help you.
Check the full code here.
Here are some related Python tutorials:
Finally, for more PDF handling guides on Python, you can check our Practical Python PDF Processing EBook, where we dive deeper into PDF document manipulation with Python, make sure to check it out here if you're interested!
Happy coding ♥
Liked what you read? You'll love what you can learn from our AI-powered Code Explainer. Check it out!
View Full Code Improve My Code
Got a coding query or need some guidance before you comment? Check out this Python Code Assistant for expert advice and handy tips. It's like having a coding tutor right in your fingertips!