Merge and Split PDFs with pypdf in Python

Posted: | Tags: Python, PDF

The Python library pypdf (formerly PyPDF2) allows you to merge multiple PDF files, extract and combine specific pages, or split a PDF into separate pages.

The sample PDFs used in this article are available at the following link. All password-protected samples use 'password' as their password:

Install pypdf

pypdf has no external dependencies and can be installed via pip (or pip3). If you need support for AES encryption and decryption, install it with the [crypto] extra.

$ pip install pypdf
$ pip install pypdf[crypto]

The examples in this article use pypdf version 5.5.0.

The library was previously known as PyPDF2 until it was renamed to pypdf in 2023.

Merge Multiple PDF Files

Simple Concatenation

To merge entire PDF files (all pages) in order:

  1. Create an instance of PdfWriter.
  2. Add each file using append().
  3. Save the result using write().

For details on PdfWriter, see the official documentation:

Specify the input file path in append(), and the output path in write():

import pypdf

print(pypdf.__version__)
# 5.5.0

writer = pypdf.PdfWriter()

writer.append('data/src/pdf/sample1.pdf')
writer.append('data/src/pdf/sample2.pdf')
writer.append('data/src/pdf/sample3.pdf')

writer.write('data/temp/sample_merge.pdf')

Insert in the Middle

To insert a PDF into a specific position, use the merge() method.

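The position argument is a 0-based page index in the writer's current output, and the inserted pages start at that index. Each insertion shifts every page after it. When you insert multiple files, the positions for later merge() calls must account for the pages already inserted.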

writer = pypdf.PdfWriter()

writer.append('data/src/pdf/sample1.pdf')
writer.merge(2, 'data/src/pdf/sample2.pdf')
writer.merge(4, 'data/src/pdf/sample3.pdf')

writer.write('data/temp/sample_insert.pdf')

Extract and Merge Specific Pages

Specific pages can be selected using the pages argument in append() and merge(), which accepts either a tuple or a PageRange object.

Another option is to manually add pages using PdfReader.

Use a Tuple

A tuple takes the form (start, stop[, step]) and selects 0-based page indices the same way the range() function does, so stop is exclusive.

writer = pypdf.PdfWriter()

writer.append('data/src/pdf/sample1.pdf', pages=(0, 1))
writer.append('data/src/pdf/sample2.pdf', pages=(2, 4))
writer.merge(2, 'data/src/pdf/sample3.pdf', pages=(0, 3, 2))

writer.write('data/temp/sample_merge_page.pdf')

Use a PageRange Object

PageRange allows you to specify pages using a string, either as a Python-style slice (e.g., '2:5', '::-1') or as a single page index (e.g., '3', '-1').

Create a PageRange by passing such a string to pypdf.PageRange().

writer = pypdf.PdfWriter()

writer.append('data/src/pdf/sample1.pdf', pages=pypdf.PageRange('-1'))
writer.append('data/src/pdf/sample2.pdf', pages=pypdf.PageRange('2:'))
writer.merge(2, 'data/src/pdf/sample3.pdf', pages=pypdf.PageRange('::-1'))

writer.write('data/temp/sample_merge_pagerange.pdf')

Use PdfReader and add_page()

You can use PdfReader to select pages and PdfWriter to add them individually.

reader1 = pypdf.PdfReader('data/src/pdf/sample1.pdf')
reader2 = pypdf.PdfReader('data/src/pdf/sample2.pdf')

writer = pypdf.PdfWriter()

writer.add_page(reader1.pages[0])
writer.add_page(reader2.pages[2])

writer.write('data/temp/sample_merge_wr.pdf')

Note: add_page() adds one page at a time, so selecting several pages this way requires one call per page. For multi-page selections, tuples or PageRange are more convenient. For single pages, however, this method is simple and efficient.

Split PDFs at Specific Pages

While there's no direct method to split a PDF, you can create new files using selected pages, effectively splitting the original.

writer = pypdf.PdfWriter()
writer.append('data/src/pdf/sample1.pdf', pages=pypdf.PageRange(':2'))
writer.write('data/temp/sample_split1.pdf')

writer = pypdf.PdfWriter()
writer.append('data/src/pdf/sample1.pdf', pages=pypdf.PageRange('2:'))
writer.write('data/temp/sample_split2.pdf')

See the last section for an example of splitting a file into individual pages.

Metadata

The previous examples do not handle metadata such as the author or title. By default, the generated PDF carries almost no metadata. Apart from the /Producer entry that pypdf sets, fields such as /Author and /Title are not set.

Use the add_metadata() method of PdfWriter to add metadata. You can copy it from another file via PdfReader.metadata.

writer = pypdf.PdfWriter()

writer.append('data/src/pdf/sample1.pdf')
writer.append('data/src/pdf/sample2.pdf')

writer.add_metadata(pypdf.PdfReader('data/src/pdf/sample1.pdf').metadata)
writer.add_metadata({'/Title': 'merged file'})

writer.write('data/temp/sample_merge_meta.pdf')

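Because add_metadata() updates the existing entries rather than replacing the whole set, you can call it again to add or overwrite individual fields. The example above overwrites /Title after copying the metadata of sample1.pdf. For more on metadata handling, see: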

Password-Protected PDFs

You can't pass the path of an encrypted PDF directly to append() or merge(). You'll first need to decrypt it, for example by saving a decrypted copy as a new file.

For more details on working with password-protected PDFs using pypdf, see:

Example: Merge All PDFs in a Directory

Use the glob module from the standard library to list files in a directory.

You can then merge all PDFs in that folder as shown below:

import glob
import os

import pypdf

def merge_pdf_in_dir(dir_path, dst_path):
    # List all PDF files in the directory, sorted by filename
    paths = sorted(glob.glob(os.path.join(dir_path, '*.pdf')))

    writer = pypdf.PdfWriter()
    for p in paths:
        # Skip encrypted files, which append() cannot read from a path
        if not pypdf.PdfReader(p).is_encrypted:
            writer.append(p)

    writer.write(dst_path)

merge_pdf_in_dir('data/src/pdf', 'data/temp/sample_dir.pdf')

In this example, files are sorted by filename in plain string order before merging.

Since some files may be encrypted, the is_encrypted attribute is used to skip them.
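Plain string sorting puts 'sample10.pdf' before 'sample2.pdf'. If your filenames contain unpadded numbers, a natural-sort key that compares runs of digits as integers may give the order you expect. This is a minimal sketch. natural_key() is a hypothetical helper, and you could pass it as key= to sorted() when sorting the glob results.

```python
import re


# Hypothetical helper (not part of pypdf)
def natural_key(path):
    """Sort key that compares runs of digits numerically, other text case-insensitively."""
    # re.split with a capturing group keeps the digit runs, so text and numbers
    # always alternate and only like types are ever compared
    return [int(tok) if tok.isdigit() else tok.lower() for tok in re.split(r'(\d+)', path)]


paths = ['sample10.pdf', 'sample2.pdf', 'sample1.pdf']
print(sorted(paths))                   # ['sample1.pdf', 'sample10.pdf', 'sample2.pdf']
print(sorted(paths, key=natural_key))  # ['sample1.pdf', 'sample2.pdf', 'sample10.pdf']
```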

Example: Split into Single-Page Files

To split a PDF into individual one-page files, define a function like the following:

def split_pdf_pages(src_path, dst_basepath):
    src_pdf = pypdf.PdfReader(src_path)
    for i, page in enumerate(src_pdf.pages):
        dst_pdf = pypdf.PdfWriter()
        dst_pdf.add_page(page)
        dst_pdf.write(f'{dst_basepath}_{i}.pdf')

split_pdf_pages('data/src/pdf/sample1.pdf', 'data/temp/sample1')

Here, enumerate() is used to get the page index and generate sequential filenames using f-strings.
