Merge and Split PDFs with pypdf in Python

Posted: 2025-05-19 | Tags: Python, PDF

The Python library pypdf (formerly PyPDF2) allows you to merge multiple PDF files, extract and combine specific pages, or split a PDF into separate pages.

py-pdf/pypdf: A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

Contents

Install pypdf
Merge Multiple PDF Files
- Simple Concatenation
- Insert in the Middle
Extract and Merge Specific Pages
Split PDFs at Specific Pages
Metadata
Password-Protected PDFs
Example: Merge All PDFs in a Directory
Example: Split into Single-Page Files

The sample PDFs used in this article are available at the following link. All password-protected files use the string 'password' as their password:

python-snippets/notebook/data/src/pdf

Install pypdf

pypdf has no external dependencies and can be installed via pip (or pip3). If you need support for AES encryption and decryption, install it with the [crypto] extra.

$ pip install pypdf
$ pip install "pypdf[crypto]"

The examples in this article use pypdf version 5.5.0.

The library was previously known as PyPDF2 until it was renamed to pypdf in 2023.

History of pypdf — pypdf 5.5.0 documentation

Merge Multiple PDF Files

Simple Concatenation

To merge entire PDF files (all pages) in order:

1. Create an instance of PdfWriter.
2. Add each file using append().
3. Save the result using write().

For details on PdfWriter, see the official documentation:

The PdfWriter Class — pypdf 5.5.0 documentation

Specify the input file path in append(), and the output path in write():

import pypdf

print(pypdf.__version__)
# 5.5.0

writer = pypdf.PdfWriter()

writer.append('data/src/pdf/sample1.pdf')
writer.append('data/src/pdf/sample2.pdf')
writer.append('data/src/pdf/sample3.pdf')

writer.write('data/temp/sample_merge.pdf')

source: pypdf_merge_full.py

Insert in the Middle

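To insert a PDF at a specific position, use the merge() method.

The first argument, position, is the 0-based page index at which the inserted pages begin. When inserting multiple files, keep in mind that each insertion shifts the indices of all pages after it, so a later position refers to the document as it is after the earlier insertions.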

writer = pypdf.PdfWriter()

writer.append('data/src/pdf/sample1.pdf')
writer.merge(2, 'data/src/pdf/sample2.pdf')
writer.merge(4, 'data/src/pdf/sample3.pdf')

writer.write('data/temp/sample_insert.pdf')

source: pypdf_merge_full.py
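The effect of position on later insertions can be modeled with a plain Python list, where inserting at an index works like assigning to an empty slice. This is only a model of the resulting page order, not pypdf code; the page labels and counts (four pages in the first file, two in the second, one in the third) are assumptions for illustration and may not match the sample files.

```python
def insert_pages(pages, position, new_pages):
    """Model of merge(position, ...): the new pages start at index `position`."""
    result = list(pages)
    result[position:position] = new_pages  # insert without replacing anything
    return result


# Hypothetical page labels: file 1 has four pages, file 2 two, file 3 one
doc = ['1-0', '1-1', '1-2', '1-3']
doc = insert_pages(doc, 2, ['2-0', '2-1'])
print(doc)
# ['1-0', '1-1', '2-0', '2-1', '1-2', '1-3']

# Position 4 now refers to the document that already contains file 2
doc = insert_pages(doc, 4, ['3-0'])
print(doc)
# ['1-0', '1-1', '2-0', '2-1', '3-0', '1-2', '1-3']
```

If you want to specify positions relative to the original document, insert at the largest position first: pages before an insertion point are not shifted by it.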

Extract and Merge Specific Pages

Specific pages can be selected using the pages argument in append() and merge(), which accepts either a tuple or a PageRange object.

Another option is to manually add pages using PdfReader.

Use a Tuple

A tuple takes the form (start, stop[, step]) and is interpreted like the arguments of range(), so stop is exclusive.

How to Use range() in Python

writer = pypdf.PdfWriter()

writer.append('data/src/pdf/sample1.pdf', pages=(0, 1))
writer.append('data/src/pdf/sample2.pdf', pages=(2, 4))
writer.merge(2, 'data/src/pdf/sample3.pdf', pages=(0, 3, 2))

writer.write('data/temp/sample_merge_page.pdf')

source: pypdf_merge_page.py
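Because the tuple is expanded like range(), you can check which 0-based page indices it selects without opening any PDF. The sketch below applies range() to the tuples from the example above; tuple_to_indices is a hypothetical helper name.

```python
def tuple_to_indices(pages):
    """Expand a (start, stop[, step]) tuple into 0-based page indices."""
    return list(range(*pages))


# The tuples passed as `pages` in the example above
selections = {
    'sample1.pdf': (0, 1),
    'sample2.pdf': (2, 4),
    'sample3.pdf': (0, 3, 2),
}

for name, pages in selections.items():
    print(name, tuple_to_indices(pages))
# sample1.pdf [0]
# sample2.pdf [2, 3]
# sample3.pdf [0, 2]
```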

Use a `PageRange` Object

PageRange allows you to specify pages using a string, either as a Python-style slice (e.g., '2:5', '::-1') or as a single page index (e.g., '3', '-1').

How to Slice a List, String, and Tuple in Python

Create a PageRange by passing such a string to pypdf.PageRange().

The PageRange Class — pypdf 5.5.0 documentation

writer = pypdf.PdfWriter()

writer.append('data/src/pdf/sample1.pdf', pages=pypdf.PageRange('-1'))
writer.append('data/src/pdf/sample2.pdf', pages=pypdf.PageRange('2:'))
writer.merge(2, 'data/src/pdf/sample3.pdf', pages=pypdf.PageRange('::-1'))

writer.write('data/temp/sample_merge_pagerange.pdf')

source: pypdf_merge_page.py

Use `PdfReader` and `add_page()`

You can use PdfReader to select pages and PdfWriter to add them individually.

reader1 = pypdf.PdfReader('data/src/pdf/sample1.pdf')
reader2 = pypdf.PdfReader('data/src/pdf/sample2.pdf')

writer = pypdf.PdfWriter()

writer.add_page(reader1.pages[0])
writer.add_page(reader2.pages[2])

writer.write('data/temp/sample_merge_wr.pdf')

source: pypdf_merge_page.py

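Note: reader.pages supports indexing, including negative indices, and slicing (e.g., reader.pages[1:3]), but add_page() takes only one page at a time, so adding a range still requires a loop. For multi-page selections, the pages argument of append() or merge() with a tuple or PageRange is more concise; for single pages, this method is simple and efficient.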

Split PDFs at Specific Pages

pypdf has no dedicated method for splitting a PDF, but you can write selected pages to new files, which effectively splits the original.

writer = pypdf.PdfWriter()
writer.append('data/src/pdf/sample1.pdf', pages=pypdf.PageRange(':2'))
writer.write('data/temp/sample_split1.pdf')

writer = pypdf.PdfWriter()
writer.append('data/src/pdf/sample1.pdf', pages=pypdf.PageRange('2:'))
writer.write('data/temp/sample_split2.pdf')

source: pypdf_split.py

See the last section for an example of splitting a file into individual pages.

Metadata

The previous examples do not handle metadata such as the author or title. The generated PDF does not inherit the metadata of the source files, so fields like /Author and /Title are not set by default.

Use the add_metadata() method of PdfWriter to add metadata. You can copy it from another file via PdfReader.metadata.

writer = pypdf.PdfWriter()

writer.append('data/src/pdf/sample1.pdf')
writer.append('data/src/pdf/sample2.pdf')

writer.add_metadata(pypdf.PdfReader('data/src/pdf/sample1.pdf').metadata)
writer.add_metadata({'/Title': 'merged file'})

writer.write('data/temp/sample_merge_meta.pdf')

source: pypdf_merge_full.py

You can add or modify specific metadata fields as needed. The example above sets the /Title. For more on metadata handling, see:

Manage PDF Metadata with pypdf in Python

Password-Protected PDFs

You can't pass the path of an encrypted PDF directly to append() or merge(). Decrypt it first, for example by saving a decrypted copy as a new file.

For more details on working with password-protected PDFs using pypdf, see:

Encrypt and Decrypt PDFs with pypdf in Python

Example: Merge All PDFs in a Directory

Use the glob module from the standard library to list files in a directory.

How to Use glob() in Python

You can then merge all PDFs in that folder as shown below:

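import glob
import os

import pypdf

def merge_pdf_in_dir(dir_path, dst_path):
    # List the PDF files in the directory, sorted by path
    paths = sorted(glob.glob(os.path.join(dir_path, '*.pdf')))

    writer = pypdf.PdfWriter()
    for p in paths:
        reader = pypdf.PdfReader(p)
        # Skip encrypted files; they cannot be appended without decrypting them
        if not reader.is_encrypted:
            writer.append(reader)

    writer.write(dst_path)

merge_pdf_in_dir('data/src/pdf', 'data/temp/sample_dir.pdf')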

source: pypdf_merge_dir.py

In this example, the file paths are sorted in string (lexicographic) order before merging.

Since some files may be encrypted, the is_encrypted attribute is used to skip them.

Encrypt and Decrypt PDFs with pypdf in Python
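Because sort() compares paths as strings, numbered files such as sample10.pdf would be placed before sample2.pdf. If the numbers in the file names should be compared numerically, a natural-sort key can be passed to sort() or sorted(). The helper natural_key below is hypothetical and not part of the original script; the file names are examples only.

```python
import re


def natural_key(path):
    """Sort key that compares digit runs as integers and ignores case elsewhere."""
    # With a capturing group, re.split() alternates text and digit runs,
    # so digit runs are always at odd indices and never compared with text.
    parts = re.split(r'(\d+)', path)
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


paths = ['data/src/pdf/sample10.pdf', 'data/src/pdf/sample2.pdf', 'data/src/pdf/sample1.pdf']
print(sorted(paths))
# ['data/src/pdf/sample1.pdf', 'data/src/pdf/sample10.pdf', 'data/src/pdf/sample2.pdf']
print(sorted(paths, key=natural_key))
# ['data/src/pdf/sample1.pdf', 'data/src/pdf/sample2.pdf', 'data/src/pdf/sample10.pdf']
```

In merge_pdf_in_dir(), this means passing key=natural_key when sorting the list of paths.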

Example: Split into Single-Page Files

To split a PDF into individual one-page files, define a function like the following:

def split_pdf_pages(src_path, dst_basepath):
    src_pdf = pypdf.PdfReader(src_path)
    for i, page in enumerate(src_pdf.pages):
        dst_pdf = pypdf.PdfWriter()
        dst_pdf.add_page(page)
        dst_pdf.write(f'{dst_basepath}_{i}.pdf')

split_pdf_pages('data/src/pdf/sample1.pdf', 'data/temp/sample1')

source: pypdf_split_pages.py

Here, enumerate() is used to get the page index and generate sequential filenames using f-strings.
