Merge and Split PDFs with pypdf in Python
The Python library pypdf (formerly PyPDF2) allows you to merge multiple PDF files, extract and combine specific pages, or split a PDF into separate pages.
The sample PDFs used in this article are available at the following link. All password-protected files use password as their password:
Install pypdf
pypdf has no external dependencies and can be installed via pip (or pip3). If you need support for AES encryption and decryption, install it with the [crypto] extra.
$ pip install pypdf
$ pip install pypdf[crypto]
The examples in this article use pypdf version 5.5.0.
The library was previously known as PyPDF2 until it was renamed to pypdf in 2023.
Merge Multiple PDF Files
Simple Concatenation
To merge entire PDF files (all pages) in order:
- Create an instance of PdfWriter.
- Add each file using append().
- Save the result using write().
For details on PdfWriter, see the official documentation:
Specify the input file path in append(), and the output path in write():
import pypdf
print(pypdf.__version__)
# 5.5.0
writer = pypdf.PdfWriter()
writer.append('data/src/pdf/sample1.pdf')
writer.append('data/src/pdf/sample2.pdf')
writer.append('data/src/pdf/sample3.pdf')
writer.write('data/temp/sample_merge.pdf')
Insert in the Middle
To insert a PDF at a specific position, use the merge() method.
The position argument (0-based) specifies the page index at which the new pages are inserted. When inserting multiple files, keep in mind that each insertion shifts the positions of all pages after it.
writer = pypdf.PdfWriter()
writer.append('data/src/pdf/sample1.pdf')
writer.merge(2, 'data/src/pdf/sample2.pdf')
writer.merge(4, 'data/src/pdf/sample3.pdf')
writer.write('data/temp/sample_insert.pdf')
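The shifting effect can be illustrated without any PDF files. The following is a minimal sketch that models page order with a plain Python list; the helper name insert_at and the page labels are hypothetical, and the three-page base document is an assumption made only for illustration.

```python
def insert_at(pages, position, new_pages):
    """Return a new list with new_pages inserted before index `position`."""
    return pages[:position] + new_pages + pages[position:]


# Hypothetical page labels: a = base file, b and c = inserted files.
doc = ['a1', 'a2', 'a3']
doc = insert_at(doc, 2, ['b1', 'b2'])  # comparable to merge(2, ...)
after_first = list(doc)                # 'a3' has moved from index 2 to index 4
doc = insert_at(doc, 4, ['c1'])        # comparable to merge(4, ...)

print(after_first)
print(doc)
```

In this model, the second insertion at position 4 lands directly after the two pages added by the first insertion, not after the fourth page of the original file.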
Extract and Merge Specific Pages
Specific pages can be selected using the pages argument in append() and merge(), which accepts either a tuple or a PageRange object.
Another option is to manually add pages using PdfReader.
Use a Tuple
A tuple takes the form (start, stop[, step]), similar to the range() function where stop is exclusive.
writer = pypdf.PdfWriter()
writer.append('data/src/pdf/sample1.pdf', pages=(0, 1))
writer.append('data/src/pdf/sample2.pdf', pages=(2, 4))
writer.merge(2, 'data/src/pdf/sample3.pdf', pages=(0, 3, 2))
writer.write('data/temp/sample_merge_page.pdf')
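Because the tuple follows range() semantics, the selected 0-based page indices can be previewed with range() itself. This is a small sketch; the helper name tuple_to_indices is hypothetical and is not part of pypdf.

```python
def tuple_to_indices(page_tuple):
    """Expand a (start, stop[, step]) tuple into the 0-based indices it selects."""
    return list(range(*page_tuple))


# The tuples used in the example above
for t in [(0, 1), (2, 4), (0, 3, 2)]:
    print(t, '->', tuple_to_indices(t))
```

So (0, 1) selects only the first page, (2, 4) selects the third and fourth pages, and (0, 3, 2) selects the first and third pages.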
Use a PageRange Object
PageRange allows you to specify pages using a string, either as a Python-style slice (e.g., '2:5', '::-1') or as a single page index (e.g., '3', '-1').
Create a PageRange by passing such a string to pypdf.PageRange().
writer = pypdf.PdfWriter()
writer.append('data/src/pdf/sample1.pdf', pages=pypdf.PageRange('-1'))
writer.append('data/src/pdf/sample2.pdf', pages=pypdf.PageRange('2:'))
writer.merge(2, 'data/src/pdf/sample3.pdf', pages=pypdf.PageRange('::-1'))
writer.write('data/temp/sample_merge_pagerange.pdf')
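Since the range strings mirror Python slice notation, their effect can be previewed on an ordinary list. The sketch below is only a model: the five-page document and its labels are hypothetical, and the mapping from each string to a slice is written out by hand to show the equivalent selection.

```python
pages = ['p0', 'p1', 'p2', 'p3', 'p4']  # hypothetical 5-page document

# Range string -> equivalent Python slice.
# A single index n selects exactly one page, like the slice n:n+1.
equivalents = {
    '2:5': slice(2, 5),
    '::-1': slice(None, None, -1),
    '3': slice(3, 4),
    '-1': slice(-1, None),
}

selected = {text: pages[s] for text, s in equivalents.items()}
for text, result in selected.items():
    print(repr(text), '->', result)
```

Negative values count from the end, so '-1' refers to the last page, and '::-1' selects all pages in reverse order.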
Use PdfReader and add_page()
You can use PdfReader to select pages and PdfWriter to add them individually.
reader1 = pypdf.PdfReader('data/src/pdf/sample1.pdf')
reader2 = pypdf.PdfReader('data/src/pdf/sample2.pdf')
writer = pypdf.PdfWriter()
writer.add_page(reader1.pages[0])
writer.add_page(reader2.pages[2])
writer.write('data/temp/sample_merge_wr.pdf')
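Note: pages supports both indexing and slicing, but add_page() accepts only one page at a time, so a multi-page selection needs a loop. Tuples or PageRange with append() are usually more concise for ranges. For single pages, however, this method is simple and efficient.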
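When several individual pages are needed, the add_page() calls can be wrapped in a loop. The helper below is a hypothetical sketch: it only assumes that the writer has an add_page() method and that pages supports integer indexing, as PdfWriter and PdfReader.pages do. The ListWriter class is a stand-in used so the example runs without any PDF files.

```python
def add_selected_pages(writer, pages, indices):
    """Add pages[i] to writer for each index i, in the given order."""
    for i in indices:
        writer.add_page(pages[i])


class ListWriter:
    """Stand-in for pypdf.PdfWriter that only records the pages it receives."""

    def __init__(self):
        self.added = []

    def add_page(self, page):
        self.added.append(page)


demo = ListWriter()
add_selected_pages(demo, ['p0', 'p1', 'p2', 'p3'], [0, 2, 3])
print(demo.added)
```

With pypdf, the same helper would be called with a PdfWriter instance and reader.pages in place of the stand-ins.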
Split PDFs at Specific Pages
While there's no direct method to split a PDF, you can create new files using selected pages, effectively splitting the original.
writer = pypdf.PdfWriter()
writer.append('data/src/pdf/sample1.pdf', pages=pypdf.PageRange(':2'))
writer.write('data/temp/sample_split1.pdf')
writer = pypdf.PdfWriter()
writer.append('data/src/pdf/sample1.pdf', pages=pypdf.PageRange('2:'))
writer.write('data/temp/sample_split2.pdf')
See the last section for an example of splitting a file into individual pages.
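To split a file into equal-sized chunks rather than at one fixed point, the (start, stop) tuples for the pages argument can be computed up front. This is a minimal sketch; the function name chunk_ranges and the chunk size are assumptions for illustration, and the page count would normally come from len(reader.pages).

```python
def chunk_ranges(page_count, chunk_size):
    """Return (start, stop) tuples covering page_count pages in chunks."""
    if chunk_size < 1:
        raise ValueError('chunk_size must be at least 1')
    return [
        (start, min(start + chunk_size, page_count))
        for start in range(0, page_count, chunk_size)
    ]


# A hypothetical 5-page file split into chunks of 2 pages
print(chunk_ranges(5, 2))
```

Each resulting tuple can be passed as pages= to append() on a fresh PdfWriter, in the same way as the PageRange examples above. The final chunk is shorter when the page count is not a multiple of the chunk size.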
Metadata
The previous examples do not handle metadata such as the author or title. By default, metadata from the source files is not carried over to the generated PDF.
Use the add_metadata() method of PdfWriter to add metadata. You can copy it from another file via PdfReader.metadata.
writer = pypdf.PdfWriter()
writer.append('data/src/pdf/sample1.pdf')
writer.append('data/src/pdf/sample2.pdf')
writer.add_metadata(pypdf.PdfReader('data/src/pdf/sample1.pdf').metadata)
writer.add_metadata({'/Title': 'merged file'})
writer.write('data/temp/sample_merge_meta.pdf')
You can add or modify specific metadata fields as needed. The example above sets the /Title. For more on metadata handling, see:
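As a further illustration, a metadata dictionary for add_metadata() can be built in one place. The sketch below uses the standard PDF information keys /Title, /Author, and /ModDate and formats the date in the PDF date syntax (D:YYYYMMDDHHmmSS followed by a UTC offset). The helper names and the author value are hypothetical.

```python
from datetime import datetime, timezone


def pdf_date(dt):
    """Format a timezone-aware datetime as a PDF date string in UTC."""
    return dt.astimezone(timezone.utc).strftime("D:%Y%m%d%H%M%S+00'00'")


def build_metadata(title, author, modified):
    """Return a dict of standard PDF info keys suitable for add_metadata()."""
    return {
        '/Title': title,
        '/Author': author,
        '/ModDate': pdf_date(modified),
    }


meta = build_metadata(
    'merged file',
    'Jane Doe',  # hypothetical author
    datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
print(meta)
```

The resulting dictionary can then be passed to writer.add_metadata(meta) before calling write().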
Password-Protected PDFs
You can't use append() or merge() directly on encrypted PDFs. You'll first need to decrypt and save them as new files.
For more details on working with password-protected PDFs using pypdf, see:
Example: Merge All PDFs in a Directory
Use the glob module from the standard library to list files in a directory.
You can then merge all PDFs in that folder as shown below:
import glob
import os
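def merge_pdf_in_dir(dir_path, dst_path):
    # Collect the PDF paths in the directory and sort them alphabetically
    paths = sorted(glob.glob(os.path.join(dir_path, '*.pdf')))
    writer = pypdf.PdfWriter()
    for p in paths:
        # Skip encrypted files, which cannot be appended directly
        if not pypdf.PdfReader(p).is_encrypted:
            writer.append(p)
    writer.write(dst_path)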
merge_pdf_in_dir('data/src/pdf', 'data/temp/sample_dir.pdf')
In this example, files are sorted alphabetically before merging.
Since some files may be encrypted, the is_encrypted attribute is used to skip them.
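Alphabetical sorting places file10.pdf before file2.pdf, which may not be the intended merge order for numbered files. One possible alternative is a natural sort key that compares digit runs as numbers. The sketch below uses only the standard library; the function name natural_key and the file names are hypothetical.

```python
import re


def natural_key(path):
    """Split a path into text and number parts so digit runs sort numerically."""
    return [
        int(part) if part.isdigit() else part.lower()
        for part in re.split(r'(\d+)', path)
    ]


names = ['file10.pdf', 'file2.pdf', 'file1.pdf']
alphabetical = sorted(names)
natural = sorted(names, key=natural_key)
print(alphabetical)
print(natural)
```

Passing key=natural_key when sorting the list of paths would merge the files in numeric order instead.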
Example: Split into Single-Page Files
To split a PDF into individual one-page files, define a function like the following:
def split_pdf_pages(src_path, dst_basepath):
    src_pdf = pypdf.PdfReader(src_path)
    for i, page in enumerate(src_pdf.pages):
        # Write each page to its own single-page file
        dst_pdf = pypdf.PdfWriter()
        dst_pdf.add_page(page)
        dst_pdf.write(f'{dst_basepath}_{i}.pdf')
split_pdf_pages('data/src/pdf/sample1.pdf', 'data/temp/sample1')
Here, enumerate() is used to get the page index and generate sequential filenames using f-strings.
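For documents with ten or more pages, unpadded indices such as _2 and _10 do not sort in page order when listed alphabetically. A possible variation is to zero-pad the index in the f-string. The helper name page_filenames below is hypothetical, and the 11-page count is an assumption for the demonstration.

```python
def page_filenames(dst_basepath, page_count):
    """Return one zero-padded output filename per page, in page order."""
    # Pad to the number of digits in the largest index (page_count - 1)
    width = len(str(max(page_count - 1, 0)))
    return [f'{dst_basepath}_{i:0{width}d}.pdf' for i in range(page_count)]


names = page_filenames('data/temp/sample1', 11)
print(names[0])
print(names[-1])
```

With padding, sorting the filenames alphabetically yields the same order as the original pages.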