Zip and Unzip Files in Python: zipfile, shutil

Modified: | Tags: Python, File

In Python, the zipfile module allows you to zip and unzip files, i.e., compress files into a ZIP file and extract a ZIP file.

You can also easily zip a directory (folder) and unzip a ZIP file using the make_archive() and unpack_archive() functions from the shutil module.

See the following article on the built-in zip() function.

All sample code in this article assumes that the zipfile and shutil modules have been imported. They are included in the standard library, so no additional installation is required.

import zipfile
import shutil

Zip a directory (folder): shutil.make_archive()

You can zip a directory (folder) into a ZIP file using shutil.make_archive().

Here are the first three arguments for shutil.make_archive() in order:

  1. base_name: Path for the ZIP file to be created (without extension)
  2. format: Archive format. Options are 'zip', 'tar', 'gztar', 'bztar', and 'xztar'
  3. root_dir: Path of the directory to compress

For example, suppose there is a directory dir_zip with the following structure in the current directory.

dir_zip
├── dir_sub
│   └── file_sub.txt
└── file.txt

Compress this directory into a ZIP file archive_shutil.zip in the current directory.

import shutil

shutil.make_archive('archive_shutil', format='zip', root_dir='dir_zip')

In this case, the specified directory dir_zip itself is not included in archive_shutil.zip.

If you want to include the directory itself, specify the parent directory of the target for the third argument root_dir, and the relative path from root_dir to the target directory for the fourth argument base_dir.

shutil.make_archive('archive_shutil_base', format='zip',
                    root_dir='.', base_dir='dir_zip')

See the next section for the result of unzipping.

Unzip a ZIP file: shutil.unpack_archive()

You can unzip a ZIP file and extract all its contents using shutil.unpack_archive().

The first argument filename is the path of the ZIP file, and the second argument extract_dir is the path of the target directory where the archive is extracted.

If extract_dir is omitted, the archive is extracted to the current directory.

Here, extract the ZIP file compressed in the previous section.

shutil.unpack_archive('archive_shutil.zip', 'dir_out')

It is extracted as follows:

dir_out
├── dir_sub
│   └── file_sub.txt
└── file.txt

While the documentation doesn't explicitly mention it, it appears that a new directory is created even if extract_dir doesn't exist (as confirmed in Python 3.11.4).

The ZIP file created by shutil.make_archive() with base_dir is extracted as follows:

shutil.unpack_archive('archive_shutil_base.zip', 'dir_out_base')
dir_out_base
└── dir_zip
    ├── dir_sub
    │   └── file_sub.txt
    └── file.txt

Basics of the zipfile module: ZipFile objects

The zipfile module provides the ZipFile class, which allows you to create, read, and write ZIP files.

The constructor zipfile.ZipFile(file, mode, ...) is used to create ZipFile objects. Here, file represents the path of a ZIP file, and mode can be 'r' for read, 'w' for write, or 'a' for append.

The ZipFile object needs to be closed with the close() method, but if you use the with statement, it is closed automatically when the block is finished.

The usage is similar to reading and writing files with the built-in open() function; you can specify the mode and use the with statement.

Specific examples are described in the following sections.

Compress individual files into a ZIP file

To compress individual files into a ZIP file, create a new ZipFile object and add the files you want to compress with the write() method.

Create a new ZipFile object

With zipfile.ZipFile(), provide the path of the ZIP file you want to create as the first argument file and set the second argument mode to 'w' for writing.

In write mode, you can specify the compression method and level using the compression and compresslevel arguments. The available options for compression are as follows; BZIP2 and LZMA have a higher compression ratio, but it takes longer to compress.

  • zipfile.ZIP_STORED: No compression (default)
  • zipfile.ZIP_DEFLATED: Usual ZIP compression
  • zipfile.ZIP_BZIP2: BZIP2 compression
  • zipfile.ZIP_LZMA: LZMA compression

For ZIP_DEFLATED, compresslevel corresponds to the level of zlib.compressobj(). The default is -1 (Z_DEFAULT_COMPRESSION).

level is the compression level – an integer from 0 to 9 or -1. A value of 1 (Z_BEST_SPEED) is fastest and produces the least compression, while a value of 9 (Z_BEST_COMPRESSION) is slowest and produces the most. 0 (Z_NO_COMPRESSION) is no compression. The default value is -1 (Z_DEFAULT_COMPRESSION). Z_DEFAULT_COMPRESSION represents a default compromise between speed and compression (currently equivalent to level 6). zlib.compressobj() — Compression compatible with gzip — Python 3.11.4 documentation

Add files with the write() method

To add a file to the ZipFile object, use the write() method.

The first argument filename is the path to the file to be added. The second argument arcname specifies the name of the file in the ZIP archive; if arcname is omitted, the name of filename is used. In addition, arcname can be used to define a directory structure.

import zipfile

with zipfile.ZipFile('archive_zipfile.zip', 'w',
                     compression=zipfile.ZIP_DEFLATED,
                     compresslevel=9) as zf:
    zf.write('dir_zip/file.txt', arcname='file.txt')
    zf.write('dir_zip/dir_sub/file_sub.txt', arcname='dir_sub/file_sub.txt')

You can also select a compression method and level for each file by specifying compress_type and compresslevel in the write() method.

Add other files to an existing ZIP file

To append additional files to an existing ZIP file, create a ZipFile object in append mode. Specify the path of the existing ZIP file for the first argument file and 'a' (append) for the second argument mode of zipfile.ZipFile().

Add existing files with the write() method

You can add existing files with the write() method of the ZipFile object.

The following is an example of adding another_file.txt in the current directory. The argument arcname is omitted.

with zipfile.ZipFile('archive_zipfile.zip', 'a') as zf:
    zf.write('another_file.txt')

Create and add a new file with the open() method

You can also create a new file within the ZIP and add content to it. Create the ZipFile object in append mode ('a') and use its open() method.

Specify the path of the file you want to create within the ZIP as the first argument, and set the second argument mode to 'w' of the open() method.

You can write the contents with the write() method of the opened file object.

with zipfile.ZipFile('archive_zipfile.zip', 'a') as zf:
    with zf.open('dir_sub/new_file.txt', 'w') as f:
        f.write(b'text in new file')

The argument for the write() method should be specified in bytes, not str. To write text, either use the b'...' notation or convert the string using its encode() method.

print(type(b'text'))
# <class 'bytes'>

print(type('text'.encode('utf-8')))
# <class 'bytes'>

An example of reading a file from a ZIP using the open() method of the ZipFile object will be described later.

Check the list of files in a ZIP file

To check the contents of a ZIP file, create a ZipFile object in read mode ('r', default).

You can get a list of archived items with the namelist() method of the ZipFile object.

with zipfile.ZipFile('archive_zipfile.zip') as zf:
    print(zf.namelist())
# ['file.txt', 'dir_sub/file_sub.txt', 'another_file.txt', 'dir_sub/new_file.txt']

with zipfile.ZipFile('archive_shutil.zip') as zf:
    print(zf.namelist())
# ['dir_sub/', 'file.txt', 'dir_sub/file_sub.txt']

As seen from the above results, ZIP files created with shutil.make_archive() list directories individually. The same behavior is observed with ZIP files compressed using the standard function of the Finder on a Mac.

You can exclude directories with list comprehensions.

with zipfile.ZipFile('archive_shutil.zip') as zf:
    print([x for x in zf.namelist() if not x.endswith('/')])
# ['file.txt', 'dir_sub/file_sub.txt']

Extract individual files from a ZIP file

To unzip a ZIP file, create a ZipFile object in read mode ('r', default).

If you want to extract only specific files, use the extract() method.

The first argument member specifies the name of the file to be extracted (including its directory structure within the ZIP file), and the second argument path specifies the directory to extract the file.

with zipfile.ZipFile('archive_zipfile.zip') as zf:
    zf.extract('file.txt', 'dir_out_extract')
    zf.extract('dir_sub/file_sub.txt', 'dir_out_extract')

If you want to extract all files, use the extractall() method. Specify the path of the directory to extract to as the first argument path.

with zipfile.ZipFile('archive_zipfile.zip') as zf:
    zf.extractall('dir_out_extractall')

In both cases, if path is omitted, files are extracted to the current directory. Although the documentation doesn't specify it, it seems to create a new directory even if path is non-existent (confirmed in Python 3.11.4).

Read files in a ZIP file

You can directly read files in a ZIP file.

Create a ZipFile object in read mode (default) and open the file inside with the open() method.

The first argument of open() is the name of a file in the ZIP (it may include the directory). The second argument mode can be omitted since the default value is 'r' (read).

You can read the contents using the read() method of the opened file object. The method returns a byte string (bytes), which can be converted to a string (str) using the decode() method.

with zipfile.ZipFile('archive_zipfile.zip') as zf:
    with zf.open('dir_sub/new_file.txt') as f:
        b = f.read()

print(b)
# b'text in new file'

print(type(b))
# <class 'bytes'>

s = b.decode('utf-8')
print(s)
# text in new file

print(type(s))
# <class 'str'>

Besides read(), you can also use readline() and readlines() with the file object, just like when using the built-in open() function.

ZIP with passwords (encryption and decryption)

The zipfile module supports decryption of password-protected (encrypted) ZIP files, but it does not support encryption.

It supports decryption of encrypted files in ZIP archives, but it currently cannot create an encrypted file. Decryption is extremely slow as it is implemented in native Python rather than C. zipfile — Work with ZIP archives — Python 3.11.4 documentation

Also, AES is not supported.

The zipfile module from the Python standard library supports only CRC32 encrypted zip files (see here: http://hg.python.org/cpython/file/71adf21421d9/Lib/zipfile.py#l420 ). zip - Python unzip AES-128 encrypted file - Stack Overflow

Both shutil.make_archive() and shutil.unpack_archive() do not support encryption or decryption.

pyzipper

The pyzipper library, as introduced on Stack Overflow above, supports AES encryption and decryption and can be used similarly to the zipfile module.

To create a ZIP file with a password, specify encryption=pyzipper.WZ_AES with pyzipper.AESZipFile() and set the password with the setpassword() method. Note that you need to specify the password as a byte string (bytes).

import pyzipper

with pyzipper.AESZipFile('archive_with_pass.zip', 'w',
                         encryption=pyzipper.WZ_AES) as zf:
    zf.setpassword(b'password')
    zf.write('dir_zip/file.txt', arcname='file.txt')
    zf.write('dir_zip/dir_sub/file_sub.txt', arcname='dir_sub/file_sub.txt')

The following is an example of unzipping a ZIP file with a password.

with pyzipper.AESZipFile('archive_with_pass.zip') as zf:
    zf.setpassword(b'password')
    zf.extractall('dir_out_pyzipper')

Of course, if the password is wrong, it cannot be decrypted.

# with pyzipper.AESZipFile('archive_with_pass.zip') as zf:
#     zf.setpassword(b'wrong_password')
#     zf.extractall('dir_out_pass')
# RuntimeError: Bad password for file 'file.txt'

The zipfile module also allows you to specify a password, but as mentioned above, it does not support AES.

# with zipfile.ZipFile('archive_with_pass.zip') as zf:
#     zf.setpassword(b'password')
#     zf.extractall('dir_out_pass')
# NotImplementedError: That compression method is not supported

Execute command with subprocess.run()

If zipfile or pyzipper don't work for your needs, you can use subprocess.run() to handle the task using command-line tools.

For example, use the 7z command of 7-zip (installation required).

import subprocess

subprocess.run(['7z', 'x', 'archive_with_pass.zip', '-ppassword', '-odir_out_7z'])

Equivalent to the following commands. -x is expansion. Note that -p<password> and -o<directory> do not require spaces.

$ 7z x archive_with_pass.zip -ppassword -odir_out_pass_7z'

Related Categories

Related Articles