Copy a File/Directory in Python: shutil.copy, shutil.copytree

Posted: | Tags: Python, File

In Python, you can copy a file with shutil.copy() or shutil.copy2(), and a directory (folder) with shutil.copytree().

If you want to move or delete files and directories instead, see shutil.move(), os.remove(), and shutil.rmtree().

This article does not cover detailed specifications, like handling symbolic links. For comprehensive information, please check the official Python documentation.

The sample code in this article imports the shutil module as shown below. It is part of the standard library, so no additional installation is necessary.

import shutil

Copy a file with shutil.copy() and shutil.copy2()

To copy a file, use shutil.copy() or shutil.copy2().

Although both functions are used in the same way, shutil.copy2() also attempts to preserve metadata such as the last access and modification times, whereas shutil.copy() copies only the file's contents and permission bits.

Basic usage

The first argument should be the path of the source file, and the second should be the path of the destination directory or file. The function returns the path of the newly copied file.

Paths can be represented as either strings or path-like objects like pathlib.Path. When specifying a directory as a string, the trailing delimiter (/) is optional.
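As a quick illustration of the path-like support, the following sketch passes pathlib.Path objects to shutil.copy(). It works in a throwaway temporary directory, and the names work, src, and dst_dir are made up for this example.

```python
import shutil
import tempfile
from pathlib import Path

# Work in a temporary directory so the example does not touch real files.
work = Path(tempfile.mkdtemp())
src = work / 'file.txt'
src.write_text('hello')
dst_dir = work / 'dir1'
dst_dir.mkdir()

# Path objects can be passed directly as the source and the destination.
new_path = shutil.copy(src, dst_dir)

# Wrap the return value in Path() so it can be inspected uniformly.
print(Path(new_path).name)
# file.txt
print(Path(new_path).read_text())
# hello
```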

Consider the following files and directories:

temp/
├── dir1/
├── dir2/
│   ├── file.txt
│   └── file2.txt
└── file.txt

If the second argument is an existing directory, the source file will be copied into it. If a file with the same name already exists in this directory, the source file will overwrite it.

new_path = shutil.copy('temp/file.txt', 'temp/dir1')
print(new_path)
# temp/dir1/file.txt

new_path = shutil.copy('temp/file.txt', 'temp/dir2')
print(new_path)
# temp/dir2/file.txt

If you provide a new path as the second argument, the file will be copied with that filename. Ensure all directories in the path exist before copying to avoid errors from non-existent directories.

new_path = shutil.copy('temp/file.txt', 'temp/dir1/file2.txt')
print(new_path)
# temp/dir1/file2.txt

# new_path = shutil.copy('temp/file.txt', 'temp/dir2/new_dir/file2.txt')
# FileNotFoundError: [Errno 2] No such file or directory: 'temp/dir2/new_dir/file2.txt'

Be aware that if you specify a directory that does not exist yet as the second argument, for example temp/dir3, it is interpreted as a filename, and the source is copied as a file named dir3. Specifying temp/dir3/ with a trailing delimiter results in an error. To copy into a new directory, create it first.
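One way to copy into a directory that may not exist yet is to create it with os.makedirs() before calling shutil.copy(). This is a minimal sketch in a temporary directory; the paths are placeholders, not the temp/ tree used in this article.

```python
import os
import shutil
import tempfile

# Set up a source file in a temporary working directory.
work = tempfile.mkdtemp()
src = os.path.join(work, 'file.txt')
with open(src, 'w') as f:
    f.write('hello')

# Create the destination directory (including intermediate ones) first.
# exist_ok=True makes the call safe when the directory already exists.
dst_dir = os.path.join(work, 'dir3', 'new_dir')
os.makedirs(dst_dir, exist_ok=True)

# Now dst_dir is an existing directory, so the file is copied into it.
new_path = shutil.copy(src, dst_dir)
print(os.path.isfile(new_path))
# True
```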

If the second argument is an existing file, it will be overwritten by the source file.

new_path = shutil.copy('temp/file.txt', 'temp/dir2/file2.txt')
print(new_path)
# temp/dir2/file2.txt

The results are as follows. The content of temp/file.txt, which was provided as the source in the first argument, is copied and replaces the content of the destination files.

temp/
├── dir1/
│   ├── file.txt
│   └── file2.txt
├── dir2/
│   ├── file.txt
│   └── file2.txt
└── file.txt
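Because shutil.copy() replaces an existing destination file without asking, you may want a guard when overwriting is not intended. The helper below, copy_no_overwrite(), is a hypothetical name introduced for this sketch and is not part of shutil.

```python
import os
import shutil
import tempfile


def copy_no_overwrite(src, dst):
    """Hypothetical helper: copy src to dst, refusing to replace an existing file."""
    # Work out the final file path the same way shutil.copy() does
    # when dst is an existing directory.
    if os.path.isdir(dst):
        target = os.path.join(dst, os.path.basename(src))
    else:
        target = dst
    if os.path.exists(target):
        raise FileExistsError(f'{target} already exists')
    return shutil.copy(src, target)


# Demo in a temporary directory.
work = tempfile.mkdtemp()
src = os.path.join(work, 'file.txt')
with open(src, 'w') as f:
    f.write('hello')
dst_dir = os.path.join(work, 'dir1')
os.makedirs(dst_dir)

first = copy_no_overwrite(src, dst_dir)  # succeeds
try:
    copy_no_overwrite(src, dst_dir)  # a file with the same name exists now
    second_copy_refused = False
except FileExistsError:
    second_copy_refused = True

print(os.path.basename(first), second_copy_refused)
# file.txt True
```

Note that the existence check and the copy are two separate steps, so this guard is not safe against another process creating the file in between.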

The difference between shutil.copy() and shutil.copy2()

The primary difference between shutil.copy() and shutil.copy2() is their handling of metadata (modification time, access time, etc.). shutil.copy() copies the file's data and permission bits but not its other metadata, whereas shutil.copy2() attempts to preserve that metadata as well.

In the case of shutil.copy(), the copied file's modification time is the time when the copy was made. You can verify this by comparing the modification times of the source and destination files using os.path.getmtime().

import os

new_path = shutil.copy('temp/file.txt', 'temp/file_copy.txt')
print(new_path)
# temp/file_copy.txt

print(os.path.getmtime('temp/file.txt') == os.path.getmtime('temp/file_copy.txt'))
# False

In contrast, shutil.copy2() attempts to copy as much metadata as possible. For example, the modification times of the source and destination files are the same.

new_path = shutil.copy2('temp/file.txt', 'temp/file_copy2.txt')
print(new_path)
# temp/file_copy2.txt

print(os.path.getmtime('temp/file.txt') == os.path.getmtime('temp/file_copy2.txt'))
# True

It is important to remember that shutil.copy2() cannot copy all metadata.

Warning: Even the higher-level file copying functions (shutil.copy(), shutil.copy2()) cannot copy all file metadata.

On POSIX platforms, this means that file owner and group are lost as well as ACLs. On Mac OS, the resource fork and other metadata are not used. This means that resources will be lost and file type and creator codes will not be correct. On Windows, file owners, ACLs and alternate data streams are not copied.

Source: shutil — High-level file operations — Python 3.11.4 documentation

Copy a directory (folder) with shutil.copytree()

Basic usage

Use shutil.copytree() to recursively copy a directory along with all its files and subdirectories.

Specify the path of the source directory as the first argument, and the path of the destination directory as the second. The function returns the path of the destination directory.

Consider the following files and directories:

temp/
└── dir1/
    ├── file.txt
    └── subdir/
        └── file2.txt

If you specify a new path (non-existent path) as the second argument, all directories, including intermediate ones, are created, and the contents are copied.

new_path = shutil.copytree('temp/dir1', 'temp/dir2/new_dir')
print(new_path)
# temp/dir2/new_dir

temp/
├── dir1/
│   ├── file.txt
│   └── subdir/
│       └── file2.txt
└── dir2/
    └── new_dir/
        ├── file.txt
        └── subdir/
            └── file2.txt

By default, if you specify an existing directory as the destination, an error will occur. To avoid this, use the dirs_exist_ok argument described next.

# new_path = shutil.copytree('temp/dir1', 'temp/dir2/new_dir')
# FileExistsError: [Errno 17] File exists: 'temp/dir2/new_dir'

Copy to an existing directory: dirs_exist_ok

As mentioned above, by default, specifying an existing directory as the second argument raises an error.

However, by setting the dirs_exist_ok argument to True, this error will be prevented. If the destination directory contains a file with the same name as one in the source directory, the destination file will be replaced.

new_path = shutil.copytree('temp/dir1', 'temp/dir2/new_dir', dirs_exist_ok=True)
print(new_path)
# temp/dir2/new_dir
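The following sketch shows the merge behavior in a temporary directory: files with the same name are replaced, while files that exist only in the destination are left alone. The file names and the write() and read() helpers are made up for the example. Note that dirs_exist_ok requires Python 3.8 or later.

```python
import os
import shutil
import tempfile


def write(path, text):
    with open(path, 'w') as f:
        f.write(text)


def read(path):
    with open(path) as f:
        return f.read()


work = tempfile.mkdtemp()
src = os.path.join(work, 'dir1')
dst = os.path.join(work, 'dir2')
os.makedirs(src)
os.makedirs(dst)

write(os.path.join(src, 'file.txt'), 'new')    # exists in both directories
write(os.path.join(dst, 'file.txt'), 'old')
write(os.path.join(dst, 'other.txt'), 'keep')  # exists only in the destination

# Without dirs_exist_ok=True this call would raise FileExistsError.
shutil.copytree(src, dst, dirs_exist_ok=True)

print(read(os.path.join(dst, 'file.txt')))
# new
print(sorted(os.listdir(dst)))
# ['file.txt', 'other.txt']
```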

Specify the copy function: copy_function

You can specify the function used for copying with the copy_function argument.

The default is shutil.copy2(), which tries to preserve as much metadata as possible. If you prefer not to duplicate the metadata, you can specify shutil.copy() instead.

import os

new_path = shutil.copytree('temp/dir1', 'temp/dir_copy2')
print(new_path)
# temp/dir_copy2

print(os.path.getmtime('temp/dir1/file.txt') == os.path.getmtime('temp/dir_copy2/file.txt'))
# True

new_path = shutil.copytree('temp/dir1', 'temp/dir_copy', copy_function=shutil.copy)
print(new_path)
# temp/dir_copy

print(os.path.getmtime('temp/dir1/file.txt') == os.path.getmtime('temp/dir_copy/file.txt'))
# False

Specify files and directories to ignore: ignore

By passing the result of shutil.ignore_patterns() to the ignore argument, you can exclude certain files or directories from the copying process.

You can specify multiple glob-style patterns in shutil.ignore_patterns(). You can use wildcards such as *. Files and directories whose names match the provided patterns will not be copied.

Consider the following files and directories:

temp/
└── dir1/
    ├── .config
    ├── .dir/
    ├── file.txt
    └── subdir/
        ├── file.jpg
        └── file.txt

Here, files and directories starting with . and files with the jpg extension are ignored.

new_path = shutil.copytree(
    'temp/dir1', 'temp/dir2', ignore=shutil.ignore_patterns('.*', '*.jpg')
)
print(new_path)
# temp/dir2

temp/
├── dir1/
│   ├── .config
│   ├── .dir/
│   ├── file.txt
│   └── subdir/
│       ├── file.jpg
│       └── file.txt
└── dir2/
    ├── file.txt
    └── subdir/
        └── file.txt
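To see how the patterns are applied, it can help to know that shutil.ignore_patterns() returns a callable. shutil.copytree() calls it for each directory with the directory path and a list of the names it contains, and skips the names that are returned. The sketch below calls it by hand; the directory path and the list of names are placeholders.

```python
import shutil

# ignore_patterns() builds the callable that copytree() would invoke.
ignore = shutil.ignore_patterns('.*', '*.jpg')

# Simulate one call: a directory path and the names found inside it.
names = ['.config', '.dir', 'file.txt', 'file.jpg', 'subdir']
ignored = ignore('any/directory', names)

# The returned names are the ones copytree() would skip.
print(sorted(ignored))
# ['.config', '.dir', 'file.jpg']
```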

Copy multiple files based on certain conditions with wildcards and regex

For a practical example, let's examine how to copy multiple files based on certain conditions.

Please note that using ** in glob.glob() can be time-consuming with a large number of files and directories. For improved efficiency, consider specifying conditions with other special characters when feasible.

If possible, using the ignore argument of shutil.copytree() to specify conditions can simplify the process.
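For instance, copying only files that match a pattern while keeping the directory structure can be expressed as a custom callable for the ignore argument. The keep_only() helper below is a hypothetical name for this sketch, not a shutil function. It ignores every file that does not match the pattern and never ignores directories, so subdirectories are still traversed.

```python
import fnmatch
import os
import shutil
import tempfile


def keep_only(pattern):
    """Hypothetical helper: build an ignore callable that skips non-matching files."""
    def _ignore(directory, names):
        return {
            name for name in names
            if not os.path.isdir(os.path.join(directory, name))
            and not fnmatch.fnmatch(name, pattern)
        }
    return _ignore


# Build a small sample tree in a temporary directory.
work = tempfile.mkdtemp()
src_dir = os.path.join(work, 'dir1')
dst_dir = os.path.join(work, 'dir2')
os.makedirs(os.path.join(src_dir, 'subdir'))
for rel in ['456.txt', '123.jpg', 'subdir/789.txt', 'subdir/xyz.jpg']:
    with open(os.path.join(src_dir, *rel.split('/')), 'w') as f:
        f.write('data')

shutil.copytree(src_dir, dst_dir, ignore=keep_only('*.txt'))

# List the copied files relative to dst_dir, using / as the separator.
copied = sorted(
    os.path.relpath(os.path.join(root, name), dst_dir).replace(os.sep, '/')
    for root, _dirs, files in os.walk(dst_dir)
    for name in files
)
print(copied)
# ['456.txt', 'subdir/789.txt']
```

One consequence of this approach is that directories containing no matching files are still created, empty, in the destination.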

Use wildcards to specify conditions

The glob module allows you to use wildcard characters like * to generate a list of file and directory names. For detailed usage, see the documentation of the glob module.

Consider the following files and directories:

temp/
└── dir1/
    ├── 123.jpg
    ├── 456.txt
    ├── abc.txt
    └── subdir/
        ├── 000.csv
        ├── 789.txt
        └── xyz.jpg

Without preserving the original directory structure

For example, you can extract and copy files with the txt extension, including those in subdirectories. The destination directory must be created beforehand.

import shutil
import glob
import os

os.makedirs('temp/dir2')

for p in glob.glob('temp/dir1/**/*.txt', recursive=True):
    if os.path.isfile(p):
        shutil.copy(p, 'temp/dir2')

temp/
├── dir1/
│   ├── 123.jpg
│   ├── 456.txt
│   ├── abc.txt
│   └── subdir/
│       ├── 000.csv
│       ├── 789.txt
│       └── xyz.jpg
└── dir2/
    ├── 456.txt
    ├── 789.txt
    └── abc.txt

In the example above, the pattern happens to match only files. With other patterns, however, directories might be included in the results, so os.path.isfile() is used to target only files.

If you want to copy directories, use os.path.isdir() and shutil.copytree() instead of os.path.isfile() and shutil.copy().
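A minimal sketch of the directory variant might look like this. It runs in a temporary directory, and the sub* pattern and file names are chosen only for illustration. glob.escape() protects the directory part in case it contains wildcard characters.

```python
import glob
import os
import shutil
import tempfile

# Sample tree: one directory and one file whose names both match 'sub*'.
work = tempfile.mkdtemp()
src_dir = os.path.join(work, 'dir1')
dst_dir = os.path.join(work, 'dir2')
os.makedirs(os.path.join(src_dir, 'subdir'))
with open(os.path.join(src_dir, 'subdir', '789.txt'), 'w') as f:
    f.write('data')
with open(os.path.join(src_dir, 'sub.txt'), 'w') as f:
    f.write('data')

for p in glob.glob(os.path.join(glob.escape(src_dir), 'sub*')):
    # Only directories are copied; the matching file sub.txt is skipped.
    if os.path.isdir(p):
        # copytree() creates dst_dir itself if it does not exist yet.
        shutil.copytree(p, os.path.join(dst_dir, os.path.basename(p)))

print(sorted(os.listdir(dst_dir)))
# ['subdir']
print(os.listdir(os.path.join(dst_dir, 'subdir')))
# ['789.txt']
```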

With preserving the original directory structure

If you want to maintain the directory structure of the source, you can do as follows.

Specify the root_dir argument in glob.glob() (added in Python 3.10) so that the returned paths are relative to the source directory, use os.path.dirname() to extract the directory part, and use os.path.join() to concatenate paths. You don't need to create the destination directory in advance because it is created with os.makedirs().

src_dir = 'temp/dir1'
dst_dir = 'temp/dir3'

for p in glob.glob('**/*.txt', recursive=True, root_dir=src_dir):
    if os.path.isfile(os.path.join(src_dir, p)):
        os.makedirs(os.path.join(dst_dir, os.path.dirname(p)), exist_ok=True)
        shutil.copy(os.path.join(src_dir, p), os.path.join(dst_dir, p))

temp/
├── dir1/
│   ├── 123.jpg
│   ├── 456.txt
│   ├── abc.txt
│   └── subdir/
│       ├── 000.csv
│       ├── 789.txt
│       └── xyz.jpg
├── dir2/
│   ├── 456.txt
│   ├── 789.txt
│   └── abc.txt
└── dir3/
    ├── 456.txt
    ├── abc.txt
    └── subdir/
        └── 789.txt
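The same structure-preserving copy can also be written with pathlib, which may be convenient on Python versions where glob.glob() has no root_dir argument. This is an alternative sketch rather than the approach used above; it builds its own sample tree in a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path

# Build a small sample tree.
work = Path(tempfile.mkdtemp())
src_dir = work / 'dir1'
dst_dir = work / 'dir3'
(src_dir / 'subdir').mkdir(parents=True)
for rel in ['456.txt', '123.jpg', 'subdir/789.txt', 'subdir/xyz.jpg']:
    (src_dir / rel).write_text('data')

# rglob() searches src_dir recursively for names matching the pattern.
for p in src_dir.rglob('*.txt'):
    if p.is_file():
        # Rebuild the path relative to src_dir underneath dst_dir.
        target = dst_dir / p.relative_to(src_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(p, target)

copied = sorted(
    q.relative_to(dst_dir).as_posix() for q in dst_dir.rglob('*') if q.is_file()
)
print(copied)
# ['456.txt', 'subdir/789.txt']
```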

Use regex to specify conditions

For complex conditions that can't be handled by glob alone, use the re module to specify conditions with regular expressions.

The basic approach remains the same as when using glob.glob() alone.

Consider the following files and directories:

temp/
└── dir1/
    ├── 123.jpg
    ├── 456.txt
    ├── abc.txt
    └── subdir/
        ├── 000.csv
        ├── 789.txt
        └── xyz.jpg

Without preserving the original directory structure

Use glob.glob() with ** and recursive=True to recursively list all files and directories, and then filter them by regex with re.search().

For example, to extract and copy files with names consisting only of digits and either a txt or csv extension, including files in subdirectories, use \d to match digits, + to match one or more repetitions, and (a|b) to match either a or b.

import shutil
import glob
import os
import re

os.makedirs('temp/dir2')

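for p in glob.glob('temp/dir1/**', recursive=True):
    # Match against the file name only; ^ and $ anchor the pattern so that
    # the whole name must consist of digits plus the extension.
    if os.path.isfile(p) and re.search(r'^\d+\.(txt|csv)$', os.path.basename(p)):
        shutil.copy(p, 'temp/dir2')

temp/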
├── dir1/
│   ├── 123.jpg
│   ├── 456.txt
│   ├── abc.txt
│   └── subdir/
│       ├── 000.csv
│       ├── 789.txt
│       └── xyz.jpg
└── dir2/
    ├── 000.csv
    ├── 456.txt
    └── 789.txt
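Before copying anything, it can be useful to try the pattern on bare file names. The sketch below compares re.fullmatch(), which requires the whole name to match, with an unanchored re.search(), which also accepts names that merely contain a match. The name abc123.txt is a made-up example that is not part of the tree above.

```python
import re

pattern = re.compile(r'\d+\.(txt|csv)')
names = ['123.jpg', '456.txt', 'abc.txt', '000.csv', 'abc123.txt']

# fullmatch(): the entire name must be digits followed by .txt or .csv.
whole_name = [n for n in names if pattern.fullmatch(n)]
print(whole_name)
# ['456.txt', '000.csv']

# search(): a match anywhere in the name is enough, so abc123.txt passes too.
anywhere = [n for n in names if pattern.search(n)]
print(anywhere)
# ['456.txt', '000.csv', 'abc123.txt']
```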

With preserving the original directory structure

If you want to retain the directory structure of the source, do as follows.

src_dir = 'temp/dir1'
dst_dir = 'temp/dir3'

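for p in glob.glob('**', recursive=True, root_dir=src_dir):
    # p is relative to src_dir; test only its file name, anchored with ^ and $
    # so that the whole name must consist of digits plus the extension.
    if os.path.isfile(os.path.join(src_dir, p)) and re.search(r'^\d+\.(txt|csv)$', os.path.basename(p)):
        os.makedirs(os.path.join(dst_dir, os.path.dirname(p)), exist_ok=True)
        shutil.copy(os.path.join(src_dir, p), os.path.join(dst_dir, p))

temp/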
├── dir1/
│   ├── 123.jpg
│   ├── 456.txt
│   ├── abc.txt
│   └── subdir/
│       ├── 000.csv
│       ├── 789.txt
│       └── xyz.jpg
├── dir2/
│   ├── 000.csv
│   ├── 456.txt
│   └── 789.txt
└── dir3/
    ├── 456.txt
    └── subdir/
        ├── 000.csv
        └── 789.txt
