note.nkmk.me

Extract the file, dir, extension name from a path string in Python

Posted: 2020-12-04 / Tags: Python, File, String

In Python, use the os.path module of the standard library to extract the file name (base name), directory name (folder name), or extension from a path string, or to join strings into a path string.

This article covers the following topics.

  • Difference in path separator by OS
  • Extract the file name (base name): os.path.basename()
    • File name with extension
    • File name without extension
  • Extract the directory name (folder name): os.path.dirname()
  • Get a file / dir name pair: os.path.split()
  • Notes on when the path string indicates a directory
  • Extract the extension: os.path.splitext()
    • Create a path string with a different extension
    • Get the extension without dot (period)
    • Examples of cases like .tar.gz
  • Create a path string by combining the file and directory names: os.path.join()
    • Create a path string for another file in the same directory
  • Use different OS formats
  • Examples for Windows
    • Backslash and raw string
    • Examples of extracting file name, folder name, extension
    • Extract and join a drive letter: os.path.splitdrive()

Take the following path string as an example.

import os

filepath = './dir/subdir/filename.ext'

The sample code below was run on a Mac. Examples for Windows are shown at the end.

In Python 3.4 or later, you can also extract file names, folder names, extensions, and so on with the pathlib module, which handles paths as objects.
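
As a rough illustration of the pathlib approach, the sketch below reads the same pieces from the example path. It uses PurePosixPath (an assumption made here so that the output is the same on any OS); on your own machine pathlib.Path would normally be used instead.

```python
from pathlib import PurePosixPath

# PurePosixPath applies POSIX rules on any OS, so the results below
# do not depend on where the code runs.
p = PurePosixPath('./dir/subdir/filename.ext')

print(p.name)
# filename.ext

print(p.stem)
# filename

print(p.suffix)
# .ext

# pathlib drops the leading "./" when it parses the path.
print(p.parent)
# dir/subdir
```

Unlike os.path, these attributes belong to a path object rather than a plain string, so convert with str() when a string is required.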

Difference in path separator by OS

The path separator depends on the OS.

UNIX (including Mac) uses the slash /, and Windows uses the backslash \ as the separator.

The separator for the OS on which Python is running is available as os.sep or os.path.sep.

print(os.sep)
# /

print(os.sep is os.path.sep)
# True

Extract the file name (base name): os.path.basename()

Use os.path.basename() to extract the file name from the path string.

File name with extension

os.path.basename() returns the string of the file name (base name) including the extension.

filepath = './dir/subdir/filename.ext'
basename = os.path.basename(filepath)
print(basename)
# filename.ext

print(type(basename))
# <class 'str'>

File name without extension

To extract the file name without the extension, use os.path.splitext() described later.

basename_without_ext = os.path.splitext(os.path.basename(filepath))[0]
print(basename_without_ext)
# filename

os.path.splitext() splits at the last (rightmost) dot. To split at the first (leftmost) dot instead, use the split() method of the string.

filepath_tar_gz = './dir/subdir/filename.tar.gz'

print(os.path.splitext(os.path.basename(filepath_tar_gz))[0])
# filename.tar

print(os.path.basename(filepath_tar_gz).split('.', 1)[0])
# filename

Extract the directory name (folder name): os.path.dirname()

Use os.path.dirname() to extract the directory name (folder name) from the path string.

filepath = './dir/subdir/filename.ext'
dirname = os.path.dirname(filepath)
print(dirname)
# ./dir/subdir

print(type(dirname))
# <class 'str'>

To get only the name of the directory directly above the file, apply os.path.basename() to the result of os.path.dirname().

subdirname = os.path.basename(os.path.dirname(filepath))
print(subdirname)
# subdir

Get a file / dir name pair: os.path.split()

Use os.path.split() to get both the file name and the directory name (folder name).

os.path.split() returns a tuple of the directory name returned by os.path.dirname() and the file name returned by os.path.basename(), in that order.

filepath = './dir/subdir/filename.ext'
dir_base_pair = os.path.split(filepath)
print(dir_base_pair)
# ('./dir/subdir', 'filename.ext')

print(type(dir_base_pair))
# <class 'tuple'>

print(os.path.split(filepath)[0] == os.path.dirname(filepath))
# True

print(os.path.split(filepath)[1] == os.path.basename(filepath))
# True

You can use tuple unpacking to assign to each variable.

dirname, basename = os.path.split(filepath)
print(dirname)
# ./dir/subdir

print(basename)
# filename.ext

Use os.path.join() described below to rejoin the file and directory names.
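
A minimal sketch of that round trip follows. It uses posixpath, the POSIX flavour of os.path, purely so that the output is the same on any OS; on a Mac or Linux machine os.path behaves identically.

```python
import posixpath  # POSIX flavour of os.path

filepath = './dir/subdir/filename.ext'
dirname, basename = posixpath.split(filepath)

# Joining the two parts gives back the original string.
rejoined = posixpath.join(dirname, basename)
print(rejoined == filepath)
# True

# A bare file name has an empty directory part and still round-trips.
bare_pair = posixpath.split('filename.ext')
print(bare_pair)
# ('', 'filename.ext')

print(posixpath.join(*bare_pair))
# filename.ext
```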

Notes on when the path string indicates a directory

Note that if the path string indicates a folder, the result will be different depending on whether there is a separator at the end.

No separator at the end:

dirpath_without_sep = './dir/subdir'
print(os.path.split(dirpath_without_sep))
# ('./dir', 'subdir')

print(os.path.basename(dirpath_without_sep))
# subdir

If there is a separator at the end, os.path.basename() returns an empty string. Apply os.path.dirname() first and then os.path.basename() to get the name of the last folder.

dirpath_with_sep = './dir/subdir/'
print(os.path.split(dirpath_with_sep))
# ('./dir/subdir', '')

print(os.path.basename(os.path.dirname(dirpath_with_sep)))
# subdir

Extract the extension: os.path.splitext()

Use os.path.splitext() to get the extension.

os.path.splitext() splits the path into the extension and everything before it (the root) and returns them as a tuple. The extension includes the dot.

filepath = './dir/subdir/filename.ext'
root_ext_pair = os.path.splitext(filepath)
print(root_ext_pair)
# ('./dir/subdir/filename', '.ext')

print(type(root_ext_pair))
# <class 'tuple'>

Concatenating with the + operator returns the original path string.

root, ext = os.path.splitext(filepath)
print(root)
# ./dir/subdir/filename

print(ext)
# .ext

path = root + ext
print(path)
# ./dir/subdir/filename.ext
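
A few edge cases are worth knowing. The sketch below (using posixpath so that the output is the same on any OS) shows that a leading dot, as in hidden files, is not treated as an extension, and that dots in directory names are ignored.

```python
import posixpath  # POSIX flavour of os.path

# A leading dot marks a hidden file, not an extension.
hidden = posixpath.splitext('.bashrc')
print(hidden)
# ('.bashrc', '')

# No dot in the file name: the extension is an empty string.
no_ext = posixpath.splitext('./dir/README')
print(no_ext)
# ('./dir/README', '')

# A dot in a directory name is not mistaken for an extension.
dotted_dir = posixpath.splitext('./dir.d/filename')
print(dotted_dir)
# ('./dir.d/filename', '')
```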

Create a path string with a different extension

To create a path string with only the extension changed from the original, concatenate the first element of the tuple returned by os.path.splitext() with any extension.

other_ext_filepath = os.path.splitext(filepath)[0] + '.jpg'
print(other_ext_filepath)
# ./dir/subdir/filename.jpg

Get the extension without dot (period)

To get the extension without the dot (period), slice off the first character with [1:].

ext_without_dot = os.path.splitext(filepath)[1][1:]
print(ext_without_dot)
# ext
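
The dot-less form is handy for comparisons. As a small sketch, the hypothetical helper has_extension() below (not part of the standard library) checks a path against a list of extensions while ignoring case.

```python
import posixpath  # POSIX flavour of os.path


def has_extension(path, *extensions):
    """Hypothetical helper: True if path ends with one of the extensions."""
    ext = posixpath.splitext(path)[1][1:].lower()
    # Accept both 'jpg' and '.jpg' in the list of wanted extensions.
    wanted = {e.lstrip('.').lower() for e in extensions}
    return ext in wanted


print(has_extension('./dir/subdir/photo.JPG', 'jpg', 'jpeg'))
# True

print(has_extension('./dir/subdir/filename.ext', '.jpg'))
# False
```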

Examples of cases like .tar.gz

As shown in the example above, os.path.splitext() splits at the last (rightmost) dot. Be careful with extensions like .tar.gz.

filepath_tar_gz = './dir/subdir/filename.tar.gz'
print(os.path.splitext(filepath_tar_gz))
# ('./dir/subdir/filename.tar', '.gz')
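
For comparison, pathlib exposes every dot-separated suffix through the suffixes attribute. This is only a sketch of an alternative to os.path; PurePosixPath is used here so that the output does not depend on the OS.

```python
from pathlib import PurePosixPath

p = PurePosixPath('./dir/subdir/filename.tar.gz')

# suffix is the last extension only, like os.path.splitext().
print(p.suffix)
# .gz

# suffixes lists every extension in order.
print(p.suffixes)
# ['.tar', '.gz']

full_ext = ''.join(p.suffixes)
print(full_ext)
# .tar.gz
```

Every dot in the file name starts a new suffix, so a name that contains a version number with dots will produce more entries than you may expect.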

To split at the first (leftmost) dot in the file name, you can use the split() method of the string, but this does not work if a directory name (or a leading ./) also contains a dot.

print(filepath_tar_gz.split('.', 1))
# ['', '/dir/subdir/filename.tar.gz']

To handle this, split with os.path.split() first, apply the split() method to the file name, and join with os.path.join() described later.

The string returned by split() does not contain the delimiter, so add the dot yourself if you want an extension with a dot like the one os.path.splitext() returns.

dirname, basename = os.path.split(filepath_tar_gz)
basename_without_ext, ext = basename.split('.', 1)
path_without_ext = os.path.join(dirname, basename_without_ext)
print(path_without_ext)
# ./dir/subdir/filename

print(ext)
# tar.gz

ext_with_dot = '.' + ext
print(ext_with_dot)
# .tar.gz
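
The same steps can be wrapped in a function. The sketch below uses a hypothetical helper named split_first_dot() and swaps split('.', 1) for str.partition(), which keeps the dot and does not raise an error when the file name has no dot at all.

```python
import posixpath  # POSIX flavour of os.path


def split_first_dot(path):
    """Hypothetical helper: split at the first dot of the file name."""
    dirname, basename = posixpath.split(path)
    # partition() returns ('name', '', '') when there is no dot.
    stem, dot, rest = basename.partition('.')
    return posixpath.join(dirname, stem), dot + rest


print(split_first_dot('./dir/subdir/filename.tar.gz'))
# ('./dir/subdir/filename', '.tar.gz')

print(split_first_dot('./dir/subdir/README'))
# ('./dir/subdir/README', '')
```

This sketch does not special-case hidden files such as .bashrc, whose name starts with a dot.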

Create a path string by combining the file and directory names: os.path.join()

Use os.path.join() to join file and directory names to create a new path string.

path = os.path.join('dir', 'subdir', 'filename.ext')
print(path)
# dir/subdir/filename.ext
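
Two behaviours of os.path.join() are easy to overlook. The sketch below (using posixpath so that the output is the same on any OS) shows that an absolute component discards everything before it, and that an empty last component adds a trailing separator.

```python
import posixpath  # POSIX flavour of os.path

# A component that starts with a separator resets the path.
reset = posixpath.join('dir', '/subdir', 'filename.ext')
print(reset)
# /subdir/filename.ext

# An empty last component produces a trailing separator.
trailing = posixpath.join('dir', 'subdir', '')
print(trailing)
# dir/subdir/
```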

Create a path string for another file in the same directory

To create a path string for another file in the same folder as an existing file, combine os.path.dirname() and os.path.join().

filepath = './dir/subdir/filename.ext'
other_filepath = os.path.join(os.path.dirname(filepath), 'other_file.ext')
print(other_filepath)
# ./dir/subdir/other_file.ext

Use different OS formats

To manipulate path strings in the format of an OS other than the one Python is running on, import and use the following modules instead of os.path.

  • UNIX (including current Mac): posixpath
  • Windows: ntpath
  • Macintosh 9 and earlier: macpath (removed in Python 3.8)

Since each module has the same interface as os.path, you can change the os.path part of the sample code so far to their module names (such as ntpath).
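
As a quick illustration, the sketch below applies basename() from both modules to the same strings. The results are the same regardless of the OS running the code.

```python
import ntpath
import posixpath

posix_style = './dir/subdir/filename.ext'
windows_style = r'c:\dir\subdir\filename.ext'

print(posixpath.basename(posix_style))
# filename.ext

print(ntpath.basename(windows_style))
# filename.ext

# posixpath does not treat the backslash as a separator.
print(posixpath.basename(windows_style))
# c:\dir\subdir\filename.ext

# ntpath accepts the slash as an alternative separator.
print(ntpath.basename(posix_style))
# filename.ext
```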

Examples for Windows

The following examples show how to handle Windows path strings.

The sample code below was run on a Mac using the ntpath module mentioned above. When running on Windows, you can replace ntpath with os.path.

Backslash and raw string

The path separator in Windows is the backslash \.

To write a backslash in a string, you need to write two backslashes to escape. print() outputs one backslash.

import ntpath

print(ntpath.sep)
# \

print('\\')
# \

print(ntpath.sep == '\\')
# True
source: os_ntpath.py

The raw string (r'xxx') makes it easier to write a Windows path because you can write a backslash as it is. A raw string and a normal string are equal in value.

file_path = 'c:\\dir\\subdir\\filename.ext'
file_path_raw = r'c:\dir\subdir\filename.ext'

print(file_path == file_path_raw)
# True
source: os_ntpath.py
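
Forgetting to escape the backslash can silently change the string. The sketch below uses a made-up path, c:\temp\new.txt, chosen because \t and \n are escape sequences in a normal string.

```python
escaped = 'c:\\temp\\new.txt'   # every backslash escaped
raw = r'c:\temp\new.txt'        # raw string, backslashes kept as written
broken = 'c:\temp\new.txt'      # \t becomes a tab, \n becomes a newline

print(escaped == raw)
# True

print(broken == raw)
# False

print('\t' in broken, '\n' in broken)
# True True
```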

Examples of extracting file name, folder name, extension

The same functions work with Windows paths.

print(ntpath.basename(file_path))
# filename.ext

print(ntpath.dirname(file_path))
# c:\dir\subdir

print(ntpath.split(file_path))
# ('c:\\dir\\subdir', 'filename.ext')
source: os_ntpath.py

Extract and join a drive letter: os.path.splitdrive()

Use os.path.splitdrive() to get the drive letter. The sample code below uses ntpath.splitdrive().

os.path.splitdrive() splits the path into the drive specification, including the colon, and the rest.

print(ntpath.splitdrive(file_path))
# ('c:', '\\dir\\subdir\\filename.ext')
source: os_ntpath.py

If you want to get only the drive letter, select the first character.

drive_letter = ntpath.splitdrive(file_path)[0][0]

print(drive_letter)
# c
source: os_ntpath.py

Be careful when joining a drive letter.

If you pass the drive as is to os.path.join(), no separator is inserted after the colon, so the result is not the path you expect.

print(ntpath.join('c:', 'dir', 'subdir', 'filename.ext'))
# c:dir\subdir\filename.ext
source: os_ntpath.py

To get the expected path, either pass os.sep (ntpath.sep in the sample code) as an argument of os.path.join(), or append a separator to the drive letter.

print(ntpath.join('c:', ntpath.sep, 'dir', 'subdir', 'filename.ext'))
# c:\dir\subdir\filename.ext

print(ntpath.join('c:\\', 'dir', 'subdir', 'filename.ext'))
# c:\dir\subdir\filename.ext
source: os_ntpath.py
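
To avoid repeating this detail, the second variant can be wrapped in a small function. join_with_drive() below is a hypothetical helper written for this sketch, not a standard library function.

```python
import ntpath


def join_with_drive(drive_letter, *parts):
    """Hypothetical helper: build an absolute Windows path from a drive letter."""
    # Append the colon and a separator so that join() starts at the drive root.
    return ntpath.join(drive_letter + ':' + ntpath.sep, *parts)


win_path = join_with_drive('c', 'dir', 'subdir', 'filename.ext')
print(win_path)
# c:\dir\subdir\filename.ext

# splitdrive() reverses the operation: drive + rest gives the same string.
drive, rest = ntpath.splitdrive(win_path)
print(drive + rest == win_path)
# True
```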