pandas: Select columns by dtype with select_dtypes()

Modified: 2024-01-24 | Tags: Python, pandas

In pandas, each column of a DataFrame has a specific data type (dtype). To select columns based on their data types, use the select_dtypes() method. For example, you can extract only numerical columns.

pandas.DataFrame.select_dtypes — pandas 2.1.4 documentation

Contents

Basic usage of select_dtypes()
- Specify data types to include: include
- Specify data types to exclude: exclude
How to specify data types in select_dtypes()
Caution when specifying string columns in select_dtypes()

For more details on data types (dtype) in pandas, see the following article.

pandas: How to use astype() to cast dtype of DataFrame

To extract columns based on conditions for column names rather than data types, see the following article.

pandas: Filter rows/columns by labels with filter()

The pandas and NumPy versions used in this article are as follows. Note that functionality may vary between versions. The following DataFrame is used as an example.

import pandas as pd
import numpy as np

print(pd.__version__)
# 2.1.4

print(np.__version__)
# 1.26.2

df = pd.DataFrame({'a': [1, 2, 3],
                   'b': np.array([10, 20, 30], dtype=np.int32),
                   'c': [0.1, 0.2, 0.3],
                   'd': ['X', 'Y', 'Z'],
                   'e': [[0, 0], [1, 1], [2, 2]],
                   'f': [True, True, False],
                   'g': pd.to_datetime(['2023-12-01', '2023-12-02', '2023-12-03'])})
print(df)
#    a   b    c  d       e      f          g
# 0  1  10  0.1  X  [0, 0]   True 2023-12-01
# 1  2  20  0.2  Y  [1, 1]   True 2023-12-02
# 2  3  30  0.3  Z  [2, 2]  False 2023-12-03

print(df.dtypes)
# a             int64
# b             int32
# c           float64
# d            object
# e            object
# f              bool
# g    datetime64[ns]
# dtype: object

source: pandas_select_dtypes.py

Basic usage of `select_dtypes()`

Specify data types to include: `include`

Use the include argument to specify the data types to include. You can specify data types with type objects or strings; details are explained later. As the output below shows, the built-in int matches both the int64 and the int32 columns.

print(df.select_dtypes(include=int))
#    a   b
# 0  1  10
# 1  2  20
# 2  3  30

source: pandas_select_dtypes.py

You can specify multiple data types in a list.

print(df.select_dtypes(include=['int32', bool]))
#     b      f
# 0  10   True
# 1  20   True
# 2  30  False

source: pandas_select_dtypes.py

If no column has the specified data type, the result is an empty DataFrame: it has no columns, but the original index is kept.

pandas: Check if DataFrame/Series is empty

print(df.select_dtypes(include='float32'))
# Empty DataFrame
# Columns: []
# Index: [0, 1, 2]

source: pandas_select_dtypes.py
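As a small sketch of how the empty result could be handled in practice, the following checks the empty attribute before using the selection. The two-column DataFrame here is a reduced stand-in for the example above, chosen only for illustration.

```python
import pandas as pd

# Reduced stand-in for the example DataFrame (an assumption for illustration).
df = pd.DataFrame({'a': [1, 2, 3], 'c': [0.1, 0.2, 0.3]})

# No float32 column exists, so no columns are selected; the index is kept.
selected = df.select_dtypes(include='float32')

print(selected.empty)
# True

print(selected.shape)
# (3, 0)

if selected.empty:
    print('No float32 columns found')
```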

You can use 'number' to extract only numeric columns.

print(df.select_dtypes(include='number'))
#    a   b    c
# 0  1  10  0.1
# 1  2  20  0.2
# 2  3  30  0.3

source: pandas_select_dtypes.py
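If only the names of the numeric columns are needed, one possible approach is to take the columns attribute of the result. The sketch below uses a reduced version of the example DataFrame; note that the bool column is not treated as numeric by 'number'.

```python
import numpy as np
import pandas as pd

# Reduced version of the example DataFrame (an assumption for illustration).
df = pd.DataFrame({'a': [1, 2, 3],
                   'b': np.array([10, 20, 30], dtype=np.int32),
                   'c': [0.1, 0.2, 0.3],
                   'f': [True, True, False]})

# Column labels of the numeric columns as a plain list.
numeric_cols = df.select_dtypes(include='number').columns.tolist()
print(numeric_cols)
# ['a', 'b', 'c']
```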

Specify data types to exclude: `exclude`

Use the exclude argument to specify the data types to exclude. Multiple data types can be specified in a list.

print(df.select_dtypes(exclude=int))
#      c  d       e      f          g
# 0  0.1  X  [0, 0]   True 2023-12-01
# 1  0.2  Y  [1, 1]   True 2023-12-02
# 2  0.3  Z  [2, 2]  False 2023-12-03

print(df.select_dtypes(exclude=['int32', bool]))
#    a    c  d       e          g
# 0  1  0.1  X  [0, 0] 2023-12-01
# 1  2  0.2  Y  [1, 1] 2023-12-02
# 2  3  0.3  Z  [2, 2] 2023-12-03

source: pandas_select_dtypes.py

include and exclude can be specified at the same time, but specifying the same type in both raises a ValueError.

print(df.select_dtypes(include='number', exclude='int32'))
#    a    c
# 0  1  0.1
# 1  2  0.2
# 2  3  0.3

# print(df.select_dtypes(include=['int32', bool], exclude='int32'))
# ValueError: include and exclude overlap on frozenset({<class 'numpy.int32'>})

source: pandas_select_dtypes.py
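When the selectors come from user input or configuration, the overlap error can be caught explicitly. The following is a minimal sketch; safe_select() is a hypothetical helper written for this example, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3],
                   'c': [0.1, 0.2, 0.3],
                   'f': [True, True, False]})


def safe_select(frame, include, exclude):
    """Hypothetical helper: return None instead of raising a ValueError."""
    try:
        return frame.select_dtypes(include=include, exclude=exclude)
    except ValueError as e:
        # Raised, for example, when include and exclude share a type.
        print(f'ValueError: {e}')
        return None


# 'number' and 'float64' do not overlap as selectors, so this works.
ok = safe_select(df, include='number', exclude='float64')
print(ok.columns.tolist())
# ['a']

# float64 appears in both include and exclude, so None is returned.
bad = safe_select(df, include=['float64', bool], exclude='float64')
print(bad)
# None
```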

How to specify data types in `select_dtypes()`

In select_dtypes(), data types can be specified with type objects such as int or np.int64, or with type name/type code strings like 'int64' or 'i8'.

pandas: How to use astype() to cast dtype of DataFrame

print(df.select_dtypes(include=['i8', 'int32', np.float64]))
#    a   b    c
# 0  1  10  0.1
# 1  2  20  0.2
# 2  3  30  0.3

source: pandas_select_dtypes.py
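To illustrate that these notations are interchangeable, the sketch below passes a type object, a type name string, and a type code string for the same dtype and collects the selected column names. The DataFrame is a reduced version of the example above, with the int64 dtype set explicitly.

```python
import numpy as np
import pandas as pd

# Reduced version of the example DataFrame (an assumption for illustration).
df = pd.DataFrame({'a': np.array([1, 2, 3], dtype=np.int64),
                   'b': np.array([10, 20, 30], dtype=np.int32),
                   'c': [0.1, 0.2, 0.3]})

# Type object, type name string, and type code string for int64.
results = {}
for spec in (np.int64, 'int64', 'i8'):
    results[repr(spec)] = df.select_dtypes(include=spec).columns.tolist()

for spec, cols in results.items():
    print(spec, cols)
```

Each of the three specifications should select only column a.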

In addition, data types can be specified as follows. 'number' is useful for specifying all numeric types at once.

To select all numeric types, use np.number or 'number'

To select datetimes, use np.datetime64, 'datetime' or 'datetime64'

To select timedeltas, use np.timedelta64, 'timedelta' or 'timedelta64'

To select Pandas categorical dtypes, use 'category'

To select Pandas datetimetz dtypes, use 'datetimetz' or 'datetime64[ns, tz]'

Source: pandas.DataFrame.select_dtypes (pandas 2.1.4 documentation)

print(df.select_dtypes(include=['number', 'datetime']))
#    a   b    c          g
# 0  1  10  0.1 2023-12-01
# 1  2  20  0.2 2023-12-02
# 2  3  30  0.3 2023-12-03

source: pandas_select_dtypes.py
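The example DataFrame has no categorical, timedelta, or timezone-aware columns, so here is a small sketch with a separate DataFrame. The name df2 and its column names are made up for this illustration.

```python
import pandas as pd

dates = ['2023-12-01', '2023-12-02', '2023-12-03']

# df2 and its column names are hypothetical, used only for this illustration.
df2 = pd.DataFrame({
    'cat': pd.Series(['x', 'y', 'x'], dtype='category'),
    'td': pd.to_timedelta(['1 days', '2 days', '3 days']),
    'naive': pd.to_datetime(dates),
    'aware': pd.to_datetime(dates).tz_localize('UTC'),
})

# Collect the column names matched by each special selector.
selected = {
    key: df2.select_dtypes(include=key).columns.tolist()
    for key in ('category', 'timedelta', 'datetime', 'datetimetz')
}

for key, cols in selected.items():
    print(key, cols)
# category ['cat']
# timedelta ['td']
# datetime ['naive']
# datetimetz ['aware']
```

In this sketch, 'datetime' picks up only the timezone-naive column, while the timezone-aware column is selected with 'datetimetz'.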

Caution when specifying string columns in `select_dtypes()`

In the pandas version used here, columns containing strings (str) have the object dtype, so specifying str or 'str' in select_dtypes() raises a TypeError.

# print(df.select_dtypes(include=str))
# TypeError: string dtypes are not allowed, use 'object' instead

source: pandas_select_dtypes.py

The object dtype is not limited to str. Columns whose elements are other Python built-in types, such as list or dict, are also categorized as object. Note that if object is specified in select_dtypes(), these columns will also be selected.

print(df.select_dtypes(include=object))
#    d       e
# 0  X  [0, 0]
# 1  Y  [1, 1]
# 2  Z  [2, 2]

print(type(df.at[0, 'd']))
# <class 'str'>

print(type(df.at[0, 'e']))
# <class 'list'>

source: pandas_select_dtypes.py

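To extract only the columns whose elements are str, apply the built-in type() function to each element of one row (here the first row, df.iloc[0]) and compare the result with str. The resulting boolean Series can then be used in loc[] for column selection. Note that this inspects a single row only, so it assumes that all elements in a column have the same type.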

print(df.iloc[0].map(type) == str)
# a    False
# b    False
# c    False
# d     True
# e    False
# f    False
# g    False
# Name: 0, dtype: bool

print(df.loc[:, df.iloc[0].map(type) == str])
#    d
# 0  X
# 1  Y
# 2  Z

source: pandas_select_dtypes.py
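If a column may hold mixed types, checking a single row is not enough. One possible variation, shown here only as a sketch, checks every element of each column. The column h is a made-up mixed-type column added for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3],
                   'd': ['X', 'Y', 'Z'],
                   'e': [[0, 0], [1, 1], [2, 2]],
                   'h': ['X', 1, None]})  # hypothetical mixed-type column

# Keep a column only if every one of its elements is a str.
str_cols = [col for col in df.columns
            if (df[col].map(type) == str).all()]
print(str_cols)
# ['d']

print(df[str_cols])
#    d
# 0  X
# 1  Y
# 2  Z
```

This scans all elements, so it is slower than the single-row check on large DataFrames.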
