pandas: Delete rows/columns from DataFrame with drop()
The drop() method deletes rows and columns from a pandas.DataFrame.
For removing missing values (NaN) and duplicate rows, see the following articles.
- pandas: Remove NaN (missing values) with dropna()
- pandas: Find and remove duplicate rows of DataFrame, Series
The sample code in this article is based on pandas version 2.0.3. The following pandas.DataFrame is used as an example.
import pandas as pd
print(pd.__version__)
# 2.0.3
df = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0)
print(df)
# age state point
# name
# Alice 24 NY 64
# Bob 42 CA 92
# Charlie 18 CA 70
# Dave 68 TX 70
# Ellen 24 CA 88
# Frank 30 NY 57
Delete rows from pandas.DataFrame
Specify by row name (label)
To delete a row with the drop() method, pass the row name as the first argument, labels, and set the axis argument to 0. Since axis defaults to 0, it can be omitted.
print(df.drop('Charlie', axis=0))
# age state point
# name
# Alice 24 NY 64
# Bob 42 CA 92
# Dave 68 TX 70
# Ellen 24 CA 88
# Frank 30 NY 57
print(df.drop('Charlie'))
# age state point
# name
# Alice 24 NY 64
# Bob 42 CA 92
# Dave 68 TX 70
# Ellen 24 CA 88
# Frank 30 NY 57
Starting from version 0.21.0, the index argument is also available.
print(df.drop(index='Charlie'))
# age state point
# name
# Alice 24 NY 64
# Bob 42 CA 92
# Dave 68 TX 70
# Ellen 24 CA 88
# Frank 30 NY 57
Use a list to delete multiple rows at once.
print(df.drop(['Bob', 'Dave', 'Frank']))
# age state point
# name
# Alice 24 NY 64
# Charlie 18 CA 70
# Ellen 24 CA 88
print(df.drop(index=['Bob', 'Dave', 'Frank']))
# age state point
# name
# Alice 24 NY 64
# Charlie 18 CA 70
# Ellen 24 CA 88
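As a side note to label-based deletion, the sketch below shows what happens when a label does not exist. The sample data is rebuilt inline on the assumption that it matches the CSV used in this article, and 'Zed' is a hypothetical name that is not in the index. By default drop() raises KeyError for a missing label; passing errors='ignore' drops only the labels that exist.

```python
import pandas as pd

# Rebuild the sample data inline (assumed to match the CSV used in this article).
df = pd.DataFrame(
    {
        'age': [24, 42, 18, 68, 24, 30],
        'state': ['NY', 'CA', 'CA', 'TX', 'CA', 'NY'],
        'point': [64, 92, 70, 70, 88, 57],
    },
    index=pd.Index(['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'], name='name'),
)

# 'Zed' is a hypothetical label that does not exist in the index.
try:
    df.drop('Zed')
    raised = False
except KeyError:
    raised = True
print(raised)
# True

# With errors='ignore', missing labels are skipped and existing ones are dropped.
result = df.drop(['Bob', 'Zed'], errors='ignore')
print(result.index.tolist())
# ['Alice', 'Charlie', 'Dave', 'Ellen', 'Frank']
```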
By default, the original DataFrame remains unchanged, and a new DataFrame is returned.
Setting the inplace argument to True modifies the original DataFrame directly. In that case, drop() returns None instead of a new DataFrame.
df_copy = df.copy()
df_copy.drop(index=['Bob', 'Dave', 'Frank'], inplace=True)
print(df_copy)
# age state point
# name
# Alice 24 NY 64
# Charlie 18 CA 70
# Ellen 24 CA 88
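The difference between the two modes can be checked with a minimal sketch like the following. The sample data is rebuilt inline on the assumption that it matches the CSV used in this article; df_new and ret are names introduced only for illustration.

```python
import pandas as pd

# Rebuild the sample data inline (assumed to match the CSV used in this article).
df = pd.DataFrame(
    {
        'age': [24, 42, 18, 68, 24, 30],
        'state': ['NY', 'CA', 'CA', 'TX', 'CA', 'NY'],
        'point': [64, 92, 70, 70, 88, 57],
    },
    index=pd.Index(['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'], name='name'),
)

# Default (inplace=False): a new DataFrame is returned, the original is unchanged.
df_new = df.drop(index='Bob')
print(len(df), len(df_new))
# 6 5

# inplace=True: the object itself is modified and the return value is None.
df_copy = df.copy()
ret = df_copy.drop(index='Bob', inplace=True)
print(ret)
# None
print(len(df_copy))
# 5
```

Because the return value is None with inplace=True, assigning the result back (df_copy = df_copy.drop(..., inplace=True)) would replace the DataFrame with None.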
Specify by row number
To specify by row number, use the index attribute of DataFrame.
Indexing the index attribute with [] returns the row name at that position. To get multiple row names, pass a list of positions.
print(df.index[[1, 3, 5]])
# Index(['Bob', 'Dave', 'Frank'], dtype='object', name='name')
You can use this for the first argument labels or the index argument of the drop() method.
print(df.drop(df.index[[1, 3, 5]]))
# age state point
# name
# Alice 24 NY 64
# Charlie 18 CA 70
# Ellen 24 CA 88
print(df.drop(index=df.index[[1, 3, 5]]))
# age state point
# name
# Alice 24 NY 64
# Charlie 18 CA 70
# Ellen 24 CA 88
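The same approach should also work with a single position, a slice, or a negative position, since df.index supports ordinary positional indexing. The following is a small sketch under the assumption that the inline data matches the CSV used in this article; the variable names are illustrative only.

```python
import pandas as pd

# Rebuild the sample data inline (assumed to match the CSV used in this article).
df = pd.DataFrame(
    {
        'age': [24, 42, 18, 68, 24, 30],
        'state': ['NY', 'CA', 'CA', 'TX', 'CA', 'NY'],
        'point': [64, 92, 70, 70, 88, 57],
    },
    index=pd.Index(['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'], name='name'),
)

# A single position returns a single label.
print(df.index[1])
# Bob
dropped_one = df.drop(df.index[1])
print(dropped_one.index.tolist())
# ['Alice', 'Charlie', 'Dave', 'Ellen', 'Frank']

# A slice of positions 1 to 3 (the stop position 4 is excluded).
dropped_slice = df.drop(df.index[1:4])
print(dropped_slice.index.tolist())
# ['Alice', 'Ellen', 'Frank']

# A negative position counts from the end.
dropped_last = df.drop(df.index[-1])
print(dropped_last.index.tolist())
# ['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen']
```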
Notes on when the index is not set
If the index is not set, it defaults to a sequence of integers. Exercise caution when the index is numeric rather than a string.
df_noindex = pd.read_csv('data/src/sample_pandas_normal.csv')
print(df_noindex)
# name age state point
# 0 Alice 24 NY 64
# 1 Bob 42 CA 92
# 2 Charlie 18 CA 70
# 3 Dave 68 TX 70
# 4 Ellen 24 CA 88
# 5 Frank 30 NY 57
print(df_noindex.index)
# RangeIndex(start=0, stop=6, step=1)
When the index is still the default sequence (0, 1, 2, ...), each label equals its position, so specifying the number directly produces the same result as using the index attribute.
print(df_noindex.drop([1, 3, 5]))
# name age state point
# 0 Alice 24 NY 64
# 2 Charlie 18 CA 70
# 4 Ellen 24 CA 88
print(df_noindex.drop(df_noindex.index[[1, 3, 5]]))
# name age state point
# 0 Alice 24 NY 64
# 2 Charlie 18 CA 70
# 4 Ellen 24 CA 88
The result differs if the sequence is disrupted by actions like sorting. When specifying a number directly, the row with that number as its label is deleted. When using the index attribute, the row with that number as its position is deleted.
df_noindex_sort = df_noindex.sort_values('state')
print(df_noindex_sort)
# name age state point
# 1 Bob 42 CA 92
# 2 Charlie 18 CA 70
# 4 Ellen 24 CA 88
# 0 Alice 24 NY 64
# 5 Frank 30 NY 57
# 3 Dave 68 TX 70
print(df_noindex_sort.index)
# Index([1, 2, 4, 0, 5, 3], dtype='int64')
print(df_noindex_sort.drop([1, 3, 5]))
# name age state point
# 2 Charlie 18 CA 70
# 4 Ellen 24 CA 88
# 0 Alice 24 NY 64
print(df_noindex_sort.drop(df_noindex_sort.index[[1, 3, 5]]))
# name age state point
# 1 Bob 42 CA 92
# 4 Ellen 24 CA 88
# 5 Frank 30 NY 57
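The label-versus-position difference can be reproduced without the CSV file. The sketch below uses a hypothetical one-column frame, df_sorted, whose integer labels are in the same order as the sorted example above. It also shows one possible way to avoid the ambiguity: reset_index(drop=True) renumbers the labels from 0 so that labels and positions agree again.

```python
import pandas as pd

# Hypothetical frame whose integer labels are out of order, as after sorting.
df_sorted = pd.DataFrame(
    {'name': ['Bob', 'Charlie', 'Ellen', 'Alice', 'Frank', 'Dave']},
    index=[1, 2, 4, 0, 5, 3],
)

# Numbers passed directly are treated as labels: rows labeled 1, 3, 5 are removed.
by_label = df_sorted.drop([1, 3, 5])
print(by_label['name'].tolist())
# ['Charlie', 'Ellen', 'Alice']

# Going through the index attribute treats the numbers as positions.
by_position = df_sorted.drop(df_sorted.index[[1, 3, 5]])
print(by_position['name'].tolist())
# ['Bob', 'Ellen', 'Frank']

# After renumbering the index, labels and positions coincide again.
df_reset = df_sorted.reset_index(drop=True)
by_label_reset = df_reset.drop([1, 3, 5])
print(by_label_reset['name'].tolist())
# ['Bob', 'Ellen', 'Frank']
```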
Sorting is covered in detail in a separate article.
Delete columns from pandas.DataFrame
Specify by column name (label)
When using the drop() method to delete a column, specify the column name for the first argument labels and set the axis argument to 1.
print(df.drop('state', axis=1))
# age point
# name
# Alice 24 64
# Bob 42 92
# Charlie 18 70
# Dave 68 70
# Ellen 24 88
# Frank 30 57
Starting from version 0.21.0, the columns argument is also available.
print(df.drop(columns='state'))
# age point
# name
# Alice 24 64
# Bob 42 92
# Charlie 18 70
# Dave 68 70
# Ellen 24 88
# Frank 30 57
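The ways of pointing drop() at the column axis can be compared directly. In the sketch below, the sample data is rebuilt inline on the assumption that it matches the CSV used in this article. Besides axis=1 and the columns argument, the axis argument also accepts the string 'columns'.

```python
import pandas as pd

# Rebuild the sample data inline (assumed to match the CSV used in this article).
df = pd.DataFrame(
    {
        'age': [24, 42, 18, 68, 24, 30],
        'state': ['NY', 'CA', 'CA', 'TX', 'CA', 'NY'],
        'point': [64, 92, 70, 70, 88, 57],
    },
    index=pd.Index(['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'], name='name'),
)

a = df.drop('state', axis=1)          # labels + numeric axis
b = df.drop('state', axis='columns')  # labels + axis given as a string
c = df.drop(columns='state')          # columns keyword

print(a.equals(b) and b.equals(c))
# True
print(a.columns.tolist())
# ['age', 'point']
```

The columns keyword tends to be the most readable of the three, because the call itself states which axis is affected.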
Use a list to delete multiple columns at once.
print(df.drop(['state', 'point'], axis=1))
# age
# name
# Alice 24
# Bob 42
# Charlie 18
# Dave 68
# Ellen 24
# Frank 30
print(df.drop(columns=['state', 'point']))
# age
# name
# Alice 24
# Bob 42
# Charlie 18
# Dave 68
# Ellen 24
# Frank 30
The inplace argument can be used in the same way as for rows.
df_copy = df.copy()
df_copy.drop(columns=['state', 'point'], inplace=True)
print(df_copy)
# age
# name
# Alice 24
# Bob 42
# Charlie 18
# Dave 68
# Ellen 24
# Frank 30
Specify by column number
To specify by column number, use the columns attribute of DataFrame.
print(df.columns[[1, 2]])
# Index(['state', 'point'], dtype='object')
print(df.drop(df.columns[[1, 2]], axis=1))
# age
# name
# Alice 24
# Bob 42
# Charlie 18
# Dave 68
# Ellen 24
# Frank 30
print(df.drop(columns=df.columns[[1, 2]]))
# age
# name
# Alice 24
# Bob 42
# Charlie 18
# Dave 68
# Ellen 24
# Frank 30
If the column names are integers, exercise the same caution as described for rows: a number passed directly is treated as a label, not as a position.
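A minimal sketch of this caution for columns, using a hypothetical frame df_int whose column labels are the integers 2, 0, 1 in that order:

```python
import pandas as pd

# Hypothetical frame with integer column labels that do not match their positions.
df_int = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=[2, 0, 1])

# Passing 0 directly removes the column labeled 0.
by_label = df_int.drop(columns=[0])
print(by_label.columns.tolist())
# [2, 1]

# Going through the columns attribute removes the column at position 0 (label 2).
by_position = df_int.drop(columns=df_int.columns[[0]])
print(by_position.columns.tolist())
# [0, 1]
```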
Delete multiple rows and columns simultaneously
Starting from version 0.21.0, you can delete rows and columns in a single call by specifying both the index and columns arguments.
As before, you can also specify rows and columns by number, or use the inplace argument.
print(df.drop(index=['Bob', 'Dave', 'Frank'], columns=['state', 'point']))
# age
# name
# Alice 24
# Charlie 18
# Ellen 24
print(df.drop(index=df.index[[1, 3, 5]], columns=df.columns[[1, 2]]))
# age
# name
# Alice 24
# Charlie 18
# Ellen 24
df_copy = df.copy()
df_copy.drop(index=['Bob', 'Dave', 'Frank'], columns=['state', 'point'],
             inplace=True)
print(df_copy)
# age
# name
# Alice 24
# Charlie 18
# Ellen 24
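As a quick check, the combined call should give the same result as dropping rows and then columns in two chained calls. The sketch below rebuilds the sample data inline on the assumption that it matches the CSV used in this article; combined and chained are illustrative names.

```python
import pandas as pd

# Rebuild the sample data inline (assumed to match the CSV used in this article).
df = pd.DataFrame(
    {
        'age': [24, 42, 18, 68, 24, 30],
        'state': ['NY', 'CA', 'CA', 'TX', 'CA', 'NY'],
        'point': [64, 92, 70, 70, 88, 57],
    },
    index=pd.Index(['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'], name='name'),
)

# One call with both keywords.
combined = df.drop(index=['Bob', 'Dave', 'Frank'], columns=['state', 'point'])

# Two chained calls, rows first and then columns.
chained = df.drop(index=['Bob', 'Dave', 'Frank']).drop(columns=['state', 'point'])

print(combined.equals(chained))
# True
print(combined.shape)
# (3, 1)
```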