pandas: Delete rows/columns from DataFrame with drop()
The drop() method allows you to delete rows and columns from pandas.DataFrame.
See the following articles about removing missing values (NaN) and rows with duplicate elements.
- pandas: Remove NaN (missing values) with dropna()
- pandas: Find and remove duplicate rows of DataFrame, Series
The sample code in this article is based on pandas version 2.0.3. The following pandas.DataFrame is used as an example.
import pandas as pd
print(pd.__version__)
# 2.0.3
df = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0)
print(df)
# age state point
# name
# Alice 24 NY 64
# Bob 42 CA 92
# Charlie 18 CA 70
# Dave 68 TX 70
# Ellen 24 CA 88
# Frank 30 NY 57
Delete rows from pandas.DataFrame
Specify by row name (label)
When using the drop() method to delete a row, specify the row name for the first argument labels and set the axis argument to 0. The default for axis is 0, so it can be omitted.
print(df.drop('Charlie', axis=0))
# age state point
# name
# Alice 24 NY 64
# Bob 42 CA 92
# Dave 68 TX 70
# Ellen 24 CA 88
# Frank 30 NY 57
print(df.drop('Charlie'))
# age state point
# name
# Alice 24 NY 64
# Bob 42 CA 92
# Dave 68 TX 70
# Ellen 24 CA 88
# Frank 30 NY 57
Starting from version 0.21.0, the index argument is also available.
print(df.drop(index='Charlie'))
# age state point
# name
# Alice 24 NY 64
# Bob 42 CA 92
# Dave 68 TX 70
# Ellen 24 CA 88
# Frank 30 NY 57
Use a list to delete multiple rows at once.
print(df.drop(['Bob', 'Dave', 'Frank']))
# age state point
# name
# Alice 24 NY 64
# Charlie 18 CA 70
# Ellen 24 CA 88
print(df.drop(index=['Bob', 'Dave', 'Frank']))
# age state point
# name
# Alice 24 NY 64
# Charlie 18 CA 70
# Ellen 24 CA 88
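One detail worth knowing when passing a list of labels: if any label does not exist, drop() raises a KeyError. The sketch below illustrates this and the errors='ignore' option of drop(), which skips missing labels. The small DataFrame is built inline as a stand-in for the article's CSV file, and the label 'Zed' is a made-up name used only to trigger the error.

```python
import pandas as pd

# Inline stand-in for the article's sample data (assumption: the CSV is not read here).
df = pd.DataFrame(
    {'age': [24, 42, 18], 'state': ['NY', 'CA', 'CA'], 'point': [64, 92, 70]},
    index=pd.Index(['Alice', 'Bob', 'Charlie'], name='name'),
)

# 'Zed' is a hypothetical label that is not in the index, so drop() raises KeyError.
raised = False
try:
    df.drop(index=['Bob', 'Zed'])
except KeyError as e:
    raised = True
    print('KeyError:', e)

# With errors='ignore', existing labels are dropped and missing ones are skipped.
dropped = df.drop(index=['Bob', 'Zed'], errors='ignore')
print(dropped.index.tolist())
# ['Alice', 'Charlie']
```

Note that errors='ignore' can hide typos in label names, so it is probably best reserved for cases where missing labels are expected.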
By default, the original DataFrame remains unchanged, and a new DataFrame is returned.
Setting the inplace argument to True modifies the original DataFrame directly. In that case, drop() returns None instead of a new DataFrame.
df_copy = df.copy()
df_copy.drop(index=['Bob', 'Dave', 'Frank'], inplace=True)
print(df_copy)
# age state point
# name
# Alice 24 NY 64
# Charlie 18 CA 70
# Ellen 24 CA 88
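Because drop() returns None when inplace=True, assigning its result back to a variable is a common mistake. The following minimal sketch contrasts the two modes. The DataFrame is built inline as a stand-in for the article's sample data.

```python
import pandas as pd

# Inline stand-in for the article's sample data (assumption: the CSV is not read here).
df = pd.DataFrame(
    {'age': [24, 42, 18], 'state': ['NY', 'CA', 'CA'], 'point': [64, 92, 70]},
    index=pd.Index(['Alice', 'Bob', 'Charlie'], name='name'),
)

# Without inplace, drop() returns a new DataFrame and leaves df untouched.
df_new = df.drop(index='Bob')
print(df.shape)
# (3, 3)
print(df_new.shape)
# (2, 3)

# With inplace=True, the object itself is modified and the return value is None.
df_copy = df.copy()
result = df_copy.drop(index='Bob', inplace=True)
print(result)
# None
print(df_copy.index.tolist())
# ['Alice', 'Charlie']
```

Writing df_copy = df_copy.drop(..., inplace=True) would therefore replace the DataFrame with None.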
Specify by row number
To specify by row number, use the index attribute of DataFrame.
Use the index attribute with [] to get the row name based on its number. To specify multiple rows, use a list.
print(df.index[[1, 3, 5]])
# Index(['Bob', 'Dave', 'Frank'], dtype='object', name='name')
You can use this for the first argument labels or the index argument of the drop() method.
print(df.drop(df.index[[1, 3, 5]]))
# age state point
# name
# Alice 24 NY 64
# Charlie 18 CA 70
# Ellen 24 CA 88
print(df.drop(index=df.index[[1, 3, 5]]))
# age state point
# name
# Alice 24 NY 64
# Charlie 18 CA 70
# Ellen 24 CA 88
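The same approach works with slices and negative positions, since indexing the index attribute follows the usual sequence rules. This is a small sketch using an inline DataFrame as a stand-in for the article's sample data.

```python
import pandas as pd

# Inline stand-in for the article's sample data (assumption: the CSV is not read here).
df = pd.DataFrame(
    {'age': [24, 42, 18, 68, 24, 30]},
    index=pd.Index(['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'], name='name'),
)

# A slice selects a range of positions: drop the first two rows.
first_two_removed = df.drop(df.index[:2])
print(first_two_removed.index.tolist())
# ['Charlie', 'Dave', 'Ellen', 'Frank']

# A negative position counts from the end: drop the last row.
last_removed = df.drop(df.index[-1])
print(last_removed.index.tolist())
# ['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen']
```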
Notes on when the index is not set
If the index is not set, it defaults to a sequence of integers. Exercise caution when the index is numeric rather than a string.
df_noindex = pd.read_csv('data/src/sample_pandas_normal.csv')
print(df_noindex)
# name age state point
# 0 Alice 24 NY 64
# 1 Bob 42 CA 92
# 2 Charlie 18 CA 70
# 3 Dave 68 TX 70
# 4 Ellen 24 CA 88
# 5 Frank 30 NY 57
print(df_noindex.index)
# RangeIndex(start=0, stop=6, step=1)
When the indices are sequential, directly specifying the number produces the same result as using the index attribute.
print(df_noindex.drop([1, 3, 5]))
# name age state point
# 0 Alice 24 NY 64
# 2 Charlie 18 CA 70
# 4 Ellen 24 CA 88
print(df_noindex.drop(df_noindex.index[[1, 3, 5]]))
# name age state point
# 0 Alice 24 NY 64
# 2 Charlie 18 CA 70
# 4 Ellen 24 CA 88
The result differs if the sequence is disrupted by actions like sorting. When specifying a number directly, the row with that number as its label is deleted. When using the index attribute, the row with that number as its position is deleted.
df_noindex_sort = df_noindex.sort_values('state')
print(df_noindex_sort)
# name age state point
# 1 Bob 42 CA 92
# 2 Charlie 18 CA 70
# 4 Ellen 24 CA 88
# 0 Alice 24 NY 64
# 5 Frank 30 NY 57
# 3 Dave 68 TX 70
print(df_noindex_sort.index)
# Index([1, 2, 4, 0, 5, 3], dtype='int64')
print(df_noindex_sort.drop([1, 3, 5]))
# name age state point
# 2 Charlie 18 CA 70
# 4 Ellen 24 CA 88
# 0 Alice 24 NY 64
print(df_noindex_sort.drop(df_noindex_sort.index[[1, 3, 5]]))
# name age state point
# 1 Bob 42 CA 92
# 4 Ellen 24 CA 88
# 5 Frank 30 NY 57
For details on sorting, refer to the pandas documentation for sort_values().
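One way to avoid the mismatch between label and position is to renumber the index after sorting with reset_index(drop=True), so that the labels match the positions again. The sketch below assumes an inline DataFrame in place of the article's CSV and uses kind='stable' so that the order of rows with equal values is deterministic.

```python
import pandas as pd

# Inline stand-in for the article's sample data (assumption: the CSV is not read here).
df_noindex = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'],
    'state': ['NY', 'CA', 'CA', 'TX', 'CA', 'NY'],
})

# A stable sort keeps the original order among rows with the same state.
df_sorted = df_noindex.sort_values('state', kind='stable')
print(df_sorted.index.tolist())
# [1, 2, 4, 0, 5, 3]

# reset_index(drop=True) discards the old labels and renumbers from 0.
df_reset = df_sorted.reset_index(drop=True)
print(df_reset.index.tolist())
# [0, 1, 2, 3, 4, 5]

# Labels and positions now coincide, so both forms delete the same rows.
by_label = df_reset.drop([1, 3, 5])
by_position = df_reset.drop(df_reset.index[[1, 3, 5]])
print(by_label['name'].tolist())
# ['Bob', 'Ellen', 'Frank']
print(by_label.equals(by_position))
# True
```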
Delete columns from pandas.DataFrame
Specify by column name (label)
When using the drop() method to delete a column, specify the column name for the first argument labels and set the axis argument to 1.
print(df.drop('state', axis=1))
# age point
# name
# Alice 24 64
# Bob 42 92
# Charlie 18 70
# Dave 68 70
# Ellen 24 88
# Frank 30 57
Starting from version 0.21.0, the columns argument is also available.
print(df.drop(columns='state'))
# age point
# name
# Alice 24 64
# Bob 42 92
# Charlie 18 70
# Dave 68 70
# Ellen 24 88
# Frank 30 57
Use a list to delete multiple columns at once.
print(df.drop(['state', 'point'], axis=1))
# age
# name
# Alice 24
# Bob 42
# Charlie 18
# Dave 68
# Ellen 24
# Frank 30
print(df.drop(columns=['state', 'point']))
# age
# name
# Alice 24
# Bob 42
# Charlie 18
# Dave 68
# Ellen 24
# Frank 30
The inplace argument can be used in the same way as for rows.
df_copy = df.copy()
df_copy.drop(columns=['state', 'point'], inplace=True)
print(df_copy)
# age
# name
# Alice 24
# Bob 42
# Charlie 18
# Dave 68
# Ellen 24
# Frank 30
Specify by column number
To specify by column number, use the columns attribute of DataFrame.
print(df.columns[[1, 2]])
# Index(['state', 'point'], dtype='object')
print(df.drop(df.columns[[1, 2]], axis=1))
# age
# name
# Alice 24
# Bob 42
# Charlie 18
# Dave 68
# Ellen 24
# Frank 30
print(df.drop(columns=df.columns[[1, 2]]))
# age
# name
# Alice 24
# Bob 42
# Charlie 18
# Dave 68
# Ellen 24
# Frank 30
If the column names are integers, exercise the same caution as mentioned for rows: a number passed directly is treated as a label, not a position.
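To make the label-versus-position distinction concrete for columns, here is a minimal sketch with a hypothetical DataFrame whose column labels are the integers 2, 0, 1 in that order. The data and labels are invented for illustration only.

```python
import pandas as pd

# Hypothetical data: integer column labels that do not match their positions.
df_int = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=[2, 0, 1])

# Passing 0 directly removes the column whose label is 0 (the second column).
by_label = df_int.drop(columns=0)
print(by_label.columns.tolist())
# [2, 1]

# Going through the columns attribute removes the column at position 0 (label 2).
by_position = df_int.drop(columns=df_int.columns[0])
print(by_position.columns.tolist())
# [0, 1]
```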
Delete multiple rows and columns simultaneously
Starting from version 0.21.0, you can simultaneously delete multiple rows and columns using both the index and columns arguments.
Of course, it is also possible to specify by row number and column number or specify the inplace argument.
print(df.drop(index=['Bob', 'Dave', 'Frank'], columns=['state', 'point']))
# age
# name
# Alice 24
# Charlie 18
# Ellen 24
print(df.drop(index=df.index[[1, 3, 5]], columns=df.columns[[1, 2]]))
# age
# name
# Alice 24
# Charlie 18
# Ellen 24
df_copy = df.copy()
df_copy.drop(index=['Bob', 'Dave', 'Frank'], columns=['state', 'point'],
             inplace=True)
print(df_copy)
# age
# name
# Alice 24
# Charlie 18
# Ellen 24
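As a quick check, passing both index and columns in one call should give the same result as chaining two drop() calls, one per axis. The sketch below uses an inline DataFrame as a stand-in for the article's sample data.

```python
import pandas as pd

# Inline stand-in for the article's sample data (assumption: the CSV is not read here).
df = pd.DataFrame(
    {
        'age': [24, 42, 18, 68, 24, 30],
        'state': ['NY', 'CA', 'CA', 'TX', 'CA', 'NY'],
        'point': [64, 92, 70, 70, 88, 57],
    },
    index=pd.Index(['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'], name='name'),
)

rows = ['Bob', 'Dave', 'Frank']
cols = ['state', 'point']

# One call with both keyword arguments.
combined = df.drop(index=rows, columns=cols)

# Two chained calls, one for each axis.
chained = df.drop(rows, axis=0).drop(cols, axis=1)

print(combined.equals(chained))
# True
print(combined.index.tolist(), combined.columns.tolist())
# ['Alice', 'Charlie', 'Ellen'] ['age']
```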