pandas: Delete rows/columns from DataFrame with drop()


The drop() method allows you to delete rows and columns from pandas.DataFrame.

To remove missing values (NaN) or rows with duplicate elements, use dropna() and drop_duplicates() instead.

The sample code in this article is based on pandas version 2.0.3. The following pandas.DataFrame is used as an example.

import pandas as pd

print(pd.__version__)
# 2.0.3

df = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0)
print(df)
#          age state  point
# name                     
# Alice     24    NY     64
# Bob       42    CA     92
# Charlie   18    CA     70
# Dave      68    TX     70
# Ellen     24    CA     88
# Frank     30    NY     57

Delete rows from pandas.DataFrame

Specify by row name (label)

When using the drop() method to delete a row, specify the row name for the first argument labels and set the axis argument to 0. The default for axis is 0, so it can be omitted.

print(df.drop('Charlie', axis=0))
#        age state  point
# name                   
# Alice   24    NY     64
# Bob     42    CA     92
# Dave    68    TX     70
# Ellen   24    CA     88
# Frank   30    NY     57

print(df.drop('Charlie'))
#        age state  point
# name                   
# Alice   24    NY     64
# Bob     42    CA     92
# Dave    68    TX     70
# Ellen   24    CA     88
# Frank   30    NY     57

Starting from version 0.21.0, the index argument is also available.

print(df.drop(index='Charlie'))
#        age state  point
# name                   
# Alice   24    NY     64
# Bob     42    CA     92
# Dave    68    TX     70
# Ellen   24    CA     88
# Frank   30    NY     57

Use a list to delete multiple rows at once.

print(df.drop(['Bob', 'Dave', 'Frank']))
#          age state  point
# name                     
# Alice     24    NY     64
# Charlie   18    CA     70
# Ellen     24    CA     88

print(df.drop(index=['Bob', 'Dave', 'Frank']))
#          age state  point
# name                     
# Alice     24    NY     64
# Charlie   18    CA     70
# Ellen     24    CA     88
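
As a side note on lists of labels, the sketch below shows what happens when one of them does not exist. It rebuilds the sample DataFrame inline instead of reading the CSV, and 'Zed' is a hypothetical label chosen only because it is absent from the index. By default drop() raises a KeyError in this case, while errors='ignore' silently skips missing labels.

```python
import pandas as pd

# Inline reconstruction of the sample data used in this article.
df = pd.DataFrame(
    {
        'age': [24, 42, 18, 68, 24, 30],
        'state': ['NY', 'CA', 'CA', 'TX', 'CA', 'NY'],
        'point': [64, 92, 70, 70, 88, 57],
    },
    index=pd.Index(['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'], name='name'),
)

# 'Zed' is a hypothetical label that is not in the index.
try:
    df.drop(['Bob', 'Zed'])
except KeyError as e:
    print('KeyError:', e)

# With errors='ignore', only the labels that exist are dropped.
df_ignored = df.drop(['Bob', 'Zed'], errors='ignore')
print(df_ignored.index.tolist())
# ['Alice', 'Charlie', 'Dave', 'Ellen', 'Frank']
```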

By default, the original DataFrame remains unchanged, and a new DataFrame is returned.

Setting the inplace argument to True modifies the original DataFrame directly. In this case, drop() returns None instead of a new DataFrame.

df_copy = df.copy()
df_copy.drop(index=['Bob', 'Dave', 'Frank'], inplace=True)
print(df_copy)
#          age state  point
# name                     
# Alice     24    NY     64
# Charlie   18    CA     70
# Ellen     24    CA     88
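
The difference between the two modes can be checked directly. The following is a minimal sketch that rebuilds the sample data inline (an assumption made so that it runs without the CSV file) and inspects the return value of each call.

```python
import pandas as pd

# Inline reconstruction of the sample data used in this article.
df = pd.DataFrame(
    {
        'age': [24, 42, 18, 68, 24, 30],
        'state': ['NY', 'CA', 'CA', 'TX', 'CA', 'NY'],
        'point': [64, 92, 70, 70, 88, 57],
    },
    index=pd.Index(['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'], name='name'),
)

# Default: a new DataFrame is returned and df itself is untouched.
result = df.drop(index='Charlie')
print(df.shape, result.shape)
# (6, 3) (5, 3)

# inplace=True: the object is modified and the call returns None.
df_copy = df.copy()
ret = df_copy.drop(index='Charlie', inplace=True)
print(ret)
# None
print(df_copy.shape)
# (5, 3)
```

Because the return value is None, writing df = df.drop(..., inplace=True) would replace the DataFrame with None.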

Specify by row number

To specify by row number, use the index attribute of DataFrame.

Use the index attribute with [] to get the row name based on its number. To specify multiple rows, use a list.

print(df.index[[1, 3, 5]])
# Index(['Bob', 'Dave', 'Frank'], dtype='object', name='name')

You can use this for the first argument labels or the index argument of the drop() method.

print(df.drop(df.index[[1, 3, 5]]))
#          age state  point
# name                     
# Alice     24    NY     64
# Charlie   18    CA     70
# Ellen     24    CA     88

print(df.drop(index=df.index[[1, 3, 5]]))
#          age state  point
# name                     
# Alice     24    NY     64
# Charlie   18    CA     70
# Ellen     24    CA     88

Notes on when the index is not set

If the index is not set, it defaults to a sequence of integers (RangeIndex). Exercise caution when the index labels are integers rather than strings, because drop() always treats the values you pass as labels, not positions.

df_noindex = pd.read_csv('data/src/sample_pandas_normal.csv')
print(df_noindex)
#       name  age state  point
# 0    Alice   24    NY     64
# 1      Bob   42    CA     92
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57

print(df_noindex.index)
# RangeIndex(start=0, stop=6, step=1)

When the index is a sequential range starting at 0, each label equals its position, so directly specifying the number produces the same result as using the index attribute.

print(df_noindex.drop([1, 3, 5]))
#       name  age state  point
# 0    Alice   24    NY     64
# 2  Charlie   18    CA     70
# 4    Ellen   24    CA     88

print(df_noindex.drop(df_noindex.index[[1, 3, 5]]))
#       name  age state  point
# 0    Alice   24    NY     64
# 2  Charlie   18    CA     70
# 4    Ellen   24    CA     88

The result differs if the sequence is disrupted by actions like sorting. When specifying a number directly, the row with that number as its label is deleted. When using the index attribute, the row with that number as its position is deleted.

df_noindex_sort = df_noindex.sort_values('state')
print(df_noindex_sort)
#       name  age state  point
# 1      Bob   42    CA     92
# 2  Charlie   18    CA     70
# 4    Ellen   24    CA     88
# 0    Alice   24    NY     64
# 5    Frank   30    NY     57
# 3     Dave   68    TX     70

print(df_noindex_sort.index)
# Index([1, 2, 4, 0, 5, 3], dtype='int64')

print(df_noindex_sort.drop([1, 3, 5]))
#       name  age state  point
# 2  Charlie   18    CA     70
# 4    Ellen   24    CA     88
# 0    Alice   24    NY     64

print(df_noindex_sort.drop(df_noindex_sort.index[[1, 3, 5]]))
#     name  age state  point
# 1    Bob   42    CA     92
# 4  Ellen   24    CA     88
# 5  Frank   30    NY     57
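
One way to avoid the ambiguity is to renumber the rows after sorting so that labels and positions match again. The sketch below assumes the same sample data, rebuilt inline, and uses reset_index(drop=True). Passing kind='stable' is an addition here so that rows with the same state are guaranteed to keep their original order.

```python
import pandas as pd

# Inline reconstruction of the sample data, without setting an index.
df_noindex = pd.DataFrame(
    {
        'name': ['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'],
        'age': [24, 42, 18, 68, 24, 30],
        'state': ['NY', 'CA', 'CA', 'TX', 'CA', 'NY'],
        'point': [64, 92, 70, 70, 88, 57],
    }
)

# kind='stable' keeps tied rows in their original order.
df_sorted = df_noindex.sort_values('state', kind='stable')

# Discard the old labels and renumber the rows from 0.
df_reset = df_sorted.reset_index(drop=True)
print(df_reset.index)
# RangeIndex(start=0, stop=6, step=1)

# Labels now equal positions, so this deletes the 2nd, 4th, and 6th rows.
print(df_reset.drop([1, 3, 5])['name'].tolist())
# ['Bob', 'Ellen', 'Frank']
```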

For details on sorting, see sort_values() and sort_index().

Delete columns from pandas.DataFrame

Specify by column name (label)

When using the drop() method to delete a column, specify the column name for the first argument labels and set the axis argument to 1.

print(df.drop('state', axis=1))
#          age  point
# name               
# Alice     24     64
# Bob       42     92
# Charlie   18     70
# Dave      68     70
# Ellen     24     88
# Frank     30     57
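
The axis argument also accepts the strings 'index' and 'columns' in place of 0 and 1, which some readers find easier to read. A minimal sketch, with the sample data rebuilt inline as an assumption so that it runs without the CSV file:

```python
import pandas as pd

# Inline reconstruction of the sample data used in this article.
df = pd.DataFrame(
    {
        'age': [24, 42, 18, 68, 24, 30],
        'state': ['NY', 'CA', 'CA', 'TX', 'CA', 'NY'],
        'point': [64, 92, 70, 70, 88, 57],
    },
    index=pd.Index(['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'], name='name'),
)

# axis='columns' is equivalent to axis=1.
by_name = df.drop('state', axis='columns')
by_number = df.drop('state', axis=1)
print(by_name.equals(by_number))
# True
print(by_name.columns.tolist())
# ['age', 'point']
```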

Starting from version 0.21.0, the columns argument is also available.

print(df.drop(columns='state'))
#          age  point
# name               
# Alice     24     64
# Bob       42     92
# Charlie   18     70
# Dave      68     70
# Ellen     24     88
# Frank     30     57

Use a list to delete multiple columns at once.

print(df.drop(['state', 'point'], axis=1))
#          age
# name        
# Alice     24
# Bob       42
# Charlie   18
# Dave      68
# Ellen     24
# Frank     30

print(df.drop(columns=['state', 'point']))
#          age
# name        
# Alice     24
# Bob       42
# Charlie   18
# Dave      68
# Ellen     24
# Frank     30

The inplace argument can be used in the same way as for rows.

df_copy = df.copy()
df_copy.drop(columns=['state', 'point'], inplace=True)
print(df_copy)
#          age
# name        
# Alice     24
# Bob       42
# Charlie   18
# Dave      68
# Ellen     24
# Frank     30

Specify by column number

To specify by column number, use the columns attribute of DataFrame.

print(df.columns[[1, 2]])
# Index(['state', 'point'], dtype='object')

print(df.drop(df.columns[[1, 2]], axis=1))
#          age
# name        
# Alice     24
# Bob       42
# Charlie   18
# Dave      68
# Ellen     24
# Frank     30

print(df.drop(columns=df.columns[[1, 2]]))
#          age
# name        
# Alice     24
# Bob       42
# Charlie   18
# Dave      68
# Ellen     24
# Frank     30

If the column labels are integers, exercise the same caution as described for rows: integers passed to drop() are treated as labels, not positions.
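
To illustrate with columns, the sketch below uses a small hypothetical DataFrame, df_int, whose integer column labels are deliberately out of order. It is not part of the sample data.

```python
import pandas as pd

# Hypothetical DataFrame with integer column labels in a non-sequential order.
df_int = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=[2, 0, 1])

# A bare integer is a label: this removes the column labeled 0 (second position).
by_label = df_int.drop(columns=[0])
print(by_label.columns.tolist())
# [2, 1]

# Going through the columns attribute selects by position:
# this removes the first column, whose label is 2.
by_position = df_int.drop(columns=df_int.columns[[0]])
print(by_position.columns.tolist())
# [0, 1]
```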

Delete multiple rows and columns simultaneously

Starting from version 0.21.0, you can simultaneously delete multiple rows and columns using both the index and columns arguments.

You can also specify rows and columns by number, or use the inplace argument.

print(df.drop(index=['Bob', 'Dave', 'Frank'], columns=['state', 'point']))
#          age
# name        
# Alice     24
# Charlie   18
# Ellen     24

print(df.drop(index=df.index[[1, 3, 5]], columns=df.columns[[1, 2]]))
#          age
# name        
# Alice     24
# Charlie   18
# Ellen     24

df_copy = df.copy()
df_copy.drop(index=['Bob', 'Dave', 'Frank'], columns=['state', 'point'],
             inplace=True)
print(df_copy)
#          age
# name        
# Alice     24
# Charlie   18
# Ellen     24
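
As a quick check, the combined call gives the same result as two chained drop() calls. The sketch below rebuilds the sample data inline (an assumption made so that it runs without the CSV file). It also shows that the first argument labels cannot be mixed with index or columns: pandas raises a ValueError in that case.

```python
import pandas as pd

# Inline reconstruction of the sample data used in this article.
df = pd.DataFrame(
    {
        'age': [24, 42, 18, 68, 24, 30],
        'state': ['NY', 'CA', 'CA', 'TX', 'CA', 'NY'],
        'point': [64, 92, 70, 70, 88, 57],
    },
    index=pd.Index(['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'], name='name'),
)

rows = ['Bob', 'Dave', 'Frank']
cols = ['state', 'point']

# One call with both arguments versus two chained calls.
combined = df.drop(index=rows, columns=cols)
chained = df.drop(index=rows).drop(columns=cols)
print(combined.equals(chained))
# True

# Mixing the positional labels argument with index/columns is rejected.
mixed_raised = False
try:
    df.drop('Charlie', columns='state')
except ValueError as e:
    mixed_raised = True
    print('ValueError:', e)
```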
