note.nkmk.me

pandas: Delete rows, columns from DataFrame with drop()

Posted: 2019-07-12 / Tags: Python, pandas

Use drop() to delete rows and columns from pandas.DataFrame.

Before version 0.21.0, specify row / column with parameter labels and axis. index or columns can be used from 0.21.0.

Here, the following contents will be described.

  • Delete rows from DataFrame
    • Specify by row name (row label)
    • Specify by row number
    • Notes when index is not set
  • Delete columns from DataFrame
    • Specify by column name (column label)
    • Specify by column number
  • Delete multiple rows and columns at once

The sample code uses the following data.

import pandas as pd

df = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0)
print(df)
#          age state  point
# name                     
# Alice     24    NY     64
# Bob       42    CA     92
# Charlie   18    CA     70
# Dave      68    TX     70
# Ellen     24    CA     88
# Frank     30    NY     57

CSV file is here:

Sponsored Link

Delete rows from DataFrame

Specify by row name (row label)

Specifying with the first parameter labels and the second parameter axis. In the case of rows, set axis=0.

print(df.drop('Charlie', axis=0))
#        age state  point
# name                   
# Alice   24    NY     64
# Bob     42    CA     92
# Dave    68    TX     70
# Ellen   24    CA     88
# Frank   30    NY     57

The default is axis=0, so axis can be omitted.

print(df.drop('Charlie'))
#        age state  point
# name                   
# Alice   24    NY     64
# Bob     42    CA     92
# Dave    68    TX     70
# Ellen   24    CA     88
# Frank   30    NY     57

From version 0.21.0, you can also use the parameter index.

print(df.drop(index='Charlie'))
#        age state  point
# name                   
# Alice   24    NY     64
# Bob     42    CA     92
# Dave    68    TX     70
# Ellen   24    CA     88
# Frank   30    NY     57

Use a list to delete multiple rows at once.

print(df.drop(['Bob', 'Dave', 'Frank']))
#          age state  point
# name                     
# Alice     24    NY     64
# Charlie   18    CA     70
# Ellen     24    CA     88

print(df.drop(index=['Bob', 'Dave', 'Frank']))
#          age state  point
# name                     
# Alice     24    NY     64
# Charlie   18    CA     70
# Ellen     24    CA     88

By default the original DataFrame is not changed, and a new DataFrame is returned.

Setting the parameter inplace to True changes the original DataFrame. In this case, no new DataFrame is returned, and the return value is None.

df_org = df.copy()
df_org.drop(index=['Bob', 'Dave', 'Frank'], inplace=True)
print(df_org)
#          age state  point
# name                     
# Alice     24    NY     64
# Charlie   18    CA     70
# Ellen     24    CA     88

Specify by row number

If you want to specify by row number, use the index attribute of DataFrame.

Specify the row number in [] of index attribute to get the corresponding row name. Multiple line numbers can be specified using a list.

print(df.index[[1, 3, 5]])
# Index(['Bob', 'Dave', 'Frank'], dtype='object', name='name')

You can specify this as the first parameter labels or index of drop().

print(df.drop(df.index[[1, 3, 5]]))
#          age state  point
# name                     
# Alice     24    NY     64
# Charlie   18    CA     70
# Ellen     24    CA     88

print(df.drop(index=df.index[[1, 3, 5]]))
#          age state  point
# name                     
# Alice     24    NY     64
# Charlie   18    CA     70
# Ellen     24    CA     88

Notes when index is not set

If no row name is set, by default index will be a sequence of integers. Be careful if index is a number rather than a string.

df_noindex = pd.read_csv('data/src/sample_pandas_normal.csv')
print(df_noindex)
#       name  age state  point
# 0    Alice   24    NY     64
# 1      Bob   42    CA     92
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57

print(df_noindex.index)
# RangeIndex(start=0, stop=6, step=1)

As long as it is a sequential number, the result is the same whether you specify a number as it is or use the index attribute.

print(df_noindex.drop([1, 3, 5]))
#       name  age state  point
# 0    Alice   24    NY     64
# 2  Charlie   18    CA     70
# 4    Ellen   24    CA     88

print(df_noindex.drop(df_noindex.index[[1, 3, 5]]))
#       name  age state  point
# 0    Alice   24    NY     64
# 2  Charlie   18    CA     70
# 4    Ellen   24    CA     88

The result is different if it is out of sequence by sorting etc. When specifying a numerical value as it is, the row whose label is the numerical value is deleted, and when using the index attribute, the row whose number is the numerical value is deleted.

df_noindex_sort = df_noindex.sort_values('state')
print(df_noindex_sort)
#       name  age state  point
# 1      Bob   42    CA     92
# 2  Charlie   18    CA     70
# 4    Ellen   24    CA     88
# 0    Alice   24    NY     64
# 5    Frank   30    NY     57
# 3     Dave   68    TX     70

print(df_noindex_sort.index)
# Int64Index([1, 2, 4, 0, 5, 3], dtype='int64')

print(df_noindex_sort.drop([1, 3, 5]))
#       name  age state  point
# 2  Charlie   18    CA     70
# 4    Ellen   24    CA     88
# 0    Alice   24    NY     64

print(df_noindex_sort.drop(df_noindex_sort.index[[1, 3, 5]]))
#     name  age state  point
# 1    Bob   42    CA     92
# 4  Ellen   24    CA     88
# 5  Frank   30    NY     57

See the following post for sorting.

Delete columns from DataFrame

Specify by column name (column label)

Specifying with the first parameter labels and the second parameter axis. In the case of rows, set axis=1.

print(df.drop('state', axis=1))
#          age  point
# name               
# Alice     24     64
# Bob       42     92
# Charlie   18     70
# Dave      68     70
# Ellen     24     88
# Frank     30     57

From version 0.21.0, you can also use the parameter columns.

print(df.drop(columns='state'))
#          age  point
# name               
# Alice     24     64
# Bob       42     92
# Charlie   18     70
# Dave      68     70
# Ellen     24     88
# Frank     30     57

Use a list to delete multiple columns at once.

print(df.drop(['state', 'point'], axis=1))
#          age
# name        
# Alice     24
# Bob       42
# Charlie   18
# Dave      68
# Ellen     24
# Frank     30

print(df.drop(columns=['state', 'point']))
#          age
# name        
# Alice     24
# Bob       42
# Charlie   18
# Dave      68
# Ellen     24
# Frank     30

The parameter inplace can be used as well as for rows.

df_org = df.copy()
df_org.drop(columns=['state', 'point'], inplace=True)
print(df_org)
#          age
# name        
# Alice     24
# Bob       42
# Charlie   18
# Dave      68
# Ellen     24
# Frank     30

Specify by column number

If you want to specify by column number, use the columns attribute of DataFrame.

print(df.columns[[1, 2]])
# Index(['state', 'point'], dtype='object')

print(df.drop(df.columns[[1, 2]], axis=1))
#          age
# name        
# Alice     24
# Bob       42
# Charlie   18
# Dave      68
# Ellen     24
# Frank     30

print(df.drop(columns=df.columns[[1, 2]]))
#          age
# name        
# Alice     24
# Bob       42
# Charlie   18
# Dave      68
# Ellen     24
# Frank     30

If the value of columns is an integer, be careful as described above for rows.

Sponsored Link

Delete multiple rows and columns at once

From version 0.21.0 and later, it is possible to delete multiple rows and multiple columns simultaneously by specifying the parameterindex and columns.

Of course, it is also possible to specify by row number and column number, or to specify the parameter inplace.

print(df.drop(index=['Bob', 'Dave', 'Frank'],
              columns=['state', 'point']))
#          age
# name        
# Alice     24
# Charlie   18
# Ellen     24

print(df.drop(index=df.index[[1, 3, 5]],
              columns=df.columns[[1, 2]]))
#          age
# name        
# Alice     24
# Charlie   18
# Ellen     24

df_org = df.copy()
df_org.drop(index=['Bob', 'Dave', 'Frank'],
            columns=['state', 'point'], inplace=True)
print(df_org)
#          age
# name        
# Alice     24
# Charlie   18
# Ellen     24
Sponsored Link
Share

Related Categories

Related Posts