pandas: Reset index of DataFrame, Series with reset_index()

Posted: 2019-12-05 / Tags: Python, pandas

reset_index() reassigns the index (row labels) of a pandas.DataFrame or pandas.Series to sequential numbers (row numbers) starting from 0.

When row numbers are used as the index, it is convenient to reset them after sorting changes the row order, or after deleting a row leaves a gap in the numbering.

reset_index() is also used to discard the current index, or to move it back into a data column when row names (strings) are used as the index. By combining set_index() and reset_index(), you can change the index to another column.

This article covers the following topics.

  • Basic usage of reset_index()
  • Change the index to another column by reset_index() and set_index()

The following data is used as an example.

import pandas as pd

df = pd.read_csv('data/src/sample_pandas_normal.csv')
print(df)
#       name  age state  point
# 0    Alice   24    NY     64
# 1      Bob   42    CA     92
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57

The examples use pandas.DataFrame, but pandas.Series also provides reset_index(), and the basic usage is the same.
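
One difference for pandas.Series is the return type. The following is a minimal sketch, assuming a small Series built inline for illustration (the values and the name point are not read from the sample CSV). With the default drop=False, the old index becomes a column, so the result is a DataFrame. With drop=True, the result is still a Series.

```python
import pandas as pd

# Hypothetical Series built inline for illustration (not read from the sample CSV).
s = pd.Series([64, 92, 70], index=['Alice', 'Bob', 'Charlie'], name='point')

# drop=False (the default): the old index becomes a column,
# so the result is a DataFrame rather than a Series.
df_from_s = s.reset_index()
print(type(df_from_s).__name__)
# DataFrame
print(df_from_s.columns.tolist())
# ['index', 'point']

# drop=True: the old index is discarded and the result is still a Series.
s_r = s.reset_index(drop=True)
print(type(s_r).__name__)
# Series
print(s_r.index.tolist())
# [0, 1, 2]
```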

Basic usage of reset_index()

To set up the example, sort the rows with sort_values() so that the index is no longer in sequential order.

df.sort_values('state', inplace=True)
print(df)
#       name  age state  point
# 1      Bob   42    CA     92
# 2  Charlie   18    CA     70
# 4    Ellen   24    CA     88
# 0    Alice   24    NY     64
# 5    Frank   30    NY     57
# 3     Dave   68    TX     70

Use reset_index() to reassign the index to sequential numbers starting from 0.

By default, the original index is kept as a new column. Because the index here has no name, the column is called index.

df_r = df.reset_index()
print(df_r)
#    index     name  age state  point
# 0      1      Bob   42    CA     92
# 1      2  Charlie   18    CA     70
# 2      4    Ellen   24    CA     88
# 3      0    Alice   24    NY     64
# 4      5    Frank   30    NY     57
# 5      3     Dave   68    TX     70

Delete the original index: drop

If the parameter drop is set to True, the original index is discarded instead of being added as a column.

df_r = df.reset_index(drop=True)
print(df_r)
#       name  age state  point
# 0      Bob   42    CA     92
# 1  Charlie   18    CA     70
# 2    Ellen   24    CA     88
# 3    Alice   24    NY     64
# 4    Frank   30    NY     57
# 5     Dave   68    TX     70
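
The same option closes the gap left in the numbering after a row is deleted. This is a minimal sketch, assuming a small DataFrame built inline that mirrors the first rows of the sample data. The names df_dropped and df_renumbered are chosen here for illustration.

```python
import pandas as pd

# Hypothetical data mirroring the first rows of the sample table.
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dave'],
    'age': [24, 42, 18, 68],
})

# Dropping the row labeled 1 leaves a gap in the index.
df_dropped = df.drop(index=1)
print(df_dropped.index.tolist())
# [0, 2, 3]

# reset_index(drop=True) renumbers the rows from 0 and discards the old labels.
df_renumbered = df_dropped.reset_index(drop=True)
print(df_renumbered.index.tolist())
# [0, 1, 2]
print(df_renumbered['name'].tolist())
# ['Alice', 'Charlie', 'Dave']
```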

Change original object: inplace

By default, reset_index() returns a new object and leaves the original unchanged. If the argument inplace is set to True, the original object is modified instead.

df.reset_index(inplace=True, drop=True)
print(df)
#       name  age state  point
# 0      Bob   42    CA     92
# 1  Charlie   18    CA     70
# 2    Ellen   24    CA     88
# 3    Alice   24    NY     64
# 4    Frank   30    NY     57
# 5     Dave   68    TX     70
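
A common pitfall with inplace=True is assigning the return value back to a variable. The sketch below, which assumes a small DataFrame built inline for illustration, shows that the call returns None while the original object is updated.

```python
import pandas as pd

# Hypothetical two-row DataFrame whose index is out of order.
df = pd.DataFrame({'name': ['Bob', 'Alice'], 'state': ['CA', 'NY']}, index=[1, 0])

# With inplace=True the method modifies df and returns None,
# so the result should not be assigned back to df.
result = df.reset_index(drop=True, inplace=True)
print(result)
# None
print(df.index.tolist())
# [0, 1]
```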

Change the index to another column by reset_index() and set_index()

Consider the case where row names (strings) are set as the index.

df = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0)
print(df)
#          age state  point
# name                     
# Alice     24    NY     64
# Bob       42    CA     92
# Charlie   18    CA     70
# Dave      68    TX     70
# Ellen     24    CA     88
# Frank     30    NY     57

reset_index() sets the index to sequential numbers and moves the original index into a data column.

df_r = df.reset_index()
print(df_r)
#       name  age state  point
# 0    Alice   24    NY     64
# 1      Bob   42    CA     92
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57
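
The name of the restored column comes from the name of the index. The following is a minimal sketch, assuming a small DataFrame built inline for illustration. An unnamed index becomes a column called index, while an index named name (set here with rename_axis()) becomes a column called name.

```python
import pandas as pd

# Hypothetical DataFrame with an unnamed string index.
df = pd.DataFrame({'age': [24, 42]}, index=['Alice', 'Bob'])

# Unnamed index: the restored column is called 'index'.
print(df.reset_index().columns.tolist())
# ['index', 'age']

# Named index: the restored column takes the index name.
df_named = df.rename_axis('name')
print(df_named.reset_index().columns.tolist())
# ['name', 'age']
```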

Use set_index() to make another column the index.

Applying set_index() directly to the original DataFrame discards its original index.

df_s = df.set_index('state')
print(df_s)
#        age  point
# state            
# NY      24     64
# CA      42     92
# CA      18     70
# TX      68     70
# CA      24     88
# NY      30     57

If you want to keep the original index as a data column, call set_index() after reset_index().

df_rs = df.reset_index().set_index('state')
print(df_rs)
#           name  age  point
# state                     
# NY       Alice   24     64
# CA         Bob   42     92
# CA     Charlie   18     70
# TX        Dave   68     70
# CA       Ellen   24     88
# NY       Frank   30     57
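
The two approaches can be compared side by side without the CSV file. This is a minimal sketch, assuming a small DataFrame built inline with name as its index. The variable names df_direct and df_chained are chosen here for illustration.

```python
import pandas as pd

# Hypothetical data with 'name' as the index, mirroring the sample table.
df = pd.DataFrame(
    {'age': [24, 42, 18], 'state': ['NY', 'CA', 'CA']},
    index=pd.Index(['Alice', 'Bob', 'Charlie'], name='name'),
)

# set_index() alone: the old 'name' index is discarded.
df_direct = df.set_index('state')
print(df_direct.columns.tolist())
# ['age']

# reset_index() first: 'name' is moved to a column and survives set_index().
df_chained = df.reset_index().set_index('state')
print(df_chained.columns.tolist())
# ['name', 'age']
```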