note.nkmk.me

pandas: Assign existing column to the DataFrame index with set_index()

Posted: 2019-12-05 / Modified: 2021-04-08 / Tags: Python, pandas

By using set_index(), you can assign an existing column of pandas.DataFrame to index (row label). Setting unique names for index makes it easy to select elements with loc and at.

This article describes the following contents.

  • How to use set_index()
    • Basic usage
    • Keep the specified column: drop
    • Assign multi-index
    • Keep the original index as a column
    • Change original object: inplace
  • Set index when reading CSV file
  • Select rows and elements using index

See the following article for how to rename index instead of assigning an existing column to index.

The following data is used as an example.

import pandas as pd

df = pd.read_csv('data/src/sample_pandas_normal.csv')
print(df)
#       name  age state  point
# 0    Alice   24    NY     64
# 1      Bob   42    CA     92
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57

Click here for sample CSV file.

Sponsored Link

How to use set_index()

Basic usage

Specify the name of the column to be used as an index in the first argument keys.

df_i = df.set_index('name')
print(df_i)
#          age state  point
# name                     
# Alice     24    NY     64
# Bob       42    CA     92
# Charlie   18    CA     70
# Dave      68    TX     70
# Ellen     24    CA     88
# Frank     30    NY     57

Keep the specified column: : drop

By default, the specified column is deleted as shown in the example above. If the argument drop is set to False, the specified column is set to index and remain in the data column.

df_id = df.set_index('name', drop=False)
print(df_id)
#             name  age state  point
# name                              
# Alice      Alice   24    NY     64
# Bob          Bob   42    CA     92
# Charlie  Charlie   18    CA     70
# Dave        Dave   68    TX     70
# Ellen      Ellen   24    CA     88
# Frank      Frank   30    NY     57

Assign multi-index

By using set_index(), multiple columns can be assigned as multi-index.

Specify by list

By specifying a list of column names in the first argument keys, multiple columns are assigned as multi-index.

df_mi = df.set_index(['state', 'name'])
print(df_mi)
#                age  point
# state name               
# NY    Alice     24     64
# CA    Bob       42     92
#       Charlie   18     70
# TX    Dave      68     70
# CA    Ellen     24     88
# NY    Frank     30     57

Sorting with sort_index() makes it displayed neatly.

df_mi.sort_index(inplace=True)
print(df_mi)
#                age  point
# state name               
# CA    Bob       42     92
#       Charlie   18     70
#       Ellen     24     88
# NY    Alice     24     64
#       Frank     30     57
# TX    Dave      68     70

See the following article for details about sorting with sort_values() and sort_index().

Add a column to the multi-index: append

By default, specifying a new column with set_index() deletes the original index.

print(df_i)
#          age state  point
# name                     
# Alice     24    NY     64
# Bob       42    CA     92
# Charlie   18    CA     70
# Dave      68    TX     70
# Ellen     24    CA     88
# Frank     30    NY     57

df_ii = df_i.set_index('state')
print(df_ii)
#        age  point
# state            
# NY      24     64
# CA      42     92
# CA      18     70
# TX      68     70
# CA      24     88
# NY      30     57

If the argument append is set to True, the specified column will be added as a new level index.

df_mi = df_i.set_index('state', append=True)
print(df_mi)
#                age  point
# name    state            
# Alice   NY      24     64
# Bob     CA      42     92
# Charlie CA      18     70
# Dave    TX      68     70
# Ellen   CA      24     88
# Frank   NY      30     57

The added column is set to the lowest level. If you want to swap levels, use swaplevel().

print(df_mi.swaplevel(0, 1))
#                age  point
# state name               
# NY    Alice     24     64
# CA    Bob       42     92
#       Charlie   18     70
# TX    Dave      68     70
# CA    Ellen     24     88
# NY    Frank     30     57

Keep the original index as a column

If you set a column to index with set_index() as in the previous examples, the original index will be deleted.

If you want to keep the original index as a column, use reset_index() to reassign the index to a sequential number starting from 0.

print(df_i)
#          age state  point
# name                     
# Alice     24    NY     64
# Bob       42    CA     92
# Charlie   18    CA     70
# Dave      68    TX     70
# Ellen     24    CA     88
# Frank     30    NY     57

df_ri = df_i.reset_index()
print(df_ri)
#       name  age state  point
# 0    Alice   24    NY     64
# 1      Bob   42    CA     92
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57

You can change the index to a different column by using set_index() after reset_index().

df_change = df_i.reset_index().set_index('state')
print(df_change)
#           name  age  point
# state                     
# NY       Alice   24     64
# CA         Bob   42     92
# CA     Charlie   18     70
# TX        Dave   68     70
# CA       Ellen   24     88
# NY       Frank   30     57

See also the following article for reset_index().

Change original object: inplace

By default, set_index() does not change the original object and returns a new object, but if the argument inplace is set to True, the original object is changed.

df.set_index('name', inplace=True)
print(df)
#          age state  point
# name                     
# Alice     24    NY     64
# Bob       42    CA     92
# Charlie   18    CA     70
# Dave      68    TX     70
# Ellen     24    CA     88
# Frank     30    NY     57

Set index when reading CSV file

When reading from a CSV file and generating pandas.DataFrame or pandas.Series, if the original file contains a column that should be used as an index, it can also be specified at reading.

When reading a file with read_csv(), specifying the column number in the argument index_col sets that column to the index.。

df = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0)
print(df)
#          age state  point
# name                     
# Alice     24    NY     64
# Bob       42    CA     92
# Charlie   18    CA     70
# Dave      68    TX     70
# Ellen     24    CA     88
# Frank     30    NY     57
Sponsored Link

Select rows and elements using index

As in previous examples, if you specify a unique name for the index, you can easily select rows and elements by the name.

print(df)
#          age state  point
# name                     
# Alice     24    NY     64
# Bob       42    CA     92
# Charlie   18    CA     70
# Dave      68    TX     70
# Ellen     24    CA     88
# Frank     30    NY     57

print(df.loc['Bob'])
# age      42
# state    CA
# point    92
# Name: Bob, dtype: object

print(df.at['Bob', 'age'])
# 42
Sponsored Link
Share

Related Categories

Related Articles