note.nkmk.me

pandas: Assign existing column to the DataFrame index with set_index()

Posted: 2019-12-05 / Tags: Python, pandas

By using set_index(), you can assign an existing column of pandas.DataFrame to index (row label). Setting unique names for index makes it easy to select elements with loc and at.

This post describes the following contents.

  • How to use set_index()
    • Basic usage
    • Assign multi-index
    • Change index to another column
    • Change original object: inplace
  • Set index when reading CSV file
  • Select rows and elements using index

See the following post for how to change part or all of index instead of assigning an existing column to index.

The following data is used as an example.

import pandas as pd

df = pd.read_csv('data/src/sample_pandas_normal.csv')
print(df)
#       name  age state  point
# 0    Alice   24    NY     64
# 1      Bob   42    CA     92
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57

Click here for sample CSV file.

Sponsored Link

How to use set_index()

Basic usage

Specify the name of the column to be used as an index in the first argument keys.

df_i = df.set_index('name')
print(df_i)
#          age state  point
# name                     
# Alice     24    NY     64
# Bob       42    CA     92
# Charlie   18    CA     70
# Dave      68    TX     70
# Ellen     24    CA     88
# Frank     30    NY     57

Assign multi-index

By using set_index(), multiple columns can be assigned as multi-index.

Specify by list

By specifying a list of column names in the first argument keys, multiple columns are assigned as multi-index.

df_mi = df.set_index(['state', 'name'])
print(df_mi)
#                age  point
# state name               
# NY    Alice     24     64
# CA    Bob       42     92
#       Charlie   18     70
# TX    Dave      68     70
# CA    Ellen     24     88
# NY    Frank     30     57

Sorting with sort_index() makes it displayed neatly.

df_mi.sort_index(inplace=True)
print(df_mi)
#                age  point
# state name               
# CA    Bob       42     92
#       Charlie   18     70
#       Ellen     24     88
# NY    Alice     24     64
#       Frank     30     57
# TX    Dave      68     70

See the following post for details about sorting with sort_values() and sort_index().

append=True

By default, specifying a new column with set_index() deletes the original index.

df_ii = df_i.set_index('state')
print(df_ii)
#        age  point
# state            
# NY      24     64
# CA      42     92
# CA      18     70
# TX      68     70
# CA      24     88
# NY      30     57

If the parameter append is set to True, the specified column will be added as a new level index.

df_mi = df_i.set_index('state', append=True)
print(df_mi)
#                age  point
# name    state            
# Alice   NY      24     64
# Bob     CA      42     92
# Charlie CA      18     70
# Dave    TX      68     70
# Ellen   CA      24     88
# Frank   NY      30     57

The added column is set to the lowest level. If you want to swap levels, use swaplevel().

print(df_mi.swaplevel(0, 1))
#                age  point
# state name               
# NY    Alice     24     64
# CA    Bob       42     92
#       Charlie   18     70
# TX    Dave      68     70
# CA    Ellen     24     88
# NY    Frank     30     57

Change index to another column

If you set another column with set_index(), the original index will be deleted as shown above.

If you want to keep the original index as a column, use reset_index() to reassign the index to a sequential number starting from 0.

df_ri = df_i.reset_index()
print(df_ri)
#       name  age state  point
# 0    Alice   24    NY     64
# 1      Bob   42    CA     92
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57

You can change the index to a different column by using set_index() after reset_index().

df_change = df_i.reset_index().set_index('state')
print(df_change)
#           name  age  point
# state                     
# NY       Alice   24     64
# CA         Bob   42     92
# CA     Charlie   18     70
# TX        Dave   68     70
# CA       Ellen   24     88
# NY       Frank   30     57

See also the following post for reset_index().

Change original object: inplace

By default, set_index() does not change the original object and returns a new object, but if the argument inplace is set to True, the original object is changed.

df.set_index('name', inplace=True)
print(df)
#          age state  point
# name                     
# Alice     24    NY     64
# Bob       42    CA     92
# Charlie   18    CA     70
# Dave      68    TX     70
# Ellen     24    CA     88
# Frank     30    NY     57

Set index when reading CSV file

When reading from a CSV file and generating pandas.DataFrame or pandas.Series, if the original file contains a column that should be used as an index, it can also be specified at reading.

When reading a file with read_csv(), specifying the column number in the argument index_col sets that column to the index.。

df = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0)
print(df)
#          age state  point
# name                     
# Alice     24    NY     64
# Bob       42    CA     92
# Charlie   18    CA     70
# Dave      68    TX     70
# Ellen     24    CA     88
# Frank     30    NY     57
Sponsored Link

Select rows and elements using index

As in previous examples, if you specify a unique name for the index, you can easily select rows and elements by the name.

print(df)
#          age state  point
# name                     
# Alice     24    NY     64
# Bob       42    CA     92
# Charlie   18    CA     70
# Dave      68    TX     70
# Ellen     24    CA     88
# Frank     30    NY     57

print(df.loc['Bob'])
# age      42
# state    CA
# point    92
# Name: Bob, dtype: object

print(df.at['Bob', 'age'])
# 42
Sponsored Link
Share

Related Categories

Related Posts