pandas: Sort DataFrame/Series with sort_values(), sort_index()

Modified: 2024-01-25 | Tags: Python, pandas

In pandas, the sort_values() and sort_index() methods allow you to sort DataFrame and Series. You can sort in ascending or descending order, or sort by multiple columns.

Note that the older sort() method has been deprecated.

Contents

Sort by values: sort_values()
Sort by index/columns (row/column labels): sort_index()

Use reindex() to reorder rows and columns in any specific order.

pandas: Reorder rows and columns in DataFrame with reindex()

The pandas version used in this article is as follows. Note that functionality may vary between versions. The following data is used as an example.

sample_pandas_normal.csv

import pandas as pd

print(pd.__version__)
# 2.1.4

df = pd.read_csv('data/src/sample_pandas_normal.csv')
print(df)
#       name  age state  point
# 0    Alice   24    NY     64
# 1      Bob   42    CA     92
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57

source: pandas_sort_values_sort_index.py

The following examples use DataFrame, but Series also supports the sort_values() and sort_index() methods in the same manner.

Sort by values: `sort_values()`

To sort by values, use the sort_values() method.

Specify columns to sort by: `by`

Specify the column label (name) you want to sort by in the first argument, by.

print(df.sort_values('age'))
#       name  age state  point
# 2  Charlie   18    CA     70
# 0    Alice   24    NY     64
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57
# 1      Bob   42    CA     92
# 3     Dave   68    TX     70

print(df.sort_values('state'))
#       name  age state  point
# 1      Bob   42    CA     92
# 2  Charlie   18    CA     70
# 4    Ellen   24    CA     88
# 0    Alice   24    NY     64
# 5    Frank   30    NY     57
# 3     Dave   68    TX     70

source: pandas_sort_values_sort_index.py

Sort by multiple columns

When the first argument (by) is specified as a list, sorting is based on multiple columns. Sorting is performed in reverse order of the listed columns, with the final sort done by the first specified column.

print(df.sort_values(['age', 'state']))
#       name  age state  point
# 2  Charlie   18    CA     70
# 4    Ellen   24    CA     88
# 0    Alice   24    NY     64
# 5    Frank   30    NY     57
# 1      Bob   42    CA     92
# 3     Dave   68    TX     70

print(df.sort_values(['state', 'age']))
#       name  age state  point
# 2  Charlie   18    CA     70
# 4    Ellen   24    CA     88
# 1      Bob   42    CA     92
# 0    Alice   24    NY     64
# 5    Frank   30    NY     57
# 3     Dave   68    TX     70

source: pandas_sort_values_sort_index.py

Specify ascending or descending: `ascending`

By default, sorting is in ascending order, but you can set the ascending argument to False for descending order.

print(df.sort_values('age', ascending=False))
#       name  age state  point
# 3     Dave   68    TX     70
# 1      Bob   42    CA     92
# 5    Frank   30    NY     57
# 0    Alice   24    NY     64
# 4    Ellen   24    CA     88
# 2  Charlie   18    CA     70

source: pandas_sort_values_sort_index.py

When sorting based on multiple columns, specifying the ascending argument as a list allows for selecting ascending or descending order for each column.

print(df.sort_values(['age', 'state'], ascending=False))
#       name  age state  point
# 3     Dave   68    TX     70
# 1      Bob   42    CA     92
# 5    Frank   30    NY     57
# 0    Alice   24    NY     64
# 4    Ellen   24    CA     88
# 2  Charlie   18    CA     70

print(df.sort_values(['age', 'state'], ascending=[True, False]))
#       name  age state  point
# 2  Charlie   18    CA     70
# 0    Alice   24    NY     64
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57
# 1      Bob   42    CA     92
# 3     Dave   68    TX     70

source: pandas_sort_values_sort_index.py

Reset the index: `ignore_index`

By default, sorting the values also rearranges the index (row labels). However, setting the ignore_index argument to True resets it to a 0-based sequence.

print(df.sort_values('age', ignore_index=True))
#       name  age state  point
# 0  Charlie   18    CA     70
# 1    Alice   24    NY     64
# 2    Ellen   24    CA     88
# 3    Frank   30    NY     57
# 4      Bob   42    CA     92
# 5     Dave   68    TX     70

source: pandas_sort_values_sort_index.py

Handle `NaN`: `na_position`

By default, NaN values are placed at the end.

df_nan = df.copy()
df_nan.iloc[0, 1] = float('nan')
df_nan.iloc[4, 1] = float('nan')
print(df_nan)
#       name   age state  point
# 0    Alice   NaN    NY     64
# 1      Bob  42.0    CA     92
# 2  Charlie  18.0    CA     70
# 3     Dave  68.0    TX     70
# 4    Ellen   NaN    CA     88
# 5    Frank  30.0    NY     57

print(df_nan.sort_values('age'))
#       name   age state  point
# 2  Charlie  18.0    CA     70
# 5    Frank  30.0    NY     57
# 1      Bob  42.0    CA     92
# 3     Dave  68.0    TX     70
# 0    Alice   NaN    NY     64
# 4    Ellen   NaN    CA     88

source: pandas_sort_values_sort_index.py

Setting the na_position argument to 'first' places them at the beginning.

print(df_nan.sort_values('age', na_position='first'))
#       name   age state  point
# 0    Alice   NaN    NY     64
# 4    Ellen   NaN    CA     88
# 2  Charlie  18.0    CA     70
# 5    Frank  30.0    NY     57
# 1      Bob  42.0    CA     92
# 3     Dave  68.0    TX     70

source: pandas_sort_values_sort_index.py

See the following articles about how to remove and replace missing values.

Sort based on a function: `key`

Like Python's built-in sorted() function, the sort_values() method allows you to specify the key argument. The function specified in key is applied before sorting, and sorting is based on its results.

How to use the key argument in Python (sorted, max, etc.)

For example, you can sort by string length (number of characters) using a lambda expression to apply string methods to the Series.

print(df.sort_values('name', key=lambda s: s.str.len()))
#       name  age state  point
# 1      Bob   42    CA     92
# 3     Dave   68    TX     70
# 0    Alice   24    NY     64
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57
# 2  Charlie   18    CA     70

source: pandas_sort_values_sort_index.py

Note that the key argument of sort_values() requires a function that takes a Series and returns an array-like object such as a Series or list.

To apply any function to each value, use the map() method. The following example uses map() to apply the built-in len() function for explanatory purposes, instead of using Series string methods.

pandas: Apply functions to values, rows, columns with map(), apply()

# print(df.sort_values('name', key=len))
# TypeError: object of type 'int' has no len()

print(df.sort_values('name', key=lambda s: s.map(len)))
#       name  age state  point
# 1      Bob   42    CA     92
# 3     Dave   68    TX     70
# 0    Alice   24    NY     64
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57
# 2  Charlie   18    CA     70

source: pandas_sort_values_sort_index.py

Specify sorting algorithm: `kind`

The kind argument specifies the sorting algorithm. Possible options are 'quicksort' (default), 'mergesort', 'heapsort', and 'stable'. Refer to the following NumPy official documentation for details on each algorithm.

numpy.sort — NumPy v1.26 Manual

Note that this argument is ignored when sorting by multiple columns.

Sort columns based on rows: `axis`

By default, as in the examples so far, rows are sorted based on columns. To sort columns based on rows, set the axis argument to 1 or 'columns'.

To avoid errors caused by mixing numbers and strings, this example extracts numerical columns using select_dtypes().

pandas: Select columns by dtype with select_dtypes()

df_num = df.select_dtypes('number')
print(df_num)
#    age  point
# 0   24     64
# 1   42     92
# 2   18     70
# 3   68     70
# 4   24     88
# 5   30     57

print(df_num.sort_values(1, axis=1, ascending=False))
#    point  age
# 0     64   24
# 1     92   42
# 2     70   18
# 3     70   68
# 4     88   24
# 5     57   30

source: pandas_sort_values_sort_index.py

Modify the original object: `inplace`

By default, sorting creates a new object. However, setting the inplace argument to True modifies the original object.

df_copy = df.copy()
df_copy.sort_values('age', inplace=True)
print(df_copy)
#       name  age state  point
# 2  Charlie   18    CA     70
# 0    Alice   24    NY     64
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57
# 1      Bob   42    CA     92
# 3     Dave   68    TX     70

source: pandas_sort_values_sort_index.py

Sort by `index`/`columns` (row/column labels): `sort_index()`

To sort by the index or columns (row or column labels), use the sort_index() method.

Sort by `index` (row labels)

By default, rows are sorted by the index (row labels).

df_sorted = df.sort_values('age')
print(df_sorted)
#       name  age state  point
# 2  Charlie   18    CA     70
# 0    Alice   24    NY     64
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57
# 1      Bob   42    CA     92
# 3     Dave   68    TX     70

print(df_sorted.sort_index())
#       name  age state  point
# 0    Alice   24    NY     64
# 1      Bob   42    CA     92
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57

source: pandas_sort_values_sort_index.py

Sort by `columns` (column labels): `axis`

Setting the axis argument to 1 or 'columns' sorts columns by the columns (column labels).

print(df.sort_index(axis=1))
#    age     name  point state
# 0   24    Alice     64    NY
# 1   42      Bob     92    CA
# 2   18  Charlie     70    CA
# 3   68     Dave     70    TX
# 4   24    Ellen     88    CA
# 5   30    Frank     57    NY

source: pandas_sort_values_sort_index.py

The same arguments as `sort_values()` can be specified

sort_index() also has arguments such as ascending, ignore_index, na_position, key, kind, and inplace, with usage identical to sort_values().

print(df.sort_index(axis=1, ascending=False, key=lambda s: s.str.len()))
#   state  point     name  age
# 0    NY     64    Alice   24
# 1    CA     92      Bob   42
# 2    CA     70  Charlie   18
# 3    TX     70     Dave   68
# 4    CA     88    Ellen   24
# 5    NY     57    Frank   30

source: pandas_sort_values_sort_index.py

pandas: Sort DataFrame/Series with sort_values(), sort_index()

Sort by values: `sort_values()`

Specify columns to sort by: `by`

Sort by multiple columns

Specify ascending or descending: `ascending`

Reset the index: `ignore_index`

Handle `NaN`: `na_position`

Sort based on a function: `key`

Specify sorting algorithm: `kind`

Sort columns based on rows: `axis`

Modify the original object: `inplace`

Sort by `index`/`columns` (row/column labels): `sort_index()`

Sort by `index` (row labels)

Sort by `columns` (column labels): `axis`

The same arguments as `sort_values()` can be specified

Related Categories

Related Articles

pandas: Sort DataFrame/Series with sort_values(), sort_index()

Sort by values: sort_values()

Specify columns to sort by: by

Sort by multiple columns

Specify ascending or descending: ascending

Reset the index: ignore_index

Handle NaN: na_position

Sort based on a function: key

Specify sorting algorithm: kind

Sort columns based on rows: axis

Modify the original object: inplace

Sort by index/columns (row/column labels): sort_index()

Sort by index (row labels)

Sort by columns (column labels): axis

The same arguments as sort_values() can be specified

Related Categories

Related Articles

Sort by values: `sort_values()`

Specify columns to sort by: `by`

Specify ascending or descending: `ascending`

Reset the index: `ignore_index`

Handle `NaN`: `na_position`

Sort based on a function: `key`

Specify sorting algorithm: `kind`

Sort columns based on rows: `axis`

Modify the original object: `inplace`

Sort by `index`/`columns` (row/column labels): `sort_index()`

Sort by `index` (row labels)

Sort by `columns` (column labels): `axis`

The same arguments as `sort_values()` can be specified