pandas: Sort DataFrame/Series with sort_values(), sort_index()
In pandas, the sort_values()
and sort_index()
methods allow you to sort DataFrame
and Series
. You can sort in ascending or descending order, or sort by multiple columns.
Note that the older sort()
method has been deprecated.
Use reindex()
to reorder rows and columns in any specific order.
The pandas version used in this article is as follows. Note that functionality may vary between versions. The following data is used as an example.
import pandas as pd
print(pd.__version__)
# 2.1.4
df = pd.read_csv('data/src/sample_pandas_normal.csv')
print(df)
# name age state point
# 0 Alice 24 NY 64
# 1 Bob 42 CA 92
# 2 Charlie 18 CA 70
# 3 Dave 68 TX 70
# 4 Ellen 24 CA 88
# 5 Frank 30 NY 57
The following examples use DataFrame
, but Series
also supports the sort_values()
and sort_index()
methods in the same manner.
Sort by values: sort_values()
To sort by values, use the sort_values()
method.
- pandas.DataFrame.sort_values — pandas 2.1.4 documentation
- pandas.Series.sort_values — pandas 2.1.4 documentation
Specify columns to sort by: by
Specify the column label (name) you want to sort by in the first argument, by
.
print(df.sort_values('age'))
# name age state point
# 2 Charlie 18 CA 70
# 0 Alice 24 NY 64
# 4 Ellen 24 CA 88
# 5 Frank 30 NY 57
# 1 Bob 42 CA 92
# 3 Dave 68 TX 70
print(df.sort_values('state'))
# name age state point
# 1 Bob 42 CA 92
# 2 Charlie 18 CA 70
# 4 Ellen 24 CA 88
# 0 Alice 24 NY 64
# 5 Frank 30 NY 57
# 3 Dave 68 TX 70
Sort by multiple columns
When the first argument (by
) is specified as a list, sorting is based on multiple columns. Sorting is performed in reverse order of the listed columns, with the final sort done by the first specified column.
print(df.sort_values(['age', 'state']))
# name age state point
# 2 Charlie 18 CA 70
# 4 Ellen 24 CA 88
# 0 Alice 24 NY 64
# 5 Frank 30 NY 57
# 1 Bob 42 CA 92
# 3 Dave 68 TX 70
print(df.sort_values(['state', 'age']))
# name age state point
# 2 Charlie 18 CA 70
# 4 Ellen 24 CA 88
# 1 Bob 42 CA 92
# 0 Alice 24 NY 64
# 5 Frank 30 NY 57
# 3 Dave 68 TX 70
Specify ascending or descending: ascending
By default, sorting is in ascending order, but you can set the ascending
argument to False
for descending order.
print(df.sort_values('age', ascending=False))
# name age state point
# 3 Dave 68 TX 70
# 1 Bob 42 CA 92
# 5 Frank 30 NY 57
# 0 Alice 24 NY 64
# 4 Ellen 24 CA 88
# 2 Charlie 18 CA 70
When sorting based on multiple columns, specifying the ascending
argument as a list allows for selecting ascending or descending order for each column.
print(df.sort_values(['age', 'state'], ascending=False))
# name age state point
# 3 Dave 68 TX 70
# 1 Bob 42 CA 92
# 5 Frank 30 NY 57
# 0 Alice 24 NY 64
# 4 Ellen 24 CA 88
# 2 Charlie 18 CA 70
print(df.sort_values(['age', 'state'], ascending=[True, False]))
# name age state point
# 2 Charlie 18 CA 70
# 0 Alice 24 NY 64
# 4 Ellen 24 CA 88
# 5 Frank 30 NY 57
# 1 Bob 42 CA 92
# 3 Dave 68 TX 70
Reset the index: ignore_index
By default, sorting the values also rearranges the index
(row labels). However, setting the ignore_index
argument to True
resets it to a 0-based sequence.
print(df.sort_values('age', ignore_index=True))
# name age state point
# 0 Charlie 18 CA 70
# 1 Alice 24 NY 64
# 2 Ellen 24 CA 88
# 3 Frank 30 NY 57
# 4 Bob 42 CA 92
# 5 Dave 68 TX 70
Handle NaN
: na_position
By default, NaN
values are placed at the end.
df_nan = df.copy()
df_nan.iloc[0, 1] = float('nan')
df_nan.iloc[4, 1] = float('nan')
print(df_nan)
# name age state point
# 0 Alice NaN NY 64
# 1 Bob 42.0 CA 92
# 2 Charlie 18.0 CA 70
# 3 Dave 68.0 TX 70
# 4 Ellen NaN CA 88
# 5 Frank 30.0 NY 57
print(df_nan.sort_values('age'))
# name age state point
# 2 Charlie 18.0 CA 70
# 5 Frank 30.0 NY 57
# 1 Bob 42.0 CA 92
# 3 Dave 68.0 TX 70
# 0 Alice NaN NY 64
# 4 Ellen NaN CA 88
Setting the na_position
argument to 'first'
places them at the beginning.
print(df_nan.sort_values('age', na_position='first'))
# name age state point
# 0 Alice NaN NY 64
# 4 Ellen NaN CA 88
# 2 Charlie 18.0 CA 70
# 5 Frank 30.0 NY 57
# 1 Bob 42.0 CA 92
# 3 Dave 68.0 TX 70
See the following articles about how to remove and replace missing values.
- pandas: Remove NaN (missing values) with dropna()
- pandas: Replace NaN (missing values) with fillna()
Sort based on a function: key
Like Python's built-in sorted()
function, the sort_values()
method allows you to specify the key
argument. The function specified in key
is applied before sorting, and sorting is based on its results.
For example, you can sort by string length (number of characters) using a lambda expression to apply string methods to the Series
.
print(df.sort_values('name', key=lambda s: s.str.len()))
# name age state point
# 1 Bob 42 CA 92
# 3 Dave 68 TX 70
# 0 Alice 24 NY 64
# 4 Ellen 24 CA 88
# 5 Frank 30 NY 57
# 2 Charlie 18 CA 70
Note that the key
argument of sort_values()
requires a function that takes a Series
and returns an array-like object such as a Series
or list.
To apply any function to each value, use the map()
method. The following example uses map()
to apply the built-in len()
function for explanatory purposes, instead of using Series
string methods.
# print(df.sort_values('name', key=len))
# TypeError: object of type 'int' has no len()
print(df.sort_values('name', key=lambda s: s.map(len)))
# name age state point
# 1 Bob 42 CA 92
# 3 Dave 68 TX 70
# 0 Alice 24 NY 64
# 4 Ellen 24 CA 88
# 5 Frank 30 NY 57
# 2 Charlie 18 CA 70
Specify sorting algorithm: kind
The kind
argument specifies the sorting algorithm. Possible options are 'quicksort'
(default), 'mergesort'
, 'heapsort'
, and 'stable'
. Refer to the following NumPy official documentation for details on each algorithm.
Note that this argument is ignored when sorting by multiple columns.
Sort columns based on rows: axis
By default, as in the examples so far, rows are sorted based on columns. To sort columns based on rows, set the axis
argument to 1
or 'columns'
.
To avoid errors caused by mixing numbers and strings, this example extracts numerical columns using select_dtypes()
.
df_num = df.select_dtypes('number')
print(df_num)
# age point
# 0 24 64
# 1 42 92
# 2 18 70
# 3 68 70
# 4 24 88
# 5 30 57
print(df_num.sort_values(1, axis=1, ascending=False))
# point age
# 0 64 24
# 1 92 42
# 2 70 18
# 3 70 68
# 4 88 24
# 5 57 30
Modify the original object: inplace
By default, sorting creates a new object. However, setting the inplace
argument to True
modifies the original object.
df_copy = df.copy()
df_copy.sort_values('age', inplace=True)
print(df_copy)
# name age state point
# 2 Charlie 18 CA 70
# 0 Alice 24 NY 64
# 4 Ellen 24 CA 88
# 5 Frank 30 NY 57
# 1 Bob 42 CA 92
# 3 Dave 68 TX 70
Sort by index
/columns
(row/column labels): sort_index()
To sort by the index
or columns
(row or column labels), use the sort_index()
method.
- pandas.DataFrame.sort_index — pandas 2.1.4 documentation
- pandas.Series.sort_index — pandas 2.1.4 documentation
Sort by index
(row labels)
By default, rows are sorted by the index
(row labels).
df_sorted = df.sort_values('age')
print(df_sorted)
# name age state point
# 2 Charlie 18 CA 70
# 0 Alice 24 NY 64
# 4 Ellen 24 CA 88
# 5 Frank 30 NY 57
# 1 Bob 42 CA 92
# 3 Dave 68 TX 70
print(df_sorted.sort_index())
# name age state point
# 0 Alice 24 NY 64
# 1 Bob 42 CA 92
# 2 Charlie 18 CA 70
# 3 Dave 68 TX 70
# 4 Ellen 24 CA 88
# 5 Frank 30 NY 57
Sort by columns
(column labels): axis
Setting the axis
argument to 1
or 'columns'
sorts columns by the columns
(column labels).
print(df.sort_index(axis=1))
# age name point state
# 0 24 Alice 64 NY
# 1 42 Bob 92 CA
# 2 18 Charlie 70 CA
# 3 68 Dave 70 TX
# 4 24 Ellen 88 CA
# 5 30 Frank 57 NY
The same arguments as sort_values()
can be specified
sort_index()
also has arguments such as ascending
, ignore_index
, na_position
, key
, kind
, and inplace
, with usage identical to sort_values()
.
print(df.sort_index(axis=1, ascending=False, key=lambda s: s.str.len()))
# state point name age
# 0 NY 64 Alice 24
# 1 CA 92 Bob 42
# 2 CA 70 Charlie 18
# 3 TX 70 Dave 68
# 4 CA 88 Ellen 24
# 5 NY 57 Frank 30