pandas: Reorder rows and columns in DataFrame with reindex()

Posted: 2024-01-21 | Tags: Python, pandas

In pandas, the reindex() method allows you to reorder the rows and columns of a DataFrame by specifying a list of labels (row and column names). This method is also available on Series.

There is also the reindex_like() method that arranges the rows and columns to be the same as in another DataFrame or Series.

With both methods, it is possible to not only rearrange existing rows and columns but also to add new ones.

Contents

Basic usage of reindex()
- Specify the order of rows and columns: index, columns
Specify new row and column names with reindex()
- Filling with a specific value: fill_value
Fill with adjacent values: method
- Other methods for filling and interpolation
Basic usage of reindex_like()
reindex() and reindex_like() for time series

The sample code below demonstrates the use of DataFrame, but the approach is identical for Series. Note that Series does not have columns, so only the index is specified.

To sort in ascending or descending order, use the sort_index() method.

pandas: Sort DataFrame, Series with sort_values(), sort_index()

The pandas version used in this article is as follows. Note that functionality may vary between versions.

import pandas as pd

print(pd.__version__)
# 2.1.4

source: pandas_reindex.py

Basic usage of `reindex()`

Consider the following DataFrame.

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30], 'C': [100, 200, 300]},
                  index=['One', 'Two', 'Three'])
print(df)
#        A   B    C
# One    1  10  100
# Two    2  20  200
# Three  3  30  300

source: pandas_reindex.py

Specify the order of rows and columns: `index`, `columns`

A new, rearranged DataFrame can be created by specifying lists of row and column names in the index and columns arguments, which can be used simultaneously.

print(df.reindex(index=['Two', 'Three', 'One']))
#        A   B    C
# Two    2  20  200
# Three  3  30  300
# One    1  10  100

print(df.reindex(columns=['B', 'C', 'A']))
#         B    C  A
# One    10  100  1
# Two    20  200  2
# Three  30  300  3

print(df.reindex(index=['Two', 'Three', 'One'], columns=['B', 'C', 'A']))
#         B    C  A
# Two    20  200  2
# Three  30  300  3
# One    10  100  1

source: pandas_reindex.py

It is not necessary to use all existing row and column names.

print(df.reindex(columns=['B', 'A'], index=['Three', 'One']))
#         B  A
# Three  30  3
# One    10  1

source: pandas_reindex.py

Alternatively, you can specify a list in the labels argument and use the axis argument to indicate whether they are for rows (0 or 'index') or columns (1 or 'columns').

print(df.reindex(['Two', 'Three', 'One'], axis=0))
#        A   B    C
# Two    2  20  200
# Three  3  30  300
# One    1  10  100

print(df.reindex(['B', 'C', 'A'], axis='columns'))
#         B    C  A
# One    10  100  1
# Two    20  200  2
# Three  30  300  3

source: pandas_reindex.py

Note that column rearrangement can also be achieved by df[list_of_column_names].

pandas: Select rows/columns by index (numbers and names)

print(df[['B', 'C', 'A']])
#         B    C  A
# One    10  100  1
# Two    20  200  2
# Three  30  300  3

source: pandas_reindex.py

Specify new row and column names with `reindex()`

When new row and column names are passed to reindex(), all values are filled as missing values (NaN) by default.

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30], 'C': [100, 200, 300]},
                  index=['One', 'Two', 'Three'])
print(df)
#        A   B    C
# One    1  10  100
# Two    2  20  200
# Three  3  30  300

print(df.reindex(columns=['B', 'X', 'C'], index=['Two', 'One', 'Four']))
#          B   X      C
# Two   20.0 NaN  200.0
# One   10.0 NaN  100.0
# Four   NaN NaN    NaN

source: pandas_reindex.py

For more information on adding rows and columns to DataFrame, refer to the following.

pandas: Add rows/columns to DataFrame with assign(), insert()

Filling with a specific value: `fill_value`

Specifying a particular value for the fill_value argument will fill all values with that specified value.

print(df.reindex(columns=['B', 'X', 'C'], index=['Two', 'One', 'Four'],
                 fill_value=0))
#        B  X    C
# Two   20  0  200
# One   10  0  100
# Four   0  0    0

source: pandas_reindex.py

Fill with adjacent values: `method`

If the index of the original DataFrame is either monotonically increasing or decreasing, you can use the method argument to fill gaps with adjacent values.

Consider the following DataFrame.

df = pd.DataFrame({'A': [1, 2], 'B': [10, 20], 'C': [100, 200]},
                  index=[10, 20])
print(df)
#     A   B    C
# 10  1  10  100
# 20  2  20  200

source: pandas_reindex.py

For the method argument, using 'bfill' or 'backfill' fills with the next value, 'ffill' or 'pad' with the previous value, and 'nearest' with the nearest value, defaulting to the next value if equidistant.

print(df.reindex(index=[5, 10, 15, 20, 25]))
#       A     B      C
# 5   NaN   NaN    NaN
# 10  1.0  10.0  100.0
# 15  NaN   NaN    NaN
# 20  2.0  20.0  200.0
# 25  NaN   NaN    NaN

print(df.reindex(index=[5, 10, 15, 20, 25], method='bfill'))
#       A     B      C
# 5   1.0  10.0  100.0
# 10  1.0  10.0  100.0
# 15  2.0  20.0  200.0
# 20  2.0  20.0  200.0
# 25  NaN   NaN    NaN

print(df.reindex(index=[5, 10, 15, 20, 25], method='ffill'))
#       A     B      C
# 5   NaN   NaN    NaN
# 10  1.0  10.0  100.0
# 15  1.0  10.0  100.0
# 20  2.0  20.0  200.0
# 25  2.0  20.0  200.0

print(df.reindex(index=[5, 10, 15, 20, 25], method='nearest'))
#     A   B    C
# 5   1  10  100
# 10  1  10  100
# 15  2  20  200
# 20  2  20  200
# 25  2  20  200

source: pandas_reindex.py

If an integer value is specified for the limit argument, it will fill only that number of consecutive missing values (NaN).

print(df.reindex(index=[10, 12, 14, 16, 18, 20]))
#       A     B      C
# 10  1.0  10.0  100.0
# 12  NaN   NaN    NaN
# 14  NaN   NaN    NaN
# 16  NaN   NaN    NaN
# 18  NaN   NaN    NaN
# 20  2.0  20.0  200.0

print(df.reindex(index=[10, 12, 14, 16, 18, 20], method='bfill', limit=2))
#       A     B      C
# 10  1.0  10.0  100.0
# 12  NaN   NaN    NaN
# 14  NaN   NaN    NaN
# 16  2.0  20.0  200.0
# 18  2.0  20.0  200.0
# 20  2.0  20.0  200.0

source: pandas_reindex.py

Note that the filling process uses the index of the original DataFrame for reference, and is independent of any order specified in the index or column arguments.

print(df.reindex(index=[5, 10, 15, 20, 25], method='bfill'))
#       A     B      C
# 5   1.0  10.0  100.0
# 10  1.0  10.0  100.0
# 15  2.0  20.0  200.0
# 20  2.0  20.0  200.0
# 25  NaN   NaN    NaN

print(df.reindex(index=[25, 20, 15, 10, 5], method='bfill'))
#       A     B      C
# 25  NaN   NaN    NaN
# 20  2.0  20.0  200.0
# 15  2.0  20.0  200.0
# 10  1.0  10.0  100.0
# 5   1.0  10.0  100.0

source: pandas_reindex.py

Other methods for filling and interpolation

Other methods for filling missing values (NaN) include using fillna(), ffill(), bfill(), and for interpolation, interpolate().

df = pd.DataFrame({'A': [1, 2], 'B': [10, 20], 'C': [100, 200]},
                  index=[10, 20])
print(df)
#     A   B    C
# 10  1  10  100
# 20  2  20  200

print(df.reindex(index=[5, 10, 15, 20, 25]).bfill())
#       A     B      C
# 5   1.0  10.0  100.0
# 10  1.0  10.0  100.0
# 15  2.0  20.0  200.0
# 20  2.0  20.0  200.0
# 25  NaN   NaN    NaN

print(df.reindex(index=[5, 10, 15, 20, 25]).interpolate())
#       A     B      C
# 5   NaN   NaN    NaN
# 10  1.0  10.0  100.0
# 15  1.5  15.0  150.0
# 20  2.0  20.0  200.0
# 25  2.0  20.0  200.0

source: pandas_reindex.py

While reindex() cannot fill the gaps for added columns from adjacent values, it is possible with fillna(), ffill(), bfill(), and interpolate().

print(df.reindex(columns=['A', 'X', 'C'], method='bfill'))
#     A   X    C
# 10  1 NaN  100
# 20  2 NaN  200

print(df.reindex(columns=['A', 'X', 'C']).bfill(axis=1))
#       A      X      C
# 10  1.0  100.0  100.0
# 20  2.0  200.0  200.0

print(df.reindex(columns=['A', 'X', 'C']).interpolate(axis=1))
#       A      X      C
# 10  1.0   50.5  100.0
# 20  2.0  101.0  200.0

source: pandas_reindex.py

As mentioned above, the filling methods of reindex() are limited to cases where the original index is monotonically increasing or decreasing. The following cases will result in an error.

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30], 'C': [100, 200, 300]},
                  index=[20, 10, 30])
print(df)
#     A   B    C
# 20  1  10  100
# 10  2  20  200
# 30  3  30  300

# print(df.reindex(index=[10, 15, 20], method='bfill'))
# ValueError: index must be monotonic increasing or decreasing

source: pandas_reindex.py

After using reindex(), you can apply methods like fillna(), ffill(), bfill(), and interpolate() to fill in missing values (NaN).

print(df.reindex(index=[10, 15, 20]))
#       A     B      C
# 10  2.0  20.0  200.0
# 15  NaN   NaN    NaN
# 20  1.0  10.0  100.0

print(df.reindex(index=[10, 15, 20]).bfill())
#       A     B      C
# 10  2.0  20.0  200.0
# 15  1.0  10.0  100.0
# 20  1.0  10.0  100.0

print(df.reindex(index=[10, 15, 20]).interpolate())
#       A     B      C
# 10  2.0  20.0  200.0
# 15  1.5  15.0  150.0
# 20  1.0  10.0  100.0

source: pandas_reindex.py

In this case, the data passes through a DataFrame containing missing values (NaN). The data type (dtype) becomes a floating-point number (float), so even if all values are integers, as in the above example, the dtype remains float.

pandas: How to use astype() to cast dtype of DataFrame

Basic usage of `reindex_like()`

reindex_like() allows you to rearrange the rows and columns of a DataFrame to be the same as those in another DataFrame.

Consider the following two DataFrame.

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30], 'C': [100, 200, 300]},
                   index=[10, 20, 30])
print(df1)
#     A   B    C
# 10  1  10  100
# 20  2  20  200
# 30  3  30  300

df2 = pd.DataFrame({'A': [1, 1, 1], 'C': [100, 100, 100]},
                   index=[10, 15, 20])
print(df2)
#     A    C
# 10  1  100
# 15  1  100
# 20  1  100

source: pandas_reindex_like.py

Specifying the columns and index attributes of the target DataFrame in the columns and index arguments of reindex() creates a DataFrame arranged in exactly that order.

print(df1.reindex(columns=df2.columns, index=df2.index))
#       A      C
# 10  1.0  100.0
# 15  NaN    NaN
# 20  2.0  200.0

source: pandas_reindex_like.py

The same can be achieved more easily using reindex_like().

print(df1.reindex_like(df2))
#       A      C
# 10  1.0  100.0
# 15  NaN    NaN
# 20  2.0  200.0

source: pandas_reindex_like.py

With reindex_like(), if the index is monotonically increasing or decreasing, it is possible to specify the method and limit arguments. As of pandas version 2.1.4, the fill_value argument is not implemented, but fillna() can be used instead.

print(df1.reindex_like(df2, method='bfill'))
#     A    C
# 10  1  100
# 15  2  200
# 20  2  200

# print(df1.reindex_like(df2, fill_value=0))
# TypeError: NDFrame.reindex_like() got an unexpected keyword argument 'fill_value'

print(df1.reindex_like(df2).fillna(0))
#       A      C
# 10  1.0  100.0
# 15  0.0    0.0
# 20  2.0  200.0

source: pandas_reindex_like.py

`reindex()` and `reindex_like()` for time series

As shown in the official documentation, reindex() and reindex_like() are useful for aligning the index of time series data.

Consider the following DataFrame. pd.date_range() is used to generate DatetimeIndex.

pandas.date_range — pandas 2.1.4 documentation

df = pd.DataFrame({'A': [1, 3, 5], 'B': [10, 30, 50], 'C': [100, 300, 500]},
                  index=pd.date_range('2023-12-01', '2023-12-05', freq='2D'))
print(df)
#             A   B    C
# 2023-12-01  1  10  100
# 2023-12-03  3  30  300
# 2023-12-05  5  50  500

print(df.index)
# DatetimeIndex(['2023-12-01', '2023-12-03', '2023-12-05'], dtype='datetime64[ns]', freq='2D')

source: pandas_reindex_time_series.py

For example, when using reindex(), you can use the interpolate() method with the method argument set to 'time' to enable interpolation based on datetime.

pandas: Interpolate NaN (missing values) with interpolate()

new_dt_index = pd.date_range('2023-12-01', '2023-12-05', freq='12H')
print(new_dt_index)
# DatetimeIndex(['2023-12-01 00:00:00', '2023-12-01 12:00:00',
#                '2023-12-02 00:00:00', '2023-12-02 12:00:00',
#                '2023-12-03 00:00:00', '2023-12-03 12:00:00',
#                '2023-12-04 00:00:00', '2023-12-04 12:00:00',
#                '2023-12-05 00:00:00'],
#               dtype='datetime64[ns]', freq='12H')

print(df.reindex(index=new_dt_index))
#                        A     B      C
# 2023-12-01 00:00:00  1.0  10.0  100.0
# 2023-12-01 12:00:00  NaN   NaN    NaN
# 2023-12-02 00:00:00  NaN   NaN    NaN
# 2023-12-02 12:00:00  NaN   NaN    NaN
# 2023-12-03 00:00:00  3.0  30.0  300.0
# 2023-12-03 12:00:00  NaN   NaN    NaN
# 2023-12-04 00:00:00  NaN   NaN    NaN
# 2023-12-04 12:00:00  NaN   NaN    NaN
# 2023-12-05 00:00:00  5.0  50.0  500.0

print(df.reindex(index=new_dt_index, method='ffill'))
#                      A   B    C
# 2023-12-01 00:00:00  1  10  100
# 2023-12-01 12:00:00  1  10  100
# 2023-12-02 00:00:00  1  10  100
# 2023-12-02 12:00:00  1  10  100
# 2023-12-03 00:00:00  3  30  300
# 2023-12-03 12:00:00  3  30  300
# 2023-12-04 00:00:00  3  30  300
# 2023-12-04 12:00:00  3  30  300
# 2023-12-05 00:00:00  5  50  500

print(df.reindex(index=new_dt_index).interpolate(method='time'))
#                        A     B      C
# 2023-12-01 00:00:00  1.0  10.0  100.0
# 2023-12-01 12:00:00  1.5  15.0  150.0
# 2023-12-02 00:00:00  2.0  20.0  200.0
# 2023-12-02 12:00:00  2.5  25.0  250.0
# 2023-12-03 00:00:00  3.0  30.0  300.0
# 2023-12-03 12:00:00  3.5  35.0  350.0
# 2023-12-04 00:00:00  4.0  40.0  400.0
# 2023-12-04 12:00:00  4.5  45.0  450.0
# 2023-12-05 00:00:00  5.0  50.0  500.0

source: pandas_reindex_time_series.py

Note that if a list of strings is specified as a new index, the index of the DataFrame returned by reindex() becomes a normal Index and is no longer time series data. interpolate(method='time') is only possible with a DataFrame with DatetimeIndex.

new_index = ['2023-12-01', '2023-12-02', '2023-12-03']

print(df.reindex(index=new_index))
#               A     B      C
# 2023-12-01  1.0  10.0  100.0
# 2023-12-02  NaN   NaN    NaN
# 2023-12-03  3.0  30.0  300.0

print(df.reindex(index=new_index).index)
# Index(['2023-12-01', '2023-12-02', '2023-12-03'], dtype='object')

print(df.reindex(index=new_index, method='bfill'))
#             A   B    C
# 2023-12-01  1  10  100
# 2023-12-02  3  30  300
# 2023-12-03  3  30  300

# print(df.reindex(index=new_index).interpolate(method='time'))
# ValueError: time-weighted interpolation only works on Series or DataFrames with a DatetimeIndex

source: pandas_reindex_time_series.py

An example of reindex_like() is as follows.

df2 = pd.DataFrame({'A': [1, 1, 1, 1, 1], 'C': [100, 100, 100, 100, 100]},
                   index=pd.date_range('2023-12-01', '2023-12-05', freq='D'))
print(df2)
#             A    C
# 2023-12-01  1  100
# 2023-12-02  1  100
# 2023-12-03  1  100
# 2023-12-04  1  100
# 2023-12-05  1  100

print(df.reindex_like(df2))
#               A      C
# 2023-12-01  1.0  100.0
# 2023-12-02  NaN    NaN
# 2023-12-03  3.0  300.0
# 2023-12-04  NaN    NaN
# 2023-12-05  5.0  500.0

print(df.reindex_like(df2).interpolate(method='time'))
#               A      C
# 2023-12-01  1.0  100.0
# 2023-12-02  2.0  200.0
# 2023-12-03  3.0  300.0
# 2023-12-04  4.0  400.0
# 2023-12-05  5.0  500.0

source: pandas_reindex_time_series.py

pandas: Reorder rows and columns in DataFrame with reindex()

Basic usage of `reindex()`

Specify the order of rows and columns: `index`, `columns`

Specify new row and column names with `reindex()`

Filling with a specific value: `fill_value`

Fill with adjacent values: `method`

Other methods for filling and interpolation

Basic usage of `reindex_like()`

`reindex()` and `reindex_like()` for time series

Related Categories

Related Articles

pandas: Reorder rows and columns in DataFrame with reindex()

Basic usage of reindex()

Specify the order of rows and columns: index, columns

Specify new row and column names with reindex()

Filling with a specific value: fill_value

Fill with adjacent values: method

Other methods for filling and interpolation

Basic usage of reindex_like()

reindex() and reindex_like() for time series

Related Categories

Related Articles

Basic usage of `reindex()`

Specify the order of rows and columns: `index`, `columns`

Specify new row and column names with `reindex()`

Filling with a specific value: `fill_value`

Fill with adjacent values: `method`

Basic usage of `reindex_like()`

`reindex()` and `reindex_like()` for time series