pandas: Reorder rows and columns in DataFrame with reindex()

Posted: | Tags: Python, pandas

In pandas, the reindex() method allows you to reorder the rows and columns of a DataFrame by specifying a list of labels (row and column names). This method is also available on Series.

There is also the reindex_like() method that arranges the rows and columns to be the same as in another DataFrame or Series.

With both methods, it is possible to not only rearrange existing rows and columns but also to add new ones.

The sample code below demonstrates the use of DataFrame, but the approach is identical for Series. Note that Series does not have columns, so only the index is specified.

To sort in ascending or descending order, use the sort_index() method.

The pandas version used in this article is as follows. Note that functionality may vary between versions.

import pandas as pd

print(pd.__version__)
# 2.1.4

Basic usage of reindex()

Consider the following DataFrame.

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30], 'C': [100, 200, 300]},
                  index=['One', 'Two', 'Three'])
print(df)
#        A   B    C
# One    1  10  100
# Two    2  20  200
# Three  3  30  300

Specify the order of rows and columns: index, columns

A new, rearranged DataFrame can be created by specifying lists of row and column names in the index and columns arguments, which can be used simultaneously.

print(df.reindex(index=['Two', 'Three', 'One']))
#        A   B    C
# Two    2  20  200
# Three  3  30  300
# One    1  10  100

print(df.reindex(columns=['B', 'C', 'A']))
#         B    C  A
# One    10  100  1
# Two    20  200  2
# Three  30  300  3

print(df.reindex(index=['Two', 'Three', 'One'], columns=['B', 'C', 'A']))
#         B    C  A
# Two    20  200  2
# Three  30  300  3
# One    10  100  1

It is not necessary to use all existing row and column names.

print(df.reindex(columns=['B', 'A'], index=['Three', 'One']))
#         B  A
# Three  30  3
# One    10  1

Alternatively, you can specify a list in the labels argument and use the axis argument to indicate whether they are for rows (0 or 'index') or columns (1 or 'columns').

print(df.reindex(['Two', 'Three', 'One'], axis=0))
#        A   B    C
# Two    2  20  200
# Three  3  30  300
# One    1  10  100

print(df.reindex(['B', 'C', 'A'], axis='columns'))
#         B    C  A
# One    10  100  1
# Two    20  200  2
# Three  30  300  3

Note that column rearrangement can also be achieved by df[list_of_column_names].

print(df[['B', 'C', 'A']])
#         B    C  A
# One    10  100  1
# Two    20  200  2
# Three  30  300  3

Specify new row and column names with reindex()

When new row and column names are passed to reindex(), all values are filled as missing values (NaN) by default.

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30], 'C': [100, 200, 300]},
                  index=['One', 'Two', 'Three'])
print(df)
#        A   B    C
# One    1  10  100
# Two    2  20  200
# Three  3  30  300

print(df.reindex(columns=['B', 'X', 'C'], index=['Two', 'One', 'Four']))
#          B   X      C
# Two   20.0 NaN  200.0
# One   10.0 NaN  100.0
# Four   NaN NaN    NaN

For more information on adding rows and columns to DataFrame, refer to the following.

Filling with a specific value: fill_value

Specifying a particular value for the fill_value argument will fill all values with that specified value.

print(df.reindex(columns=['B', 'X', 'C'], index=['Two', 'One', 'Four'],
                 fill_value=0))
#        B  X    C
# Two   20  0  200
# One   10  0  100
# Four   0  0    0

Fill with adjacent values: method

If the index of the original DataFrame is either monotonically increasing or decreasing, you can use the method argument to fill gaps with adjacent values.

Consider the following DataFrame.

df = pd.DataFrame({'A': [1, 2], 'B': [10, 20], 'C': [100, 200]},
                  index=[10, 20])
print(df)
#     A   B    C
# 10  1  10  100
# 20  2  20  200

For the method argument, using 'bfill' or 'backfill' fills with the next value, 'ffill' or 'pad' with the previous value, and 'nearest' with the nearest value, defaulting to the next value if equidistant.

print(df.reindex(index=[5, 10, 15, 20, 25]))
#       A     B      C
# 5   NaN   NaN    NaN
# 10  1.0  10.0  100.0
# 15  NaN   NaN    NaN
# 20  2.0  20.0  200.0
# 25  NaN   NaN    NaN

print(df.reindex(index=[5, 10, 15, 20, 25], method='bfill'))
#       A     B      C
# 5   1.0  10.0  100.0
# 10  1.0  10.0  100.0
# 15  2.0  20.0  200.0
# 20  2.0  20.0  200.0
# 25  NaN   NaN    NaN

print(df.reindex(index=[5, 10, 15, 20, 25], method='ffill'))
#       A     B      C
# 5   NaN   NaN    NaN
# 10  1.0  10.0  100.0
# 15  1.0  10.0  100.0
# 20  2.0  20.0  200.0
# 25  2.0  20.0  200.0

print(df.reindex(index=[5, 10, 15, 20, 25], method='nearest'))
#     A   B    C
# 5   1  10  100
# 10  1  10  100
# 15  2  20  200
# 20  2  20  200
# 25  2  20  200

If an integer value is specified for the limit argument, it will fill only that number of consecutive missing values (NaN).

print(df.reindex(index=[10, 12, 14, 16, 18, 20]))
#       A     B      C
# 10  1.0  10.0  100.0
# 12  NaN   NaN    NaN
# 14  NaN   NaN    NaN
# 16  NaN   NaN    NaN
# 18  NaN   NaN    NaN
# 20  2.0  20.0  200.0

print(df.reindex(index=[10, 12, 14, 16, 18, 20], method='bfill', limit=2))
#       A     B      C
# 10  1.0  10.0  100.0
# 12  NaN   NaN    NaN
# 14  NaN   NaN    NaN
# 16  2.0  20.0  200.0
# 18  2.0  20.0  200.0
# 20  2.0  20.0  200.0

Note that the filling process uses the index of the original DataFrame for reference, and is independent of any order specified in the index or column arguments.

print(df.reindex(index=[5, 10, 15, 20, 25], method='bfill'))
#       A     B      C
# 5   1.0  10.0  100.0
# 10  1.0  10.0  100.0
# 15  2.0  20.0  200.0
# 20  2.0  20.0  200.0
# 25  NaN   NaN    NaN

print(df.reindex(index=[25, 20, 15, 10, 5], method='bfill'))
#       A     B      C
# 25  NaN   NaN    NaN
# 20  2.0  20.0  200.0
# 15  2.0  20.0  200.0
# 10  1.0  10.0  100.0
# 5   1.0  10.0  100.0

Other methods for filling and interpolation

Other methods for filling missing values (NaN) include using fillna(), ffill(), bfill(), and for interpolation, interpolate().

df = pd.DataFrame({'A': [1, 2], 'B': [10, 20], 'C': [100, 200]},
                  index=[10, 20])
print(df)
#     A   B    C
# 10  1  10  100
# 20  2  20  200

print(df.reindex(index=[5, 10, 15, 20, 25]).bfill())
#       A     B      C
# 5   1.0  10.0  100.0
# 10  1.0  10.0  100.0
# 15  2.0  20.0  200.0
# 20  2.0  20.0  200.0
# 25  NaN   NaN    NaN

print(df.reindex(index=[5, 10, 15, 20, 25]).interpolate())
#       A     B      C
# 5   NaN   NaN    NaN
# 10  1.0  10.0  100.0
# 15  1.5  15.0  150.0
# 20  2.0  20.0  200.0
# 25  2.0  20.0  200.0

While reindex() cannot fill the gaps for added columns from adjacent values, it is possible with fillna(), ffill(), bfill(), and interpolate().

print(df.reindex(columns=['A', 'X', 'C'], method='bfill'))
#     A   X    C
# 10  1 NaN  100
# 20  2 NaN  200

print(df.reindex(columns=['A', 'X', 'C']).bfill(axis=1))
#       A      X      C
# 10  1.0  100.0  100.0
# 20  2.0  200.0  200.0

print(df.reindex(columns=['A', 'X', 'C']).interpolate(axis=1))
#       A      X      C
# 10  1.0   50.5  100.0
# 20  2.0  101.0  200.0

As mentioned above, the filling methods of reindex() are limited to cases where the original index is monotonically increasing or decreasing. The following cases will result in an error.

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30], 'C': [100, 200, 300]},
                  index=[20, 10, 30])
print(df)
#     A   B    C
# 20  1  10  100
# 10  2  20  200
# 30  3  30  300

# print(df.reindex(index=[10, 15, 20], method='bfill'))
# ValueError: index must be monotonic increasing or decreasing

After using reindex(), you can apply methods like fillna(), ffill(), bfill(), and interpolate() to fill in missing values (NaN).

print(df.reindex(index=[10, 15, 20]))
#       A     B      C
# 10  2.0  20.0  200.0
# 15  NaN   NaN    NaN
# 20  1.0  10.0  100.0

print(df.reindex(index=[10, 15, 20]).bfill())
#       A     B      C
# 10  2.0  20.0  200.0
# 15  1.0  10.0  100.0
# 20  1.0  10.0  100.0

print(df.reindex(index=[10, 15, 20]).interpolate())
#       A     B      C
# 10  2.0  20.0  200.0
# 15  1.5  15.0  150.0
# 20  1.0  10.0  100.0

In this case, the data passes through a DataFrame containing missing values (NaN). The data type (dtype) becomes a floating-point number (float), so even if all values are integers, as in the above example, the dtype remains float.

Basic usage of reindex_like()

reindex_like() allows you to rearrange the rows and columns of a DataFrame to be the same as those in another DataFrame.

Consider the following two DataFrame.

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30], 'C': [100, 200, 300]},
                   index=[10, 20, 30])
print(df1)
#     A   B    C
# 10  1  10  100
# 20  2  20  200
# 30  3  30  300

df2 = pd.DataFrame({'A': [1, 1, 1], 'C': [100, 100, 100]},
                   index=[10, 15, 20])
print(df2)
#     A    C
# 10  1  100
# 15  1  100
# 20  1  100

Specifying the columns and index attributes of the target DataFrame in the columns and index arguments of reindex() creates a DataFrame arranged in exactly that order.

print(df1.reindex(columns=df2.columns, index=df2.index))
#       A      C
# 10  1.0  100.0
# 15  NaN    NaN
# 20  2.0  200.0

The same can be achieved more easily using reindex_like().

print(df1.reindex_like(df2))
#       A      C
# 10  1.0  100.0
# 15  NaN    NaN
# 20  2.0  200.0

With reindex_like(), if the index is monotonically increasing or decreasing, it is possible to specify the method and limit arguments. As of pandas version 2.1.4, the fill_value argument is not implemented, but fillna() can be used instead.

print(df1.reindex_like(df2, method='bfill'))
#     A    C
# 10  1  100
# 15  2  200
# 20  2  200

# print(df1.reindex_like(df2, fill_value=0))
# TypeError: NDFrame.reindex_like() got an unexpected keyword argument 'fill_value'

print(df1.reindex_like(df2).fillna(0))
#       A      C
# 10  1.0  100.0
# 15  0.0    0.0
# 20  2.0  200.0

reindex() and reindex_like() for time series

As shown in the official documentation, reindex() and reindex_like() are useful for aligning the index of time series data.

Consider the following DataFrame. pd.date_range() is used to generate DatetimeIndex.

df = pd.DataFrame({'A': [1, 3, 5], 'B': [10, 30, 50], 'C': [100, 300, 500]},
                  index=pd.date_range('2023-12-01', '2023-12-05', freq='2D'))
print(df)
#             A   B    C
# 2023-12-01  1  10  100
# 2023-12-03  3  30  300
# 2023-12-05  5  50  500

print(df.index)
# DatetimeIndex(['2023-12-01', '2023-12-03', '2023-12-05'], dtype='datetime64[ns]', freq='2D')

For example, when using reindex(), you can use the interpolate() method with the method argument set to 'time' to enable interpolation based on datetime.

new_dt_index = pd.date_range('2023-12-01', '2023-12-05', freq='12H')
print(new_dt_index)
# DatetimeIndex(['2023-12-01 00:00:00', '2023-12-01 12:00:00',
#                '2023-12-02 00:00:00', '2023-12-02 12:00:00',
#                '2023-12-03 00:00:00', '2023-12-03 12:00:00',
#                '2023-12-04 00:00:00', '2023-12-04 12:00:00',
#                '2023-12-05 00:00:00'],
#               dtype='datetime64[ns]', freq='12H')

print(df.reindex(index=new_dt_index))
#                        A     B      C
# 2023-12-01 00:00:00  1.0  10.0  100.0
# 2023-12-01 12:00:00  NaN   NaN    NaN
# 2023-12-02 00:00:00  NaN   NaN    NaN
# 2023-12-02 12:00:00  NaN   NaN    NaN
# 2023-12-03 00:00:00  3.0  30.0  300.0
# 2023-12-03 12:00:00  NaN   NaN    NaN
# 2023-12-04 00:00:00  NaN   NaN    NaN
# 2023-12-04 12:00:00  NaN   NaN    NaN
# 2023-12-05 00:00:00  5.0  50.0  500.0

print(df.reindex(index=new_dt_index, method='ffill'))
#                      A   B    C
# 2023-12-01 00:00:00  1  10  100
# 2023-12-01 12:00:00  1  10  100
# 2023-12-02 00:00:00  1  10  100
# 2023-12-02 12:00:00  1  10  100
# 2023-12-03 00:00:00  3  30  300
# 2023-12-03 12:00:00  3  30  300
# 2023-12-04 00:00:00  3  30  300
# 2023-12-04 12:00:00  3  30  300
# 2023-12-05 00:00:00  5  50  500

print(df.reindex(index=new_dt_index).interpolate(method='time'))
#                        A     B      C
# 2023-12-01 00:00:00  1.0  10.0  100.0
# 2023-12-01 12:00:00  1.5  15.0  150.0
# 2023-12-02 00:00:00  2.0  20.0  200.0
# 2023-12-02 12:00:00  2.5  25.0  250.0
# 2023-12-03 00:00:00  3.0  30.0  300.0
# 2023-12-03 12:00:00  3.5  35.0  350.0
# 2023-12-04 00:00:00  4.0  40.0  400.0
# 2023-12-04 12:00:00  4.5  45.0  450.0
# 2023-12-05 00:00:00  5.0  50.0  500.0

Note that if a list of strings is specified as a new index, the index of the DataFrame returned by reindex() becomes a normal Index and is no longer time series data. interpolate(method='time') is only possible with a DataFrame with DatetimeIndex.

new_index = ['2023-12-01', '2023-12-02', '2023-12-03']

print(df.reindex(index=new_index))
#               A     B      C
# 2023-12-01  1.0  10.0  100.0
# 2023-12-02  NaN   NaN    NaN
# 2023-12-03  3.0  30.0  300.0

print(df.reindex(index=new_index).index)
# Index(['2023-12-01', '2023-12-02', '2023-12-03'], dtype='object')

print(df.reindex(index=new_index, method='bfill'))
#             A   B    C
# 2023-12-01  1  10  100
# 2023-12-02  3  30  300
# 2023-12-03  3  30  300

# print(df.reindex(index=new_index).interpolate(method='time'))
# ValueError: time-weighted interpolation only works on Series or DataFrames with a DatetimeIndex

An example of reindex_like() is as follows.

df2 = pd.DataFrame({'A': [1, 1, 1, 1, 1], 'C': [100, 100, 100, 100, 100]},
                   index=pd.date_range('2023-12-01', '2023-12-05', freq='D'))
print(df2)
#             A    C
# 2023-12-01  1  100
# 2023-12-02  1  100
# 2023-12-03  1  100
# 2023-12-04  1  100
# 2023-12-05  1  100

print(df.reindex_like(df2))
#               A      C
# 2023-12-01  1.0  100.0
# 2023-12-02  NaN    NaN
# 2023-12-03  3.0  300.0
# 2023-12-04  NaN    NaN
# 2023-12-05  5.0  500.0

print(df.reindex_like(df2).interpolate(method='time'))
#               A      C
# 2023-12-01  1.0  100.0
# 2023-12-02  2.0  200.0
# 2023-12-03  3.0  300.0
# 2023-12-04  4.0  400.0
# 2023-12-05  5.0  500.0

Related Categories

Related Articles