pandas: Reorder rows and columns in DataFrame with reindex()
In pandas, the reindex() method allows you to reorder the rows and columns of a DataFrame by specifying a list of labels (row and column names). This method is also available on Series.
- pandas.DataFrame.reindex — pandas 2.1.4 documentation
- pandas.Series.reindex — pandas 2.1.4 documentation
There is also the reindex_like() method that arranges the rows and columns to be the same as in another DataFrame or Series.
- pandas.DataFrame.reindex_like — pandas 2.1.4 documentation
- pandas.Series.reindex_like — pandas 2.1.4 documentation
With both methods, it is possible to not only rearrange existing rows and columns but also to add new ones.
The sample code below demonstrates the use of DataFrame, but the approach is identical for Series. Note that Series does not have columns, so only the index is specified.
To sort in ascending or descending order, use the sort_index() method.
The pandas version used in this article is as follows. Note that functionality may vary between versions.
import pandas as pd
print(pd.__version__)
# 2.1.4
Basic usage of reindex()
Consider the following DataFrame.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30], 'C': [100, 200, 300]},
index=['One', 'Two', 'Three'])
print(df)
# A B C
# One 1 10 100
# Two 2 20 200
# Three 3 30 300
Specify the order of rows and columns: index, columns
A new, rearranged DataFrame can be created by specifying lists of row and column names in the index and columns arguments, which can be used simultaneously.
print(df.reindex(index=['Two', 'Three', 'One']))
# A B C
# Two 2 20 200
# Three 3 30 300
# One 1 10 100
print(df.reindex(columns=['B', 'C', 'A']))
# B C A
# One 10 100 1
# Two 20 200 2
# Three 30 300 3
print(df.reindex(index=['Two', 'Three', 'One'], columns=['B', 'C', 'A']))
# B C A
# Two 20 200 2
# Three 30 300 3
# One 10 100 1
It is not necessary to use all existing row and column names.
print(df.reindex(columns=['B', 'A'], index=['Three', 'One']))
# B A
# Three 30 3
# One 10 1
Alternatively, you can specify a list in the labels argument and use the axis argument to indicate whether they are for rows (0 or 'index') or columns (1 or 'columns').
print(df.reindex(['Two', 'Three', 'One'], axis=0))
# A B C
# Two 2 20 200
# Three 3 30 300
# One 1 10 100
print(df.reindex(['B', 'C', 'A'], axis='columns'))
# B C A
# One 10 100 1
# Two 20 200 2
# Three 30 300 3
Note that column rearrangement can also be achieved by df[list_of_column_names].
print(df[['B', 'C', 'A']])
# B C A
# One 10 100 1
# Two 20 200 2
# Three 30 300 3
Specify new row and column names with reindex()
When new row and column names are passed to reindex(), all values are filled as missing values (NaN) by default.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30], 'C': [100, 200, 300]},
index=['One', 'Two', 'Three'])
print(df)
# A B C
# One 1 10 100
# Two 2 20 200
# Three 3 30 300
print(df.reindex(columns=['B', 'X', 'C'], index=['Two', 'One', 'Four']))
# B X C
# Two 20.0 NaN 200.0
# One 10.0 NaN 100.0
# Four NaN NaN NaN
For more information on adding rows and columns to DataFrame, refer to the following.
Filling with a specific value: fill_value
Specifying a particular value for the fill_value argument will fill all values with that specified value.
print(df.reindex(columns=['B', 'X', 'C'], index=['Two', 'One', 'Four'],
fill_value=0))
# B X C
# Two 20 0 200
# One 10 0 100
# Four 0 0 0
Fill with adjacent values: method
If the index of the original DataFrame is either monotonically increasing or decreasing, you can use the method argument to fill gaps with adjacent values.
Consider the following DataFrame.
df = pd.DataFrame({'A': [1, 2], 'B': [10, 20], 'C': [100, 200]},
index=[10, 20])
print(df)
# A B C
# 10 1 10 100
# 20 2 20 200
For the method argument, using 'bfill' or 'backfill' fills with the next value, 'ffill' or 'pad' with the previous value, and 'nearest' with the nearest value, defaulting to the next value if equidistant.
print(df.reindex(index=[5, 10, 15, 20, 25]))
# A B C
# 5 NaN NaN NaN
# 10 1.0 10.0 100.0
# 15 NaN NaN NaN
# 20 2.0 20.0 200.0
# 25 NaN NaN NaN
print(df.reindex(index=[5, 10, 15, 20, 25], method='bfill'))
# A B C
# 5 1.0 10.0 100.0
# 10 1.0 10.0 100.0
# 15 2.0 20.0 200.0
# 20 2.0 20.0 200.0
# 25 NaN NaN NaN
print(df.reindex(index=[5, 10, 15, 20, 25], method='ffill'))
# A B C
# 5 NaN NaN NaN
# 10 1.0 10.0 100.0
# 15 1.0 10.0 100.0
# 20 2.0 20.0 200.0
# 25 2.0 20.0 200.0
print(df.reindex(index=[5, 10, 15, 20, 25], method='nearest'))
# A B C
# 5 1 10 100
# 10 1 10 100
# 15 2 20 200
# 20 2 20 200
# 25 2 20 200
If an integer value is specified for the limit argument, it will fill only that number of consecutive missing values (NaN).
print(df.reindex(index=[10, 12, 14, 16, 18, 20]))
# A B C
# 10 1.0 10.0 100.0
# 12 NaN NaN NaN
# 14 NaN NaN NaN
# 16 NaN NaN NaN
# 18 NaN NaN NaN
# 20 2.0 20.0 200.0
print(df.reindex(index=[10, 12, 14, 16, 18, 20], method='bfill', limit=2))
# A B C
# 10 1.0 10.0 100.0
# 12 NaN NaN NaN
# 14 NaN NaN NaN
# 16 2.0 20.0 200.0
# 18 2.0 20.0 200.0
# 20 2.0 20.0 200.0
Note that the filling process uses the index of the original DataFrame for reference, and is independent of any order specified in the index or column arguments.
print(df.reindex(index=[5, 10, 15, 20, 25], method='bfill'))
# A B C
# 5 1.0 10.0 100.0
# 10 1.0 10.0 100.0
# 15 2.0 20.0 200.0
# 20 2.0 20.0 200.0
# 25 NaN NaN NaN
print(df.reindex(index=[25, 20, 15, 10, 5], method='bfill'))
# A B C
# 25 NaN NaN NaN
# 20 2.0 20.0 200.0
# 15 2.0 20.0 200.0
# 10 1.0 10.0 100.0
# 5 1.0 10.0 100.0
Other methods for filling and interpolation
Other methods for filling missing values (NaN) include using fillna(), ffill(), bfill(), and for interpolation, interpolate().
- pandas: Replace NaN (missing values) with fillna()
- pandas: Interpolate NaN (missing values) with interpolate()
df = pd.DataFrame({'A': [1, 2], 'B': [10, 20], 'C': [100, 200]},
index=[10, 20])
print(df)
# A B C
# 10 1 10 100
# 20 2 20 200
print(df.reindex(index=[5, 10, 15, 20, 25]).bfill())
# A B C
# 5 1.0 10.0 100.0
# 10 1.0 10.0 100.0
# 15 2.0 20.0 200.0
# 20 2.0 20.0 200.0
# 25 NaN NaN NaN
print(df.reindex(index=[5, 10, 15, 20, 25]).interpolate())
# A B C
# 5 NaN NaN NaN
# 10 1.0 10.0 100.0
# 15 1.5 15.0 150.0
# 20 2.0 20.0 200.0
# 25 2.0 20.0 200.0
While reindex() cannot fill the gaps for added columns from adjacent values, it is possible with fillna(), ffill(), bfill(), and interpolate().
print(df.reindex(columns=['A', 'X', 'C'], method='bfill'))
# A X C
# 10 1 NaN 100
# 20 2 NaN 200
print(df.reindex(columns=['A', 'X', 'C']).bfill(axis=1))
# A X C
# 10 1.0 100.0 100.0
# 20 2.0 200.0 200.0
print(df.reindex(columns=['A', 'X', 'C']).interpolate(axis=1))
# A X C
# 10 1.0 50.5 100.0
# 20 2.0 101.0 200.0
As mentioned above, the filling methods of reindex() are limited to cases where the original index is monotonically increasing or decreasing. The following cases will result in an error.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30], 'C': [100, 200, 300]},
index=[20, 10, 30])
print(df)
# A B C
# 20 1 10 100
# 10 2 20 200
# 30 3 30 300
# print(df.reindex(index=[10, 15, 20], method='bfill'))
# ValueError: index must be monotonic increasing or decreasing
After using reindex(), you can apply methods like fillna(), ffill(), bfill(), and interpolate() to fill in missing values (NaN).
print(df.reindex(index=[10, 15, 20]))
# A B C
# 10 2.0 20.0 200.0
# 15 NaN NaN NaN
# 20 1.0 10.0 100.0
print(df.reindex(index=[10, 15, 20]).bfill())
# A B C
# 10 2.0 20.0 200.0
# 15 1.0 10.0 100.0
# 20 1.0 10.0 100.0
print(df.reindex(index=[10, 15, 20]).interpolate())
# A B C
# 10 2.0 20.0 200.0
# 15 1.5 15.0 150.0
# 20 1.0 10.0 100.0
In this case, the data passes through a DataFrame containing missing values (NaN). The data type (dtype) becomes a floating-point number (float), so even if all values are integers, as in the above example, the dtype remains float.
Basic usage of reindex_like()
reindex_like() allows you to rearrange the rows and columns of a DataFrame to be the same as those in another DataFrame.
Consider the following two DataFrame.
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30], 'C': [100, 200, 300]},
index=[10, 20, 30])
print(df1)
# A B C
# 10 1 10 100
# 20 2 20 200
# 30 3 30 300
df2 = pd.DataFrame({'A': [1, 1, 1], 'C': [100, 100, 100]},
index=[10, 15, 20])
print(df2)
# A C
# 10 1 100
# 15 1 100
# 20 1 100
Specifying the columns and index attributes of the target DataFrame in the columns and index arguments of reindex() creates a DataFrame arranged in exactly that order.
print(df1.reindex(columns=df2.columns, index=df2.index))
# A C
# 10 1.0 100.0
# 15 NaN NaN
# 20 2.0 200.0
The same can be achieved more easily using reindex_like().
print(df1.reindex_like(df2))
# A C
# 10 1.0 100.0
# 15 NaN NaN
# 20 2.0 200.0
With reindex_like(), if the index is monotonically increasing or decreasing, it is possible to specify the method and limit arguments. As of pandas version 2.1.4, the fill_value argument is not implemented, but fillna() can be used instead.
print(df1.reindex_like(df2, method='bfill'))
# A C
# 10 1 100
# 15 2 200
# 20 2 200
# print(df1.reindex_like(df2, fill_value=0))
# TypeError: NDFrame.reindex_like() got an unexpected keyword argument 'fill_value'
print(df1.reindex_like(df2).fillna(0))
# A C
# 10 1.0 100.0
# 15 0.0 0.0
# 20 2.0 200.0
reindex() and reindex_like() for time series
As shown in the official documentation, reindex() and reindex_like() are useful for aligning the index of time series data.
Consider the following DataFrame. pd.date_range() is used to generate DatetimeIndex.
df = pd.DataFrame({'A': [1, 3, 5], 'B': [10, 30, 50], 'C': [100, 300, 500]},
index=pd.date_range('2023-12-01', '2023-12-05', freq='2D'))
print(df)
# A B C
# 2023-12-01 1 10 100
# 2023-12-03 3 30 300
# 2023-12-05 5 50 500
print(df.index)
# DatetimeIndex(['2023-12-01', '2023-12-03', '2023-12-05'], dtype='datetime64[ns]', freq='2D')
For example, when using reindex(), you can use the interpolate() method with the method argument set to 'time' to enable interpolation based on datetime.
new_dt_index = pd.date_range('2023-12-01', '2023-12-05', freq='12H')
print(new_dt_index)
# DatetimeIndex(['2023-12-01 00:00:00', '2023-12-01 12:00:00',
# '2023-12-02 00:00:00', '2023-12-02 12:00:00',
# '2023-12-03 00:00:00', '2023-12-03 12:00:00',
# '2023-12-04 00:00:00', '2023-12-04 12:00:00',
# '2023-12-05 00:00:00'],
# dtype='datetime64[ns]', freq='12H')
print(df.reindex(index=new_dt_index))
# A B C
# 2023-12-01 00:00:00 1.0 10.0 100.0
# 2023-12-01 12:00:00 NaN NaN NaN
# 2023-12-02 00:00:00 NaN NaN NaN
# 2023-12-02 12:00:00 NaN NaN NaN
# 2023-12-03 00:00:00 3.0 30.0 300.0
# 2023-12-03 12:00:00 NaN NaN NaN
# 2023-12-04 00:00:00 NaN NaN NaN
# 2023-12-04 12:00:00 NaN NaN NaN
# 2023-12-05 00:00:00 5.0 50.0 500.0
print(df.reindex(index=new_dt_index, method='ffill'))
# A B C
# 2023-12-01 00:00:00 1 10 100
# 2023-12-01 12:00:00 1 10 100
# 2023-12-02 00:00:00 1 10 100
# 2023-12-02 12:00:00 1 10 100
# 2023-12-03 00:00:00 3 30 300
# 2023-12-03 12:00:00 3 30 300
# 2023-12-04 00:00:00 3 30 300
# 2023-12-04 12:00:00 3 30 300
# 2023-12-05 00:00:00 5 50 500
print(df.reindex(index=new_dt_index).interpolate(method='time'))
# A B C
# 2023-12-01 00:00:00 1.0 10.0 100.0
# 2023-12-01 12:00:00 1.5 15.0 150.0
# 2023-12-02 00:00:00 2.0 20.0 200.0
# 2023-12-02 12:00:00 2.5 25.0 250.0
# 2023-12-03 00:00:00 3.0 30.0 300.0
# 2023-12-03 12:00:00 3.5 35.0 350.0
# 2023-12-04 00:00:00 4.0 40.0 400.0
# 2023-12-04 12:00:00 4.5 45.0 450.0
# 2023-12-05 00:00:00 5.0 50.0 500.0
Note that if a list of strings is specified as a new index, the index of the DataFrame returned by reindex() becomes a normal Index and is no longer time series data. interpolate(method='time') is only possible with a DataFrame with DatetimeIndex.
new_index = ['2023-12-01', '2023-12-02', '2023-12-03']
print(df.reindex(index=new_index))
# A B C
# 2023-12-01 1.0 10.0 100.0
# 2023-12-02 NaN NaN NaN
# 2023-12-03 3.0 30.0 300.0
print(df.reindex(index=new_index).index)
# Index(['2023-12-01', '2023-12-02', '2023-12-03'], dtype='object')
print(df.reindex(index=new_index, method='bfill'))
# A B C
# 2023-12-01 1 10 100
# 2023-12-02 3 30 300
# 2023-12-03 3 30 300
# print(df.reindex(index=new_index).interpolate(method='time'))
# ValueError: time-weighted interpolation only works on Series or DataFrames with a DatetimeIndex
An example of reindex_like() is as follows.
df2 = pd.DataFrame({'A': [1, 1, 1, 1, 1], 'C': [100, 100, 100, 100, 100]},
index=pd.date_range('2023-12-01', '2023-12-05', freq='D'))
print(df2)
# A C
# 2023-12-01 1 100
# 2023-12-02 1 100
# 2023-12-03 1 100
# 2023-12-04 1 100
# 2023-12-05 1 100
print(df.reindex_like(df2))
# A C
# 2023-12-01 1.0 100.0
# 2023-12-02 NaN NaN
# 2023-12-03 3.0 300.0
# 2023-12-04 NaN NaN
# 2023-12-05 5.0 500.0
print(df.reindex_like(df2).interpolate(method='time'))
# A C
# 2023-12-01 1.0 100.0
# 2023-12-02 2.0 200.0
# 2023-12-03 3.0 300.0
# 2023-12-04 4.0 400.0
# 2023-12-05 5.0 500.0