pandas: Reorder rows and columns in DataFrame with reindex()
In pandas, the reindex()
method allows you to reorder the rows and columns of a DataFrame
by specifying a list of labels (row and column names). This method is also available on Series
.
- pandas.DataFrame.reindex — pandas 2.1.4 documentation
- pandas.Series.reindex — pandas 2.1.4 documentation
There is also the reindex_like()
method that arranges the rows and columns to be the same as in another DataFrame
or Series
.
- pandas.DataFrame.reindex_like — pandas 2.1.4 documentation
- pandas.Series.reindex_like — pandas 2.1.4 documentation
With both methods, it is possible to not only rearrange existing rows and columns but also to add new ones.
The sample code below demonstrates the use of DataFrame
, but the approach is identical for Series
. Note that Series
does not have columns, so only the index
is specified.
To sort in ascending or descending order, use the sort_index()
method.
The pandas version used in this article is as follows. Note that functionality may vary between versions.
import pandas as pd
print(pd.__version__)
# 2.1.4
Basic usage of reindex()
Consider the following DataFrame
.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30], 'C': [100, 200, 300]},
index=['One', 'Two', 'Three'])
print(df)
# A B C
# One 1 10 100
# Two 2 20 200
# Three 3 30 300
Specify the order of rows and columns: index
, columns
A new, rearranged DataFrame
can be created by specifying lists of row and column names in the index
and columns
arguments, which can be used simultaneously.
print(df.reindex(index=['Two', 'Three', 'One']))
# A B C
# Two 2 20 200
# Three 3 30 300
# One 1 10 100
print(df.reindex(columns=['B', 'C', 'A']))
# B C A
# One 10 100 1
# Two 20 200 2
# Three 30 300 3
print(df.reindex(index=['Two', 'Three', 'One'], columns=['B', 'C', 'A']))
# B C A
# Two 20 200 2
# Three 30 300 3
# One 10 100 1
It is not necessary to use all existing row and column names.
print(df.reindex(columns=['B', 'A'], index=['Three', 'One']))
# B A
# Three 30 3
# One 10 1
Alternatively, you can specify a list in the labels
argument and use the axis
argument to indicate whether they are for rows (0
or 'index'
) or columns (1
or 'columns'
).
print(df.reindex(['Two', 'Three', 'One'], axis=0))
# A B C
# Two 2 20 200
# Three 3 30 300
# One 1 10 100
print(df.reindex(['B', 'C', 'A'], axis='columns'))
# B C A
# One 10 100 1
# Two 20 200 2
# Three 30 300 3
Note that column rearrangement can also be achieved by df[list_of_column_names]
.
print(df[['B', 'C', 'A']])
# B C A
# One 10 100 1
# Two 20 200 2
# Three 30 300 3
Specify new row and column names with reindex()
When new row and column names are passed to reindex()
, all values are filled as missing values (NaN
) by default.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30], 'C': [100, 200, 300]},
index=['One', 'Two', 'Three'])
print(df)
# A B C
# One 1 10 100
# Two 2 20 200
# Three 3 30 300
print(df.reindex(columns=['B', 'X', 'C'], index=['Two', 'One', 'Four']))
# B X C
# Two 20.0 NaN 200.0
# One 10.0 NaN 100.0
# Four NaN NaN NaN
For more information on adding rows and columns to DataFrame
, refer to the following.
Filling with a specific value: fill_value
Specifying a particular value for the fill_value
argument will fill all values with that specified value.
print(df.reindex(columns=['B', 'X', 'C'], index=['Two', 'One', 'Four'],
fill_value=0))
# B X C
# Two 20 0 200
# One 10 0 100
# Four 0 0 0
Fill with adjacent values: method
If the index
of the original DataFrame
is either monotonically increasing or decreasing, you can use the method
argument to fill gaps with adjacent values.
Consider the following DataFrame
.
df = pd.DataFrame({'A': [1, 2], 'B': [10, 20], 'C': [100, 200]},
index=[10, 20])
print(df)
# A B C
# 10 1 10 100
# 20 2 20 200
For the method
argument, using 'bfill'
or 'backfill'
fills with the next value, 'ffill'
or 'pad'
with the previous value, and 'nearest'
with the nearest value, defaulting to the next value if equidistant.
print(df.reindex(index=[5, 10, 15, 20, 25]))
# A B C
# 5 NaN NaN NaN
# 10 1.0 10.0 100.0
# 15 NaN NaN NaN
# 20 2.0 20.0 200.0
# 25 NaN NaN NaN
print(df.reindex(index=[5, 10, 15, 20, 25], method='bfill'))
# A B C
# 5 1.0 10.0 100.0
# 10 1.0 10.0 100.0
# 15 2.0 20.0 200.0
# 20 2.0 20.0 200.0
# 25 NaN NaN NaN
print(df.reindex(index=[5, 10, 15, 20, 25], method='ffill'))
# A B C
# 5 NaN NaN NaN
# 10 1.0 10.0 100.0
# 15 1.0 10.0 100.0
# 20 2.0 20.0 200.0
# 25 2.0 20.0 200.0
print(df.reindex(index=[5, 10, 15, 20, 25], method='nearest'))
# A B C
# 5 1 10 100
# 10 1 10 100
# 15 2 20 200
# 20 2 20 200
# 25 2 20 200
If an integer value is specified for the limit
argument, it will fill only that number of consecutive missing values (NaN
).
print(df.reindex(index=[10, 12, 14, 16, 18, 20]))
# A B C
# 10 1.0 10.0 100.0
# 12 NaN NaN NaN
# 14 NaN NaN NaN
# 16 NaN NaN NaN
# 18 NaN NaN NaN
# 20 2.0 20.0 200.0
print(df.reindex(index=[10, 12, 14, 16, 18, 20], method='bfill', limit=2))
# A B C
# 10 1.0 10.0 100.0
# 12 NaN NaN NaN
# 14 NaN NaN NaN
# 16 2.0 20.0 200.0
# 18 2.0 20.0 200.0
# 20 2.0 20.0 200.0
Note that the filling process uses the index
of the original DataFrame
for reference, and is independent of any order specified in the index
or column
arguments.
print(df.reindex(index=[5, 10, 15, 20, 25], method='bfill'))
# A B C
# 5 1.0 10.0 100.0
# 10 1.0 10.0 100.0
# 15 2.0 20.0 200.0
# 20 2.0 20.0 200.0
# 25 NaN NaN NaN
print(df.reindex(index=[25, 20, 15, 10, 5], method='bfill'))
# A B C
# 25 NaN NaN NaN
# 20 2.0 20.0 200.0
# 15 2.0 20.0 200.0
# 10 1.0 10.0 100.0
# 5 1.0 10.0 100.0
Other methods for filling and interpolation
Other methods for filling missing values (NaN
) include using fillna()
, ffill()
, bfill()
, and for interpolation, interpolate()
.
- pandas: Replace NaN (missing values) with fillna()
- pandas: Interpolate NaN (missing values) with interpolate()
df = pd.DataFrame({'A': [1, 2], 'B': [10, 20], 'C': [100, 200]},
index=[10, 20])
print(df)
# A B C
# 10 1 10 100
# 20 2 20 200
print(df.reindex(index=[5, 10, 15, 20, 25]).bfill())
# A B C
# 5 1.0 10.0 100.0
# 10 1.0 10.0 100.0
# 15 2.0 20.0 200.0
# 20 2.0 20.0 200.0
# 25 NaN NaN NaN
print(df.reindex(index=[5, 10, 15, 20, 25]).interpolate())
# A B C
# 5 NaN NaN NaN
# 10 1.0 10.0 100.0
# 15 1.5 15.0 150.0
# 20 2.0 20.0 200.0
# 25 2.0 20.0 200.0
While reindex()
cannot fill the gaps for added columns from adjacent values, it is possible with fillna()
, ffill()
, bfill()
, and interpolate()
.
print(df.reindex(columns=['A', 'X', 'C'], method='bfill'))
# A X C
# 10 1 NaN 100
# 20 2 NaN 200
print(df.reindex(columns=['A', 'X', 'C']).bfill(axis=1))
# A X C
# 10 1.0 100.0 100.0
# 20 2.0 200.0 200.0
print(df.reindex(columns=['A', 'X', 'C']).interpolate(axis=1))
# A X C
# 10 1.0 50.5 100.0
# 20 2.0 101.0 200.0
As mentioned above, the filling methods of reindex()
are limited to cases where the original index is monotonically increasing or decreasing. The following cases will result in an error.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30], 'C': [100, 200, 300]},
index=[20, 10, 30])
print(df)
# A B C
# 20 1 10 100
# 10 2 20 200
# 30 3 30 300
# print(df.reindex(index=[10, 15, 20], method='bfill'))
# ValueError: index must be monotonic increasing or decreasing
After using reindex()
, you can apply methods like fillna()
, ffill()
, bfill()
, and interpolate()
to fill in missing values (NaN
).
print(df.reindex(index=[10, 15, 20]))
# A B C
# 10 2.0 20.0 200.0
# 15 NaN NaN NaN
# 20 1.0 10.0 100.0
print(df.reindex(index=[10, 15, 20]).bfill())
# A B C
# 10 2.0 20.0 200.0
# 15 1.0 10.0 100.0
# 20 1.0 10.0 100.0
print(df.reindex(index=[10, 15, 20]).interpolate())
# A B C
# 10 2.0 20.0 200.0
# 15 1.5 15.0 150.0
# 20 1.0 10.0 100.0
In this case, the data passes through a DataFrame
containing missing values (NaN
). The data type (dtype
) becomes a floating-point number (float
), so even if all values are integers, as in the above example, the dtype
remains float
.
Basic usage of reindex_like()
reindex_like()
allows you to rearrange the rows and columns of a DataFrame
to be the same as those in another DataFrame
.
Consider the following two DataFrame
.
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30], 'C': [100, 200, 300]},
index=[10, 20, 30])
print(df1)
# A B C
# 10 1 10 100
# 20 2 20 200
# 30 3 30 300
df2 = pd.DataFrame({'A': [1, 1, 1], 'C': [100, 100, 100]},
index=[10, 15, 20])
print(df2)
# A C
# 10 1 100
# 15 1 100
# 20 1 100
Specifying the columns
and index
attributes of the target DataFrame
in the columns
and index
arguments of reindex()
creates a DataFrame
arranged in exactly that order.
print(df1.reindex(columns=df2.columns, index=df2.index))
# A C
# 10 1.0 100.0
# 15 NaN NaN
# 20 2.0 200.0
The same can be achieved more easily using reindex_like()
.
print(df1.reindex_like(df2))
# A C
# 10 1.0 100.0
# 15 NaN NaN
# 20 2.0 200.0
With reindex_like()
, if the index
is monotonically increasing or decreasing, it is possible to specify the method
and limit
arguments. As of pandas version 2.1.4, the fill_value
argument is not implemented, but fillna()
can be used instead.
print(df1.reindex_like(df2, method='bfill'))
# A C
# 10 1 100
# 15 2 200
# 20 2 200
# print(df1.reindex_like(df2, fill_value=0))
# TypeError: NDFrame.reindex_like() got an unexpected keyword argument 'fill_value'
print(df1.reindex_like(df2).fillna(0))
# A C
# 10 1.0 100.0
# 15 0.0 0.0
# 20 2.0 200.0
reindex()
and reindex_like()
for time series
As shown in the official documentation, reindex()
and reindex_like()
are useful for aligning the index
of time series data.
Consider the following DataFrame
. pd.date_range()
is used to generate DatetimeIndex
.
df = pd.DataFrame({'A': [1, 3, 5], 'B': [10, 30, 50], 'C': [100, 300, 500]},
index=pd.date_range('2023-12-01', '2023-12-05', freq='2D'))
print(df)
# A B C
# 2023-12-01 1 10 100
# 2023-12-03 3 30 300
# 2023-12-05 5 50 500
print(df.index)
# DatetimeIndex(['2023-12-01', '2023-12-03', '2023-12-05'], dtype='datetime64[ns]', freq='2D')
For example, when using reindex()
, you can use the interpolate()
method with the method
argument set to 'time'
to enable interpolation based on datetime.
new_dt_index = pd.date_range('2023-12-01', '2023-12-05', freq='12H')
print(new_dt_index)
# DatetimeIndex(['2023-12-01 00:00:00', '2023-12-01 12:00:00',
# '2023-12-02 00:00:00', '2023-12-02 12:00:00',
# '2023-12-03 00:00:00', '2023-12-03 12:00:00',
# '2023-12-04 00:00:00', '2023-12-04 12:00:00',
# '2023-12-05 00:00:00'],
# dtype='datetime64[ns]', freq='12H')
print(df.reindex(index=new_dt_index))
# A B C
# 2023-12-01 00:00:00 1.0 10.0 100.0
# 2023-12-01 12:00:00 NaN NaN NaN
# 2023-12-02 00:00:00 NaN NaN NaN
# 2023-12-02 12:00:00 NaN NaN NaN
# 2023-12-03 00:00:00 3.0 30.0 300.0
# 2023-12-03 12:00:00 NaN NaN NaN
# 2023-12-04 00:00:00 NaN NaN NaN
# 2023-12-04 12:00:00 NaN NaN NaN
# 2023-12-05 00:00:00 5.0 50.0 500.0
print(df.reindex(index=new_dt_index, method='ffill'))
# A B C
# 2023-12-01 00:00:00 1 10 100
# 2023-12-01 12:00:00 1 10 100
# 2023-12-02 00:00:00 1 10 100
# 2023-12-02 12:00:00 1 10 100
# 2023-12-03 00:00:00 3 30 300
# 2023-12-03 12:00:00 3 30 300
# 2023-12-04 00:00:00 3 30 300
# 2023-12-04 12:00:00 3 30 300
# 2023-12-05 00:00:00 5 50 500
print(df.reindex(index=new_dt_index).interpolate(method='time'))
# A B C
# 2023-12-01 00:00:00 1.0 10.0 100.0
# 2023-12-01 12:00:00 1.5 15.0 150.0
# 2023-12-02 00:00:00 2.0 20.0 200.0
# 2023-12-02 12:00:00 2.5 25.0 250.0
# 2023-12-03 00:00:00 3.0 30.0 300.0
# 2023-12-03 12:00:00 3.5 35.0 350.0
# 2023-12-04 00:00:00 4.0 40.0 400.0
# 2023-12-04 12:00:00 4.5 45.0 450.0
# 2023-12-05 00:00:00 5.0 50.0 500.0
Note that if a list of strings is specified as a new index, the index
of the DataFrame
returned by reindex()
becomes a normal Index
and is no longer time series data. interpolate(method='time')
is only possible with a DataFrame
with DatetimeIndex
.
new_index = ['2023-12-01', '2023-12-02', '2023-12-03']
print(df.reindex(index=new_index))
# A B C
# 2023-12-01 1.0 10.0 100.0
# 2023-12-02 NaN NaN NaN
# 2023-12-03 3.0 30.0 300.0
print(df.reindex(index=new_index).index)
# Index(['2023-12-01', '2023-12-02', '2023-12-03'], dtype='object')
print(df.reindex(index=new_index, method='bfill'))
# A B C
# 2023-12-01 1 10 100
# 2023-12-02 3 30 300
# 2023-12-03 3 30 300
# print(df.reindex(index=new_index).interpolate(method='time'))
# ValueError: time-weighted interpolation only works on Series or DataFrames with a DatetimeIndex
An example of reindex_like()
is as follows.
df2 = pd.DataFrame({'A': [1, 1, 1, 1, 1], 'C': [100, 100, 100, 100, 100]},
index=pd.date_range('2023-12-01', '2023-12-05', freq='D'))
print(df2)
# A C
# 2023-12-01 1 100
# 2023-12-02 1 100
# 2023-12-03 1 100
# 2023-12-04 1 100
# 2023-12-05 1 100
print(df.reindex_like(df2))
# A C
# 2023-12-01 1.0 100.0
# 2023-12-02 NaN NaN
# 2023-12-03 3.0 300.0
# 2023-12-04 NaN NaN
# 2023-12-05 5.0 500.0
print(df.reindex_like(df2).interpolate(method='time'))
# A C
# 2023-12-01 1.0 100.0
# 2023-12-02 2.0 200.0
# 2023-12-03 3.0 300.0
# 2023-12-04 4.0 400.0
# 2023-12-05 5.0 500.0