pandas: Concat multiple DataFrame/Series with concat()
The pandas.concat() function allows you to concatenate (join) multiple pandas.DataFrame and pandas.Series objects.
To add new rows and columns to pandas.DataFrame, refer to the following article:
To merge multiple pandas.DataFrame objects based on columns or indexes, use the pandas.merge() function or the merge() and join() methods of pandas.DataFrame.
The sample code in this article uses pandas version 2.0.3. The following pandas.DataFrame and pandas.Series objects are used as examples.
import pandas as pd
print(pd.__version__)
# 2.0.3
df1 = pd.DataFrame({'A': ['A1', 'A2', 'A3'],
'B': ['B1', 'B2', 'B3'],
'C': ['C1', 'C2', 'C3']},
index=['ONE', 'TWO', 'THREE'])
print(df1)
# A B C
# ONE A1 B1 C1
# TWO A2 B2 C2
# THREE A3 B3 C3
df2 = pd.DataFrame({'C': ['C2', 'C3', 'C4'],
'D': ['D2', 'D3', 'D4']},
index=['TWO', 'THREE', 'FOUR'])
print(df2)
# C D
# TWO C2 D2
# THREE C3 D3
# FOUR C4 D4
s1 = pd.Series(['X1', 'X2', 'X3'], index=['ONE', 'TWO', 'THREE'], name='X')
print(s1)
# ONE X1
# TWO X2
# THREE X3
# Name: X, dtype: object
s2 = pd.Series(['Y2', 'Y3', 'Y4'], index=['TWO', 'THREE', 'FOUR'], name='Y')
print(s2)
# TWO Y2
# THREE Y3
# FOUR Y4
# Name: Y, dtype: object
Basic usage of pandas.concat()
Concatenating objects: objs
To concatenate DataFrame and Series objects, pass them as a list or tuple to the first argument, objs. Examples using Series are provided later.
df_concat = pd.concat([df1, df2])
print(df_concat)
# A B C D
# ONE A1 B1 C1 NaN
# TWO A2 B2 C2 NaN
# THREE A3 B3 C3 NaN
# TWO NaN NaN C2 D2
# THREE NaN NaN C3 D3
# FOUR NaN NaN C4 D4
You can concatenate any number of objects.
df_concat_multi = pd.concat([df1, df2, df1])
print(df_concat_multi)
# A B C D
# ONE A1 B1 C1 NaN
# TWO A2 B2 C2 NaN
# THREE A3 B3 C3 NaN
# TWO NaN NaN C2 D2
# THREE NaN NaN C3 D3
# FOUR NaN NaN C4 D4
# ONE A1 B1 C1 NaN
# TWO A2 B2 C2 NaN
# THREE A3 B3 C3 NaN
A new object is generated, and the original object is not changed.
Concatenation direction (vertical or horizontal): axis
Use the axis argument to specify the direction of concatenation, either vertical or horizontal.
If axis=0 (default), the objects are combined vertically.
df_v = pd.concat([df1, df2], axis=0)
print(df_v)
# A B C D
# ONE A1 B1 C1 NaN
# TWO A2 B2 C2 NaN
# THREE A3 B3 C3 NaN
# TWO NaN NaN C2 D2
# THREE NaN NaN C3 D3
# FOUR NaN NaN C4 D4
If axis=1, the objects are combined horizontally.
df_h = pd.concat([df1, df2], axis=1)
print(df_h)
# A B C C D
# ONE A1 B1 C1 NaN NaN
# TWO A2 B2 C2 C2 D2
# THREE A3 B3 C3 C3 D3
# FOUR NaN NaN NaN C4 D4
Concatenation method (outer or inner): join
The join argument determines whether all columns/rows or only those with common names should be retained after concatenation.
With join='outer' (default), all columns/rows are retained. Absent elements in the original object are represented as NaN.
- pandas: Remove NaN (missing values) with dropna()
- pandas: Replace NaN (missing values) with fillna()
With join='inner', only columns/rows with common names are retained.
df_v_out = pd.concat([df1, df2], join='outer')
print(df_v_out)
# A B C D
# ONE A1 B1 C1 NaN
# TWO A2 B2 C2 NaN
# THREE A3 B3 C3 NaN
# TWO NaN NaN C2 D2
# THREE NaN NaN C3 D3
# FOUR NaN NaN C4 D4
df_v_in = pd.concat([df1, df2], join='inner')
print(df_v_in)
# C
# ONE C1
# TWO C2
# THREE C3
# TWO C2
# THREE C3
# FOUR C4
The same join rules apply for horizontal concatenation (axis=1).
df_h_out = pd.concat([df1, df2], axis=1, join='outer')
print(df_h_out)
# A B C C D
# ONE A1 B1 C1 NaN NaN
# TWO A2 B2 C2 C2 D2
# THREE A3 B3 C3 C3 D3
# FOUR NaN NaN NaN C4 D4
df_h_in = pd.concat([df1, df2], axis=1, join='inner')
print(df_h_in)
# A B C C D
# TWO A2 B2 C2 C2 D2
# THREE A3 B3 C3 C3 D3
You can rename the columns or index using the rename() or set_axis() method.
df_h = pd.concat([df1, df2.rename(index={'FOUR': 'ONE'})], axis=1)
print(df_h)
# A B C C D
# ONE A1 B1 C1 C4 D4
# TWO A2 B2 C2 C2 D2
# THREE A3 B3 C3 C3 D3
Concatenate multiple pandas.DataFrame objects
As shown in the previous examples, when concatenating multiple DataFrame objects, a DataFrame is returned.
df_concat = pd.concat([df1, df2])
print(df_concat)
# A B C D
# ONE A1 B1 C1 NaN
# TWO A2 B2 C2 NaN
# THREE A3 B3 C3 NaN
# TWO NaN NaN C2 D2
# THREE NaN NaN C3 D3
# FOUR NaN NaN C4 D4
print(type(df_concat))
# <class 'pandas.core.frame.DataFrame'>
Concatenate multiple pandas.Series objects
When concatenating multiple Series objects, a Series is returned for vertical concatenation (axis=0, default).
s_v = pd.concat([s1, s2])
print(s_v)
# ONE X1
# TWO X2
# THREE X3
# TWO Y2
# THREE Y3
# FOUR Y4
# dtype: object
print(type(s_v))
# <class 'pandas.core.series.Series'>
For horizontal concatenation (axis=1), a DataFrame is returned. You can use the join argument.
s_h = pd.concat([s1, s2], axis=1)
print(s_h)
# X Y
# ONE X1 NaN
# TWO X2 Y2
# THREE X3 Y3
# FOUR NaN Y4
print(type(s_h))
# <class 'pandas.core.frame.DataFrame'>
s_h_in = pd.concat([s1, s2], axis=1, join='inner')
print(s_h_in)
# X Y
# TWO X2 Y2
# THREE X3 Y3
You can also create a DataFrame by concatenating multiple Series using the pandas.DataFrame() constructor. In this case, the Series can also be arranged as rows in the DataFrame. Refer to the following article for details.
Concatenate pandas.DataFrame and pandas.Series
When you concatenate a DataFrame and a Series horizontally (axis=1), the Series becomes a new column, and its name is used as the column name. You can use the join argument.
df_s_h = pd.concat([df1, s2], axis=1)
print(df_s_h)
# A B C Y
# ONE A1 B1 C1 NaN
# TWO A2 B2 C2 Y2
# THREE A3 B3 C3 Y3
# FOUR NaN NaN NaN Y4
df_s_h_in = pd.concat([df1, s2], axis=1, join='inner')
print(df_s_h_in)
# A B C Y
# TWO A2 B2 C2 Y2
# THREE A3 B3 C3 Y3
For vertical concatenation (axis=0, default), the result is as follows:
df_s_v = pd.concat([df1, s1])
print(df_s_v)
# A B C 0
# ONE A1 B1 C1 NaN
# TWO A2 B2 C2 NaN
# THREE A3 B3 C3 NaN
# ONE NaN NaN NaN X1
# TWO NaN NaN NaN X2
# THREE NaN NaN NaN X3
To add a Series as a new row, you could either convert it into a DataFrame before concatenating, or use .loc.
print(s1.set_axis(df1.columns).to_frame().T)
# A B C
# X X1 X2 X3
df_s_v = pd.concat([df1, s1.set_axis(df1.columns).to_frame().T])
print(df_s_v)
# A B C
# ONE A1 B1 C1
# TWO A2 B2 C2
# THREE A3 B3 C3
# X X1 X2 X3
print(s2.values)
# ['Y2' 'Y3' 'Y4']
df1.loc[s2.name] = s2.values
print(df1)
# A B C
# ONE A1 B1 C1
# TWO A2 B2 C2
# THREE A3 B3 C3
# Y Y2 Y3 Y4