pandas: Convert between DataFrame and Series
This article explains how to convert between pandas.DataFrame and pandas.Series.
While the term "convert" is used for convenience, it actually refers to the process of generating a DataFrame from a Series, or retrieving a column or row of a DataFrame as a Series.
It is important to note, as explained at the end, that the original and the generated or retrieved objects may share memory. Consequently, changing a value in one could affect the other.
For converting DataFrame and Series to and from NumPy arrays (ndarray) and Python's built-in lists, refer to the following articles.
- Convert pandas.DataFrame, Series and numpy.ndarray to each other
- Convert pandas.DataFrame, Series and list to each other
The pandas version used in this article is as follows. Note that functionality may vary between versions.
import pandas as pd
print(pd.__version__)
# 2.1.4
Convert Series to DataFrame
To convert a Series to a DataFrame, use the to_frame() method or the pd.DataFrame() constructor.
to_frame()
The to_frame() method returns a DataFrame with the calling Series as a column. A column name can be specified as the first argument.
s = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
print(s)
# A 0
# B 1
# C 2
# dtype: int64
print(s.to_frame())
# 0
# A 0
# B 1
# C 2
print(s.to_frame('X'))
# X
# A 0
# B 1
# C 2
If the name attribute is set for the Series, it becomes the column name. If a first argument is specified in to_frame(), it takes precedence.
s_name = pd.Series([0, 1, 2], index=['A', 'B', 'C'], name='X')
print(s_name)
# A 0
# B 1
# C 2
# Name: X, dtype: int64
print(s_name.to_frame())
# X
# A 0
# B 1
# C 2
print(s_name.to_frame('Y'))
# Y
# A 0
# B 1
# C 2
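As a side note, Series.reset_index() also returns a DataFrame. The following is a minimal sketch, assuming you want the index labels kept as an ordinary column rather than as the index; the column name 'value' is an arbitrary choice for illustration.

```python
import pandas as pd

s = pd.Series([0, 1, 2], index=['A', 'B', 'C'])

# reset_index() moves the index labels into a column and returns a DataFrame.
# The name argument sets the column name used for the Series values.
df_reset = s.reset_index(name='value')

print(df_reset.columns.tolist())
# ['index', 'value']

print(df_reset.shape)
# (3, 2)
```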
pd.DataFrame()
Passing a Series to the pd.DataFrame() constructor creates a DataFrame with the Series as a column, while passing a list of Series creates a DataFrame with the Series as rows.
s = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
print(s)
# A 0
# B 1
# C 2
# dtype: int64
print(pd.DataFrame(s))
# 0
# A 0
# B 1
# C 2
print(pd.DataFrame([s]))
# A B C
# 0 0 1 2
If the name attribute is set for the Series, it becomes the column or row name.
s_name = pd.Series([0, 1, 2], index=['A', 'B', 'C'], name='X')
print(s_name)
# A 0
# B 1
# C 2
# Name: X, dtype: int64
print(pd.DataFrame(s_name))
# X
# A 0
# B 1
# C 2
print(pd.DataFrame([s_name]))
# A B C
# X 0 1 2
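A single-row DataFrame can also be reached by transposing the result of to_frame(). This is only a sketch of an alternative route, not something the constructor requires.

```python
import pandas as pd

s_name = pd.Series([0, 1, 2], index=['A', 'B', 'C'], name='X')

# to_frame() gives one column named 'X'; .T swaps rows and columns,
# so the Series ends up as a single row labelled 'X'.
df_row = s_name.to_frame().T

print(df_row.index.tolist())
# ['X']

print(df_row.columns.tolist())
# ['A', 'B', 'C']
```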
Generate DataFrame from multiple Series
A DataFrame can be generated from multiple Series using either the pd.DataFrame() constructor or the pd.concat() function. The following example uses two Series, but the same process applies when using three or more Series.
When indexes are common
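An example using the pd.DataFrame() constructor is as follows. Note that implicit type conversion occurs when Series of different data types (dtype) are used as rows, because each resulting column must hold a single dtype. In the example below, the integers of s1 are converted to float.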
s1 = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
s2 = pd.Series([0.0, 0.1, 0.2], index=['A', 'B', 'C'])
print(pd.DataFrame({'col1': s1, 'col2': s2}))
# col1 col2
# A 0 0.0
# B 1 0.1
# C 2 0.2
print(pd.DataFrame([s1, s2]))
# A B C
# 0 0.0 1.0 2.0
# 1 0.0 0.1 0.2
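The type conversion can be confirmed by inspecting the dtypes attribute. The sketch below rebuilds the two DataFrames from the example above and compares them.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
s2 = pd.Series([0.0, 0.1, 0.2], index=['A', 'B', 'C'])

df_cols = pd.DataFrame({'col1': s1, 'col2': s2})  # each Series becomes a column
df_rows = pd.DataFrame([s1, s2])                  # each Series becomes a row

# As columns, each Series keeps its own dtype.
print(df_cols.dtypes.astype(str).tolist())
# ['int64', 'float64']

# As rows, every column mixes an int and a float, so all columns become float64.
print(df_rows.dtypes.astype(str).tolist())
# ['float64', 'float64', 'float64']
```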
You can also use the pd.concat() function.
print(pd.concat([s1, s2], axis=1))
# 0 1
# A 0 0.0
# B 1 0.1
# C 2 0.2
If name attributes are set for the original Series, they will be used as column or row names in the resulting DataFrame. Note that column names must be explicitly provided when using a dictionary to specify the data in the constructor.
s1_name = pd.Series([0, 1, 2], index=['A', 'B', 'C'], name='X')
s2_name = pd.Series([0.0, 0.1, 0.2], index=['A', 'B', 'C'], name='Y')
print(pd.DataFrame({s1_name.name: s1_name, s2_name.name: s2_name}))
# X Y
# A 0 0.0
# B 1 0.1
# C 2 0.2
print(pd.DataFrame([s1_name, s2_name]))
# A B C
# X 0.0 1.0 2.0
# Y 0.0 0.1 0.2
print(pd.concat([s1_name, s2_name], axis=1))
# X Y
# A 0 0.0
# B 1 0.1
# C 2 0.2
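If the Series have no name attribute, pd.concat() can still label the columns through its keys argument. A minimal sketch, with 'col1' and 'col2' chosen only for illustration:

```python
import pandas as pd

s1 = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
s2 = pd.Series([0.0, 0.1, 0.2], index=['A', 'B', 'C'])

# With axis=1, the entries of keys are used as column names.
df_keys = pd.concat([s1, s2], axis=1, keys=['col1', 'col2'])

print(df_keys.columns.tolist())
# ['col1', 'col2']
```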
When indexes are different
A DataFrame is generated by aligning the Series on their index labels. If the Series have different indexes, labels missing from a Series are filled with missing values (NaN), which also converts integer columns to float.
s1 = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
s3 = pd.Series([0.1, 0.2, 0.3], index=['B', 'C', 'D'])
print(pd.DataFrame({'col1': s1, 'col3': s3}))
# col1 col3
# A 0.0 NaN
# B 1.0 0.1
# C 2.0 0.2
# D NaN 0.3
print(pd.DataFrame([s1, s3]))
# A B C D
# 0 0.0 1.0 2.0 NaN
# 1 NaN 0.1 0.2 0.3
print(pd.concat([s1, s3], axis=1))
# 0 1
# A 0.0 NaN
# B 1.0 0.1
# C 2.0 0.2
# D NaN 0.3
Missing values can be removed or replaced afterwards with methods such as dropna() and fillna().
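As a brief sketch of how the NaN values above might be handled, dropna() and fillna() can be applied to the resulting DataFrame. The fill value 0 is an arbitrary choice for illustration.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
s3 = pd.Series([0.1, 0.2, 0.3], index=['B', 'C', 'D'])
df_nan = pd.DataFrame({'col1': s1, 'col3': s3})

# Drop every row that contains at least one missing value.
print(df_nan.dropna().index.tolist())
# ['B', 'C']

# Replace missing values with 0.
df_filled = df_nan.fillna(0)
print(df_filled.isna().sum().sum())
# 0
```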
Using pd.concat() with join='inner' retains only the common indexes.
print(pd.concat([s1, s3], axis=1, join='inner'))
# 0 1
# B 1 0.1
# C 2 0.2
To change indexes, use methods like set_axis().
print(s3.set_axis(s1.index))
# A 0.1
# B 0.2
# C 0.3
# dtype: float64
print(pd.DataFrame({'col1': s1, 'col3': s3.set_axis(s1.index)}))
# col1 col3
# A 0 0.1
# B 1 0.2
# C 2 0.3
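Note that set_axis() only relabels the values in place, while reindex() looks values up by label. The sketch below contrasts the two on the same Series, since confusing them is an easy mistake.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
s3 = pd.Series([0.1, 0.2, 0.3], index=['B', 'C', 'D'])

# set_axis() replaces the labels; the values keep their positions.
print(s3.set_axis(s1.index).tolist())
# [0.1, 0.2, 0.3]

# reindex() selects by label; 'A' does not exist in s3, so it becomes NaN,
# and the value labelled 'D' is dropped.
print(s3.reindex(s1.index).tolist())
# [nan, 0.1, 0.2]
```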
To ignore the indexes, you can specify the Series as a NumPy array (ndarray) using the values attribute. Note that pd.concat() accepts only Series and DataFrame objects, so passing ndarray to it raises a TypeError.
print(s1.values)
# [0 1 2]
print(type(s1.values))
# <class 'numpy.ndarray'>
print(pd.DataFrame({'col1': s1.values, 'col3': s3.values}))
# col1 col3
# 0 0 0.1
# 1 1 0.2
# 2 2 0.3
print(pd.DataFrame([s1.values, s3.values]))
# 0 1 2
# 0 0.0 1.0 2.0
# 1 0.1 0.2 0.3
# print(pd.concat([s1.values, s3.values], axis=1))
# TypeError: cannot concatenate object of type '<class 'numpy.ndarray'>'; only Series and DataFrame objs are valid
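If you want to stay with pd.concat(), one possible workaround is to discard the labels with reset_index(drop=True) instead of converting to ndarray. This is a sketch of that idea, not the only way to do it.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
s3 = pd.Series([0.1, 0.2, 0.3], index=['B', 'C', 'D'])

# reset_index(drop=True) throws away the labels and leaves a default 0, 1, 2 index,
# so the two Series line up by position.
df_pos = pd.concat(
    [s1.reset_index(drop=True), s3.reset_index(drop=True)],
    axis=1,
)

print(df_pos.index.tolist())
# [0, 1, 2]

print(df_pos.isna().any().any())
# False
```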
The pd.DataFrame() constructor allows specifying any row and column names with the index and columns arguments.
print(pd.DataFrame([s1.values, s3.values], index=['X', 'Y'], columns=['A', 'B', 'C']))
# A B C
# X 0.0 1.0 2.0
# Y 0.1 0.2 0.3
When the number of values differs
Even when combining Series with differing numbers of values, a DataFrame is generated based on the index. Any missing elements are filled with NaN.
s1 = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
s4 = pd.Series([0.1, 0.3], index=['B', 'D'])
print(pd.DataFrame({'col1': s1, 'col4': s4}))
# col1 col4
# A 0.0 NaN
# B 1.0 0.1
# C 2.0 NaN
# D NaN 0.3
print(pd.DataFrame([s1, s4]))
# A B C D
# 0 0.0 1.0 2.0 NaN
# 1 NaN 0.1 NaN 0.3
print(pd.concat([s1, s4], axis=1))
# 0 1
# A 0.0 NaN
# B 1.0 0.1
# C 2.0 NaN
# D NaN 0.3
print(pd.concat([s1, s4], axis=1, join='inner'))
# 0 1
# B 1 0.1
As mentioned above, use methods like set_axis() to change indexes.
print(pd.DataFrame({'col1': s1, 'col4': s4.set_axis(['A', 'B'])}))
# col1 col4
# A 0 0.1
# B 1 0.3
# C 2 NaN
The behavior of using the values attribute (ndarray) in the constructor varies depending on how it is used. When used as values in a dictionary, it results in an error if the arrays are of different lengths. However, using values as elements in a list is acceptable.
# print(pd.DataFrame({'col1': s1.values, 'col4': s4.values}))
# ValueError: All arrays must be of the same length
print(pd.DataFrame([s1.values, s4.values]))
# 0 1 2
# 0 0.0 1.0 2.0
# 1 0.1 0.3 NaN
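To use arrays of different lengths as columns without the ValueError, one option is to wrap each array in a new Series so that it gets a default integer index. A minimal sketch under that assumption:

```python
import pandas as pd

s1 = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
s4 = pd.Series([0.1, 0.3], index=['B', 'D'])

# Each new Series gets a default 0, 1, ... index, so the columns are aligned
# by position and the shorter one is padded with NaN.
df_padded = pd.DataFrame({
    'col1': pd.Series(s1.values),
    'col4': pd.Series(s4.values),
})

print(df_padded['col1'].tolist())
# [0, 1, 2]

print(df_padded['col4'].tolist())
# [0.1, 0.3, nan]
```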
Convert DataFrame to Series
Rows and columns of a DataFrame can be retrieved as Series using [], loc[], or iloc[]. Refer to the following articles for details.
- pandas: Select rows/columns by index (numbers and names)
- pandas: Get/Set values with loc, iloc, at, iat
Retrieve DataFrame columns as Series
Specifying a column name with [] or loc[], or a column number with iloc[] as a scalar value, retrieves that column as a Series.
df = pd.DataFrame({'col0': [0, 1, 2], 'col1': [3, 4, 5], 'col2': [6, 7, 8]},
                  index=['row0', 'row1', 'row2'])
print(df)
# col0 col1 col2
# row0 0 3 6
# row1 1 4 7
# row2 2 5 8
print(df['col0'])
# row0 0
# row1 1
# row2 2
# Name: col0, dtype: int64
print(df.loc[:, 'col0'])
# row0 0
# row1 1
# row2 2
# Name: col0, dtype: int64
print(df.iloc[:, 0])
# row0 0
# row1 1
# row2 2
# Name: col0, dtype: int64
With loc[] or iloc[], it is also possible to select specific rows using a list or slice.
print(df.iloc[[0, 2], 0])
# row0 0
# row2 2
# Name: col0, dtype: int64
print(df.iloc[:2, 0])
# row0 0
# row1 1
# Name: col0, dtype: int64
Selecting a single column with a list or slice results in a DataFrame with one column, not a Series.
print(df.loc[:, ['col0']])
# col0
# row0 0
# row1 1
# row2 2
print(df.iloc[:, :1])
# col0
# row0 0
# row1 1
# row2 2
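When you already hold such a one-column DataFrame, squeeze() can turn it into a Series. The following sketch assumes the DataFrame has exactly one column; with more columns, squeeze('columns') returns the DataFrame unchanged.

```python
import pandas as pd

df = pd.DataFrame({'col0': [0, 1, 2], 'col1': [3, 4, 5], 'col2': [6, 7, 8]},
                  index=['row0', 'row1', 'row2'])

df_one_col = df.loc[:, ['col0']]  # a DataFrame with one column

# squeeze('columns') removes the column axis because its length is 1.
s_col = df_one_col.squeeze('columns')

print(type(s_col).__name__)
# Series

print(s_col.name)
# col0
```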
Retrieve DataFrame rows as Series
Specifying a row name with loc[], or a row number with iloc[] as a scalar value, retrieves that row as a Series.
df = pd.DataFrame({'col0': [0, 1, 2], 'col1': [3, 4, 5], 'col2': [6, 7, 8]},
                  index=['row0', 'row1', 'row2'])
print(df)
# col0 col1 col2
# row0 0 3 6
# row1 1 4 7
# row2 2 5 8
print(df.loc['row0', :])
# col0 0
# col1 3
# col2 6
# Name: row0, dtype: int64
print(df.iloc[0, :])
# col0 0
# col1 3
# col2 6
# Name: row0, dtype: int64
When selecting an entire row, the column specification : can be omitted.
print(df.loc['row0'])
# col0 0
# col1 3
# col2 6
# Name: row0, dtype: int64
print(df.iloc[0])
# col0 0
# col1 3
# col2 6
# Name: row0, dtype: int64
It is also possible to select specific columns using a list or slice.
print(df.iloc[0, [0, 2]])
# col0 0
# col2 6
# Name: row0, dtype: int64
print(df.iloc[0, :2])
# col0 0
# col1 3
# Name: row0, dtype: int64
Selecting a single row with a list or slice results in a DataFrame with one row, not a Series.
print(df.loc[['row0']])
# col0 col1 col2
# row0 0 3 6
print(df.iloc[:1])
# col0 col1 col2
# row0 0 3 6
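A one-row DataFrame can be converted to a Series with squeeze('index'). The sketch below also shows why passing the axis explicitly may be the safer habit: without it, a DataFrame with one row and one column collapses to a scalar.

```python
import pandas as pd

df = pd.DataFrame({'col0': [0, 1, 2], 'col1': [3, 4, 5], 'col2': [6, 7, 8]},
                  index=['row0', 'row1', 'row2'])

# squeeze('index') removes the row axis because its length is 1.
s_row = df.loc[['row0']].squeeze('index')
print(type(s_row).__name__)
# Series
print(s_row.name)
# row0

# Without an axis, every length-1 axis is removed, leaving a scalar here.
value = df.loc[['row0'], ['col0']].squeeze()
print(value)
# 0
```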
Pay attention to data types (dtype)
A DataFrame has a data type (dtype) for each column, whereas a Series has a single data type for all of its values.
Because a row spans several columns, be careful when retrieving a row of a DataFrame as a Series.
For example, retrieving a row from a DataFrame that has columns of integer (int) and floating-point number (float) types as a Series results in a float data type, with the values in the int column converted to float.
df_multi = pd.DataFrame({'col0': [0, 1, 2], 'col1': [0.0, 0.1, 0.2]},
                        index=['row0', 'row1', 'row2'])
print(df_multi)
# col0 col1
# row0 0 0.0
# row1 1 0.1
# row2 2 0.2
s_row = df_multi.loc['row2']
print(s_row)
# col0 2.0
# col1 0.2
# Name: row2, dtype: float64
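If the per-column data types matter, one way to avoid the conversion is to select the row with a list, which returns a one-row DataFrame instead of a Series. A short sketch of the difference:

```python
import pandas as pd

df_multi = pd.DataFrame({'col0': [0, 1, 2], 'col1': [0.0, 0.1, 0.2]},
                        index=['row0', 'row1', 'row2'])

# As a Series, the row has a single dtype, so the int value is converted.
print(df_multi.loc['row2'].dtype)
# float64

# As a one-row DataFrame, each column keeps its own dtype.
df_row = df_multi.loc[['row2']]
print(df_row.dtypes.astype(str).tolist())
# ['int64', 'float64']
```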
If a DataFrame includes columns of type object, retrieving a row as a Series results in an object data type.
df_multi['col2'] = ['a', 'b', 'c']
print(df_multi)
# col0 col1 col2
# row0 0 0.0 a
# row1 1 0.1 b
# row2 2 0.2 c
print(df_multi.dtypes)
# col0 int64
# col1 float64
# col2 object
# dtype: object
s_row = df_multi.loc['row2']
print(s_row)
# col0 2
# col1 0.2
# col2 c
# Name: row2, dtype: object
With the object type, values retain their original types.
print(type(s_row['col0']))
# <class 'numpy.int64'>
print(type(s_row['col1']))
# <class 'numpy.float64'>
print(type(s_row['col2']))
# <class 'str'>
Views and copies
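During conversion between DataFrame and Series, the resulting object may either be a view or a copy of the original. A view shares memory with the original object, and changing one affects the other. The results below are for the pandas version shown at the beginning of this article; when Copy-on-Write is enabled, a change to one object no longer propagates to the other.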
Convert Series to DataFrame
to_frame()
The to_frame() method returns a view if possible. A copy can be created with copy().
s = pd.Series([0, 1], index=['A', 'B'])
df = s.to_frame()
s['A'] = 100
print(df)
# 0
# A 100
# B 1
s = pd.Series([0, 1], index=['A', 'B'])
df_copy = s.copy().to_frame()
s['A'] = 100
print(df_copy)
# 0
# A 0
# B 1
pd.DataFrame()
The pd.DataFrame() constructor returns a view by default if possible. Setting the copy argument to True returns a copy.
s = pd.Series([0, 1], index=['A', 'B'])
df = pd.DataFrame(s)
s['A'] = 100
print(df)
# 0
# A 100
# B 1
s = pd.Series([0, 1], index=['A', 'B'])
df_copy = pd.DataFrame(s, copy=True)
s['A'] = 100
print(df_copy)
# 0
# A 0
# B 1
pd.concat()
The pd.concat() function returns a copy by default. Setting the copy argument to False returns a view if possible.
s1 = pd.Series([0, 1], index=['A', 'B'])
s2 = pd.Series([0.0, 0.1], index=['A', 'B'])
df = pd.concat([s1, s2], axis=1)
s1['A'] = 100
print(df)
# 0 1
# A 0 0.0
# B 1 0.1
s1 = pd.Series([0, 1], index=['A', 'B'])
s2 = pd.Series([0.0, 0.1], index=['A', 'B'])
df_copy_false = pd.concat([s1, s2], axis=1, copy=False)
s1['A'] = 100
print(df_copy_false)
# 0 1
# A 100 0.0
# B 1 0.1
Note that setting copy=True in functions like pd.DataFrame() and pd.concat() guarantees that a copy is made, whereas copy=False only requests a view. Depending on the memory layout, a copy may still be generated even with copy=False, so do not rely on a view always being created.
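Since a view is not guaranteed, it can help to check whether two objects are currently backed by the same buffer. The sketch below uses np.shares_memory() for that purpose. Keep in mind that a shared buffer does not by itself mean that changes propagate, for example when Copy-on-Write is enabled, and that the first result depends on the pandas version and settings, so no expected value is shown for it.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 1], index=['A', 'B'])

df_maybe_view = s.to_frame()     # may or may not share memory with s
df_copy = s.copy().to_frame()    # built from an explicit copy of s

# True means both objects are backed by the same buffer at this moment.
print(np.shares_memory(s.to_numpy(), df_maybe_view.iloc[:, 0].to_numpy()))

# A DataFrame built from an explicit copy does not share memory with s.
print(np.shares_memory(s.to_numpy(), df_copy.iloc[:, 0].to_numpy()))
# False
```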
Convert DataFrame to Series
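Retrieving a column from a DataFrame as a Series generally results in a view of the original DataFrame. A row can also be a view when all columns share the same data type, but it is a copy when the values have to be converted to a common data type.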
df = pd.DataFrame({'col0': [0, 1, 2], 'col1': [3, 4, 5], 'col2': [6, 7, 8]},
                  index=['row0', 'row1', 'row2'])
print(df)
# col0 col1 col2
# row0 0 3 6
# row1 1 4 7
# row2 2 5 8
s = df['col0']
s['row0'] = 10
print(s)
# row0 10
# row1 1
# row2 2
# Name: col0, dtype: int64
print(df)
# col0 col1 col2
# row0 10 3 6
# row1 1 4 7
# row2 2 5 8
To modify the Series independently of the original DataFrame, create a copy with copy().
s_copy = df['col1'].copy()
s_copy['row0'] = 100
print(s_copy)
# row0 100
# row1 4
# row2 5
# Name: col1, dtype: int64
print(df)
# col0 col1 col2
# row0 10 3 6
# row1 1 4 7
# row2 2 5 8
When using a list for selection, a copy is created instead of a view.
s_list = df.loc[['row0', 'row2'], 'col2']
s_list['row0'] = 1000
print(s_list)
# row0 1000
# row2 8
# Name: col2, dtype: int64
print(df)
# col0 col1 col2
# row0 10 3 6
# row1 1 4 7
# row2 2 5 8
When selecting a portion of a DataFrame with loc[] or iloc[] to create a new DataFrame, whether a view or a copy is created depends on the type of range specification used, such as scalar values, lists, or slices.