pandas: Convert between DataFrame and Series

Tags: Python, pandas

This article explains how to convert between pandas.DataFrame and pandas.Series.

While the term "convert" is used for convenience, it actually refers to the process of generating a DataFrame from a Series, or retrieving a column or row of a DataFrame as a Series.

It is important to note, as explained at the end, that the original and the generated or retrieved objects may share memory. Consequently, changing a value in one could affect the other.

Converting DataFrame and Series to and from NumPy arrays (ndarray) and Python's built-in lists is covered in separate articles.

The pandas version used in this article is as follows. Note that functionality may vary between versions.

import pandas as pd

print(pd.__version__)
# 2.1.4

Convert Series to DataFrame

To convert a Series to a DataFrame, use the to_frame() method or the pd.DataFrame() constructor.

to_frame()

The to_frame() method returns a DataFrame with the calling Series as a column. A column name can be specified as the first argument.

s = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
print(s)
# A    0
# B    1
# C    2
# dtype: int64

print(s.to_frame())
#    0
# A  0
# B  1
# C  2

print(s.to_frame('X'))
#    X
# A  0
# B  1
# C  2

If the name attribute is set for the Series, it becomes the column name. If a first argument is specified in to_frame(), it takes precedence.

s_name = pd.Series([0, 1, 2], index=['A', 'B', 'C'], name='X')
print(s_name)
# A    0
# B    1
# C    2
# Name: X, dtype: int64

print(s_name.to_frame())
#    X
# A  0
# B  1
# C  2

print(s_name.to_frame('Y'))
#    Y
# A  0
# B  1
# C  2

pd.DataFrame()

Passing a Series to the pd.DataFrame() constructor creates a DataFrame with the Series as a column, while passing a list of Series creates a DataFrame with the Series as rows.

s = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
print(s)
# A    0
# B    1
# C    2
# dtype: int64

print(pd.DataFrame(s))
#    0
# A  0
# B  1
# C  2

print(pd.DataFrame([s]))
#    A  B  C
# 0  0  1  2

If the name attribute is set for the Series, it becomes the column or row name.

s_name = pd.Series([0, 1, 2], index=['A', 'B', 'C'], name='X')
print(s_name)
# A    0
# B    1
# C    2
# Name: X, dtype: int64

print(pd.DataFrame(s_name))
#    X
# A  0
# B  1
# C  2

print(pd.DataFrame([s_name]))
#    A  B  C
# X  0  1  2

Generate DataFrame from multiple Series

A DataFrame can be generated from multiple Series using either the pd.DataFrame() constructor or the pd.concat() function. The following example uses two Series, but the same process applies when using three or more Series.

When indexes are common

An example using the pd.DataFrame() constructor is as follows. Note that implicit type conversion occurs when Series of different data types (dtype) are used as rows.

s1 = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
s2 = pd.Series([0.0, 0.1, 0.2], index=['A', 'B', 'C'])

print(pd.DataFrame({'col1': s1, 'col2': s2}))
#    col1  col2
# A     0   0.0
# B     1   0.1
# C     2   0.2

print(pd.DataFrame([s1, s2]))
#      A    B    C
# 0  0.0  1.0  2.0
# 1  0.0  0.1  0.2
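
As a small check of the implicit type conversion mentioned above, the dtypes attribute can be compared for the two layouts. This is only an illustrative sketch that reuses s1 and s2 from the example; the names df_cols and df_rows are introduced here for illustration.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
s2 = pd.Series([0.0, 0.1, 0.2], index=['A', 'B', 'C'])

# As columns, each Series keeps its own dtype.
df_cols = pd.DataFrame({'col1': s1, 'col2': s2})
print(df_cols.dtypes)
# col1      int64
# col2    float64
# dtype: object

# As rows, every column holds one int and one float, so all columns become float.
df_rows = pd.DataFrame([s1, s2])
print(df_rows.dtypes)
# A    float64
# B    float64
# C    float64
# dtype: object
```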

You can also use the pd.concat() function.

print(pd.concat([s1, s2], axis=1))
#    0    1
# A  0  0.0
# B  1  0.1
# C  2  0.2

If name attributes are set for the original Series, they will be used as column or row names in the resulting DataFrame. Note that column names must be explicitly provided when using a dictionary to specify the data in the constructor.

s1_name = pd.Series([0, 1, 2], index=['A', 'B', 'C'], name='X')
s2_name = pd.Series([0.0, 0.1, 0.2], index=['A', 'B', 'C'], name='Y')

print(pd.DataFrame({s1_name.name: s1_name, s2_name.name: s2_name}))
#    X    Y
# A  0  0.0
# B  1  0.1
# C  2  0.2

print(pd.DataFrame([s1_name, s2_name]))
#      A    B    C
# X  0.0  1.0  2.0
# Y  0.0  0.1  0.2

print(pd.concat([s1_name, s2_name], axis=1))
#    X    Y
# A  0  0.0
# B  1  0.1
# C  2  0.2

When indexes are different

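The DataFrame is built by aligning the Series on their indexes. If the Series have different indexes, labels missing from a Series are filled with missing values (NaN), and an integer column that receives NaN is converted to float.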

s1 = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
s3 = pd.Series([0.1, 0.2, 0.3], index=['B', 'C', 'D'])

print(pd.DataFrame({'col1': s1, 'col3': s3}))
#    col1  col3
# A   0.0   NaN
# B   1.0   0.1
# C   2.0   0.2
# D   NaN   0.3

print(pd.DataFrame([s1, s3]))
#      A    B    C    D
# 0  0.0  1.0  2.0  NaN
# 1  NaN  0.1  0.2  0.3

print(pd.concat([s1, s3], axis=1))
#      0    1
# A  0.0  NaN
# B  1.0  0.1
# C  2.0  0.2
# D  NaN  0.3
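
The effect of the alignment on dtypes and missing values can be inspected as follows. This is a minimal sketch that reuses s1 and s3 from the example above; it only uses dtypes and isna().

```python
import pandas as pd

s1 = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
s3 = pd.Series([0.1, 0.2, 0.3], index=['B', 'C', 'D'])

df = pd.DataFrame({'col1': s1, 'col3': s3})

# NaN is a float value, so the integer Series becomes a float column.
print(df.dtypes)
# col1    float64
# col3    float64
# dtype: object

# Count the missing values that the alignment introduced in each column.
print(df.isna().sum())
# col1    1
# col3    1
# dtype: int64
```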

Handling missing values in pandas is covered in a separate article.

Using pd.concat() with join='inner' retains only the common indexes.

print(pd.concat([s1, s3], axis=1, join='inner'))
#    0    1
# B  1  0.1
# C  2  0.2

To change indexes, use methods like set_axis().

print(s3.set_axis(s1.index))
# A    0.1
# B    0.2
# C    0.3
# dtype: float64

print(pd.DataFrame({'col1': s1, 'col3': s3.set_axis(s1.index)}))
#    col1  col3
# A     0   0.1
# B     1   0.2
# C     2   0.3

To ignore the indexes, you can specify the Series as a NumPy array (ndarray) using the values attribute. Note that using pd.concat() in this way results in an error.

print(s1.values)
# [0 1 2]

print(type(s1.values))
# <class 'numpy.ndarray'>

print(pd.DataFrame({'col1': s1.values, 'col3': s3.values}))
#    col1  col3
# 0     0   0.1
# 1     1   0.2
# 2     2   0.3

print(pd.DataFrame([s1.values, s3.values]))
#      0    1    2
# 0  0.0  1.0  2.0
# 1  0.1  0.2  0.3

# print(pd.concat([s1.values, s3.values], axis=1))
# TypeError: cannot concatenate object of type '<class 'numpy.ndarray'>'; only Series and DataFrame objs are valid
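
If you want to ignore the indexes and still use pd.concat(), one possible approach (not part of the original examples) is to replace each index with a default sequential index using reset_index(drop=True), so that the result is still a Series and the rows align by position.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
s3 = pd.Series([0.1, 0.2, 0.3], index=['B', 'C', 'D'])

# Drop the original labels so that both Series share the index 0, 1, 2.
df = pd.concat([s1.reset_index(drop=True), s3.reset_index(drop=True)], axis=1)
print(df)
#    0    1
# 0  0  0.1
# 1  1  0.2
# 2  2  0.3
```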

The pd.DataFrame() constructor allows specifying any row and column names with the index and columns arguments.

print(pd.DataFrame([s1.values, s3.values], index=['X', 'Y'], columns=['A', 'B', 'C']))
#      A    B    C
# X  0.0  1.0  2.0
# Y  0.1  0.2  0.3

When the number of values differs

Even when combining Series with differing numbers of values, a DataFrame is generated based on the index. Any missing elements are filled with NaN.

s1 = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
s4 = pd.Series([0.1, 0.3], index=['B', 'D'])

print(pd.DataFrame({'col1': s1, 'col4': s4}))
#    col1  col4
# A   0.0   NaN
# B   1.0   0.1
# C   2.0   NaN
# D   NaN   0.3

print(pd.DataFrame([s1, s4]))
#      A    B    C    D
# 0  0.0  1.0  2.0  NaN
# 1  NaN  0.1  NaN  0.3

print(pd.concat([s1, s4], axis=1))
#      0    1
# A  0.0  NaN
# B  1.0  0.1
# C  2.0  NaN
# D  NaN  0.3

print(pd.concat([s1, s4], axis=1, join='inner'))
#    0    1
# B  1  0.1

As mentioned above, use methods like set_axis() to change indexes.

print(pd.DataFrame({'col1': s1, 'col4': s4.set_axis(['A', 'B'])}))
#    col1  col4
# A     0   0.1
# B     1   0.3
# C     2   NaN

The behavior of the values attribute (ndarray) in the constructor depends on how it is passed. When the arrays are used as values in a dictionary, arrays of different lengths raise an error. When they are used as elements of a list, the DataFrame is created and the shorter row is padded with NaN.

# print(pd.DataFrame({'col1': s1.values, 'col4': s4.values}))
# ValueError: All arrays must be of the same length

print(pd.DataFrame([s1.values, s4.values]))
#      0    1    2
# 0  0.0  1.0  2.0
# 1  0.1  0.3  NaN
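
The dictionary case can be handled without stopping the script by catching the exception. The sketch below also shows one possible alternative, assuming positional alignment is what you want: passing Series with reset_index(drop=True) instead of ndarrays, so that the shorter column is padded with NaN. The name df_pos is introduced here for illustration.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
s4 = pd.Series([0.1, 0.3], index=['B', 'D'])

# A dictionary of ndarrays with different lengths raises ValueError.
error_name = None
try:
    pd.DataFrame({'col1': s1.values, 'col4': s4.values})
except ValueError as e:
    error_name = type(e).__name__
print(error_name)
# ValueError

# Series with a default index are aligned by position instead.
df_pos = pd.DataFrame({'col1': s1.reset_index(drop=True),
                       'col4': s4.reset_index(drop=True)})
print(df_pos)
#    col1  col4
# 0     0   0.1
# 1     1   0.3
# 2     2   NaN
```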

Convert DataFrame to Series

Rows and columns of a DataFrame can be retrieved as Series using [], loc[], or iloc[]. The details of these indexers are covered in separate articles.

Retrieve DataFrame columns as Series

Specifying a column name with [] or loc[], or a column number with iloc[] as a scalar value, retrieves that column as a Series.

df = pd.DataFrame({'col0': [0, 1, 2], 'col1': [3, 4, 5], 'col2': [6, 7, 8]},
                  index=['row0', 'row1', 'row2'])
print(df)
#       col0  col1  col2
# row0     0     3     6
# row1     1     4     7
# row2     2     5     8

print(df['col0'])
# row0    0
# row1    1
# row2    2
# Name: col0, dtype: int64

print(df.loc[:, 'col0'])
# row0    0
# row1    1
# row2    2
# Name: col0, dtype: int64

print(df.iloc[:, 0])
# row0    0
# row1    1
# row2    2
# Name: col0, dtype: int64

With loc[] or iloc[], it is also possible to select specific rows using a list or slice.

print(df.iloc[[0, 2], 0])
# row0    0
# row2    2
# Name: col0, dtype: int64

print(df.iloc[:2, 0])
# row0    0
# row1    1
# Name: col0, dtype: int64

Selecting a single column with a list or slice results in a DataFrame with one column, not a Series.

print(df.loc[:, ['col0']])
#       col0
# row0     0
# row1     1
# row2     2

print(df.iloc[:, :1])
#       col0
# row0     0
# row1     1
# row2     2
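
If you already have a one-column DataFrame like the ones above, the squeeze() method is one way to turn it into a Series. This is a supplementary sketch rather than part of the original examples; df_one_col is a name introduced here.

```python
import pandas as pd

df = pd.DataFrame({'col0': [0, 1, 2], 'col1': [3, 4, 5], 'col2': [6, 7, 8]},
                  index=['row0', 'row1', 'row2'])

# Selecting with a list keeps the result two-dimensional.
df_one_col = df.loc[:, ['col0']]
print(isinstance(df_one_col, pd.DataFrame))
# True

# squeeze() along the column axis collapses the single column into a Series.
s = df_one_col.squeeze(axis='columns')
print(isinstance(s, pd.Series))
# True

print(s)
# row0    0
# row1    1
# row2    2
# Name: col0, dtype: int64
```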

Retrieve DataFrame rows as Series

Specifying a row name with loc[], or a row number with iloc[] as a scalar value, retrieves that row as a Series.

df = pd.DataFrame({'col0': [0, 1, 2], 'col1': [3, 4, 5], 'col2': [6, 7, 8]},
                  index=['row0', 'row1', 'row2'])
print(df)
#       col0  col1  col2
# row0     0     3     6
# row1     1     4     7
# row2     2     5     8

print(df.loc['row0', :])
# col0    0
# col1    3
# col2    6
# Name: row0, dtype: int64

print(df.iloc[0, :])
# col0    0
# col1    3
# col2    6
# Name: row0, dtype: int64

When selecting an entire row, the column specification : can be omitted.

print(df.loc['row0'])
# col0    0
# col1    3
# col2    6
# Name: row0, dtype: int64

print(df.iloc[0])
# col0    0
# col1    3
# col2    6
# Name: row0, dtype: int64

It is also possible to select specific columns using a list or slice.

print(df.iloc[0, [0, 2]])
# col0    0
# col2    6
# Name: row0, dtype: int64

print(df.iloc[0, :2])
# col0    0
# col1    3
# Name: row0, dtype: int64

Selecting a single row with a list or slice results in a DataFrame with one row, not a Series.

print(df.loc[['row0']])
#       col0  col1  col2
# row0     0     3     6

print(df.iloc[:1])
#       col0  col1  col2
# row0     0     3     6

Pay attention to data types (dtype)

While DataFrame has data types (dtype) for each column, Series has one data type.

Be careful when retrieving a row of a DataFrame as a Series.

For example, retrieving a row from a DataFrame that has columns of integer (int) and floating-point number (float) types as a Series results in a float data type, with the values in the int column converted to float.

df_multi = pd.DataFrame({'col0': [0, 1, 2], 'col1': [0.0, 0.1, 0.2]},
                        index=['row0', 'row1', 'row2'])
print(df_multi)
#       col0  col1
# row0     0   0.0
# row1     1   0.1
# row2     2   0.2

s_row = df_multi.loc['row2']
print(s_row)
# col0    2.0
# col1    0.2
# Name: row2, dtype: float64

If a DataFrame includes columns of type object, retrieving a row as a Series results in an object data type.

df_multi['col2'] = ['a', 'b', 'c']
print(df_multi)
#       col0  col1 col2
# row0     0   0.0    a
# row1     1   0.1    b
# row2     2   0.2    c

print(df_multi.dtypes)
# col0      int64
# col1    float64
# col2     object
# dtype: object

s_row = df_multi.loc['row2']
print(s_row)
# col0      2
# col1    0.2
# col2      c
# Name: row2, dtype: object

With the object type, values retain their original types.

print(type(s_row['col0']))
# <class 'numpy.int64'>

print(type(s_row['col1']))
# <class 'numpy.float64'>

print(type(s_row['col2']))
# <class 'str'>
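
If the per-column data types need to be preserved, one option is to select the row with a one-element list so that the result stays a DataFrame. The following sketch uses only the int and float columns from the example above; df_row is a name introduced here.

```python
import pandas as pd

df_multi = pd.DataFrame({'col0': [0, 1, 2], 'col1': [0.0, 0.1, 0.2]},
                        index=['row0', 'row1', 'row2'])

# A scalar label returns a Series, which has a single dtype.
print(df_multi.loc['row2'].dtype)
# float64

# A one-element list returns a one-row DataFrame that keeps each column's dtype.
df_row = df_multi.loc[['row2']]
print(df_row.dtypes)
# col0      int64
# col1    float64
# dtype: object
```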

Views and copies

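During conversion between DataFrame and Series, the resulting object may be either a view or a copy of the original. A view shares memory with the original object, so changing one affects the other. The behavior shown below is for the pandas version used in this article (2.1.4) with default settings and may differ in other versions or when Copy-on-Write is enabled.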

Convert Series to DataFrame

to_frame()

The to_frame() method returns a view if possible. A copy can be created with copy().

s = pd.Series([0, 1], index=['A', 'B'])
df = s.to_frame()

s['A'] = 100
print(df)
#      0
# A  100
# B    1

s = pd.Series([0, 1], index=['A', 'B'])
df_copy = s.copy().to_frame()

s['A'] = 100
print(df_copy)
#    0
# A  0
# B  1
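
Instead of modifying a value and observing the result, memory sharing can also be checked directly with NumPy's np.shares_memory(). This is only a sketch: the first result depends on the pandas version and settings, so no expected output is given for it.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 1], index=['A', 'B'])

df = s.to_frame()
df_copy = s.copy().to_frame()

# Whether the buffers are shared here depends on the pandas version and settings.
print(np.shares_memory(s.to_numpy(), df[0].to_numpy()))

# A DataFrame built from an explicit copy does not share the original buffer.
print(np.shares_memory(s.to_numpy(), df_copy[0].to_numpy()))
# False
```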

pd.DataFrame()

The pd.DataFrame() constructor returns a view by default if possible. Setting the copy argument to True returns a copy.

s = pd.Series([0, 1], index=['A', 'B'])
df = pd.DataFrame(s)

s['A'] = 100
print(df)
#      0
# A  100
# B    1

s = pd.Series([0, 1], index=['A', 'B'])
df_copy = pd.DataFrame(s, copy=True)

s['A'] = 100
print(df_copy)
#    0
# A  0
# B  1

pd.concat()

The pd.concat() function returns a copy by default. Setting the copy argument to False returns a view if possible.

s1 = pd.Series([0, 1], index=['A', 'B'])
s2 = pd.Series([0.0, 0.1], index=['A', 'B'])
df = pd.concat([s1, s2], axis=1)

s1['A'] = 100
print(df)
#    0    1
# A  0  0.0
# B  1  0.1

s1 = pd.Series([0, 1], index=['A', 'B'])
s2 = pd.Series([0.0, 0.1], index=['A', 'B'])
df_copy_false = pd.concat([s1, s2], axis=1, copy=False)

s1['A'] = 100
print(df_copy_false)
#      0    1
# A  100  0.0
# B    1  0.1

Note that setting copy=True in functions like pd.DataFrame() and pd.concat() ensures a copy is made, while copy=False tries to create a view if possible.

Even with copy=False, a copy might be generated instead of a view depending on the memory layout. Be aware that it is not guaranteed that a view will always be created.

Convert DataFrame to Series

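Retrieving a row or a column from a DataFrame as a Series generally results in a view of the original DataFrame. One exception is a row taken from a DataFrame whose columns have different dtypes: the values must be converted to a single dtype, so the result is a copy.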

df = pd.DataFrame({'col0': [0, 1, 2], 'col1': [3, 4, 5], 'col2': [6, 7, 8]},
                  index=['row0', 'row1', 'row2'])
print(df)
#       col0  col1  col2
# row0     0     3     6
# row1     1     4     7
# row2     2     5     8

s = df['col0']
s['row0'] = 10
print(s)
# row0    10
# row1     1
# row2     2
# Name: col0, dtype: int64

print(df)
#       col0  col1  col2
# row0    10     3     6
# row1     1     4     7
# row2     2     5     8

To modify the Series independently of the original DataFrame, create a copy with copy().

s_copy = df['col1'].copy()
s_copy['row0'] = 100
print(s_copy)
# row0    100
# row1      4
# row2      5
# Name: col1, dtype: int64

print(df)
#       col0  col1  col2
# row0    10     3     6
# row1     1     4     7
# row2     2     5     8

When using a list for selection, a copy is created instead of a view.

s_list = df.loc[['row0', 'row2'], 'col2']
s_list['row0'] = 1000
print(s_list)
# row0    1000
# row2       8
# Name: col2, dtype: int64

print(df)
#       col0  col1  col2
# row0    10     3     6
# row1     1     4     7
# row2     2     5     8

When selecting a portion of a DataFrame with loc[] or iloc[] to create a new DataFrame, whether a view or a copy is created depends on the type of range specification used, such as scalar values, lists, or slices.
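
As a rough way to see this difference, the sketch below compares a slice selection and a list selection using np.shares_memory(). The names base, by_slice, and by_list are introduced here for illustration, and the result for the slice is left without an expected output because it depends on the pandas version and on how the DataFrame stores its data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col0': [0, 1, 2], 'col1': [3, 4, 5], 'col2': [6, 7, 8]},
                  index=['row0', 'row1', 'row2'])

base = df['col0'].to_numpy()

by_slice = df.iloc[:2]      # rows selected with a slice
by_list = df.iloc[[0, 1]]   # the same rows selected with a list

# A slice may share memory with the original (version dependent).
print(np.shares_memory(base, by_slice['col0'].to_numpy()))

# A list selection creates new data, so nothing is shared.
print(np.shares_memory(base, by_list['col0'].to_numpy()))
# False
```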
