Convert between pandas DataFrame/Series and Python list

This article explains how to convert between pandas DataFrame/Series and Python built-in lists.

Although the term "convert" is used for simplicity, the process actually involves creating a new object of a different type, while the original object remains unchanged.

For conversions between DataFrame/Series and NumPy arrays (ndarray), as well as between DataFrame and Series, refer to the following articles.

The pandas version used in this article is as follows. Note that functionality may vary between versions.

import pandas as pd

print(pd.__version__)
# 2.1.4

Convert lists to DataFrame and Series

Convert lists to DataFrame and Series using pd.DataFrame() and pd.Series()

Passing a list as the first argument to the pd.Series() or pd.DataFrame() constructor creates a Series or DataFrame from that list.

l_1d = [0, 10, 20]

print(pd.Series(l_1d))
# 0     0
# 1    10
# 2    20
# dtype: int64

l_2d = [[0, 10, 20], [30, 40, 50]]

print(pd.DataFrame(l_2d))
#     0   1   2
# 0   0  10  20
# 1  30  40  50

Passing a one-dimensional list directly to pd.DataFrame() creates a single-column DataFrame. Wrapping it in another list, as in [l_1d], creates a single-row DataFrame.

print(pd.DataFrame(l_1d))
#     0
# 0   0
# 1  10
# 2  20

print(pd.DataFrame([l_1d]))
#    0   1   2
# 0  0  10  20

To transpose a two-dimensional list (list of lists), unpack it with zip(*l_2d) and pass the result to pd.DataFrame().

print(pd.DataFrame(zip(*l_2d)))
#     0   1
# 0   0  30
# 1  10  40
# 2  20  50
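The following is a sketch of two equivalent ways to get the transposed DataFrame, assuming the list is rectangular (every row has the same length): wrap zip(*l_2d) in list(), or build the DataFrame as-is and transpose it with the .T attribute. Keep in mind that zip() silently truncates to the shortest row, so a ragged list would lose data with the first approach.

```python
import pandas as pd

l_2d = [[0, 10, 20], [30, 40, 50]]

# Transpose the list itself: zip(*l_2d) pairs up the i-th element of every row
df_zip = pd.DataFrame(list(zip(*l_2d)))

# Build the DataFrame as-is, then transpose the result with .T
df_t = pd.DataFrame(l_2d).T

print(df_zip.equals(df_t))
# True
```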

Specify row and column names: index, columns

Row names can be specified with the index argument, and column names with the columns argument.

print(pd.Series(l_1d, index=['X', 'Y', 'Z']))
# X     0
# Y    10
# Z    20
# dtype: int64

print(pd.DataFrame(l_2d, index=['X', 'Y'], columns=['A', 'B', 'C']))
#     A   B   C
# X   0  10  20
# Y  30  40  50

It is also possible to set or change the index and columns after creating a Series or a DataFrame.
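As a sketch, here are three standard ways to do this: assigning lists to the index and columns attributes (which modifies the DataFrame in place), and set_axis() and rename(), which return new objects and leave the original unchanged.

```python
import pandas as pd

df = pd.DataFrame([[0, 10, 20], [30, 40, 50]])

# Assign lists directly to the index and columns attributes (modifies df in place)
df.index = ['X', 'Y']
df.columns = ['A', 'B', 'C']
print(df)
#     A   B   C
# X   0  10  20
# Y  30  40  50

# set_axis() returns a new DataFrame with all labels along one axis replaced
df2 = df.set_axis(['P', 'Q', 'R'], axis=1)
print(df2.columns.tolist())
# ['P', 'Q', 'R']

# rename() returns a new DataFrame with only the mapped labels replaced
df3 = df.rename(index={'X': 'x'}, columns={'A': 'a'})
print(df3.index.tolist(), df3.columns.tolist())
# ['x', 'Y'] ['a', 'B', 'C']
```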

Specify data type: dtype

The data type (dtype) of each column in a DataFrame, as well as that of a Series, is automatically determined based on the values in the list.

For example, if a column contains a mix of integers (int) and floating-point numbers (float), the data type of the column becomes float, and if it contains a mix of numbers and strings, the data type becomes object.

l_2d_multi = [[0, 0.0, 'abc', 123, 'abc'], [10, 0.1, 'xyz', 1.23, 100]]

print(pd.DataFrame(l_2d_multi))
#     0    1    2       3    4
# 0   0  0.0  abc  123.00  abc
# 1  10  0.1  xyz    1.23  100

print(pd.DataFrame(l_2d_multi).dtypes)
# 0      int64
# 1    float64
# 2     object
# 3    float64
# 4     object
# dtype: object

It is also possible to specify the data type using the dtype argument of pd.DataFrame() or pd.Series().

print(pd.DataFrame(l_2d, dtype=float))
#       0     1     2
# 0   0.0  10.0  20.0
# 1  30.0  40.0  50.0
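The dtype argument applies a single type to every value. To choose a type per column, one option (a sketch, not the only approach) is to build the DataFrame first and then call astype() with a dict that maps column labels to types.

```python
import pandas as pd

l_2d = [[0, 10, 20], [30, 40, 50]]

# A single dtype applies to all values
s = pd.Series([0, 10, 20], dtype='int8')
print(s.dtype)
# int8

# Per-column types: pass a {column: dtype} dict to astype()
df = pd.DataFrame(l_2d, columns=['A', 'B', 'C']).astype({'A': float, 'C': str})
print(df.dtypes)
# A    float64
# B      int64
# C     object
# dtype: object
```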

For more details on data types (dtype) in pandas, refer to the following article.

For lists containing labels

To create a Series from a list of label-value pairs, first decompose the list into labels and values, and then pass these to pd.Series().

l_1d_index = [['X', 0], ['Y', 1], ['Z', 2]]

index, values = zip(*l_1d_index)
print(index)
# ('X', 'Y', 'Z')

print(values)
# (0, 1, 2)

print(pd.Series(values, index=index))
# X    0
# Y    1
# Z    2
# dtype: int64
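If the labels are unique, an alternative sketch is to turn the pairs into a dict with dict() and pass that to pd.Series(); the dict keys become the index. Because dict() keeps only the last value for a repeated key, the zip() approach is safer when labels may repeat.

```python
import pandas as pd

l_1d_index = [['X', 0], ['Y', 1], ['Z', 2]]

# dict() accepts any iterable of two-item pairs; the keys become the index
s = pd.Series(dict(l_1d_index))
print(s)
# X    0
# Y    1
# Z    2
# dtype: int64

# Caveat: with duplicate labels, dict() keeps only the last value
print(dict([['X', 0], ['X', 1]]))
# {'X': 1}
```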

To create a DataFrame from a list that includes labels and multiple values, first load the entire list into the DataFrame, and then set the index using the set_index() method.

l_2d_index = [['X', 0, 0.0], ['Y', 1, 0.1], ['Z', 2, 0.2]]

df_index = pd.DataFrame(l_2d_index, columns=['idx', 'A', 'B'])
print(df_index)
#   idx  A    B
# 0   X  0  0.0
# 1   Y  1  0.1
# 2   Z  2  0.2

print(df_index.set_index('idx'))
#      A    B
# idx        
# X    0  0.0
# Y    1  0.1
# Z    2  0.2
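As an alternative sketch, DataFrame.from_records() accepts an index argument naming the column to use as the index, so construction and set_index() can be combined into one call. This assumes the label is one of the named columns.

```python
import pandas as pd

l_2d_index = [['X', 0, 0.0], ['Y', 1, 0.1], ['Z', 2, 0.2]]

# index='idx' moves the 'idx' column into the index during construction
df = pd.DataFrame.from_records(l_2d_index, columns=['idx', 'A', 'B'], index='idx')
print(df)
#      A    B
# idx        
# X    0  0.0
# Y    1  0.1
# Z    2  0.2

# Same result as building the DataFrame first and then calling set_index()
df_set = pd.DataFrame(l_2d_index, columns=['idx', 'A', 'B']).set_index('idx')
print(df.equals(df_set))
# True
```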

If the original list also includes column names, use the first row for the columns argument and the rest of the rows (obtained by slicing) as the first argument.

l_2d_index_columns = [['idx', 'A', 'B'], ['X', 0, 0.0], ['Y', 1, 0.1], ['Z', 2, 0.2]]

df_index_columns = pd.DataFrame(l_2d_index_columns[1:], columns=l_2d_index_columns[0])
print(df_index_columns)
#   idx  A    B
# 0   X  0  0.0
# 1   Y  1  0.1
# 2   Z  2  0.2

print(df_index_columns.set_index('idx'))
#      A    B
# idx        
# X    0  0.0
# Y    1  0.1
# Z    2  0.2
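If this pattern comes up often, it can be wrapped in a small helper. The function rows_to_df() and its index_col parameter below are hypothetical, not part of pandas; the sketch assumes the first row always holds the column names.

```python
import pandas as pd


def rows_to_df(rows, index_col=None):
    """Hypothetical helper: first row = column names, remaining rows = data."""
    header, *body = rows
    df = pd.DataFrame(body, columns=header)
    # Optionally move one of the columns into the index
    if index_col is not None:
        df = df.set_index(index_col)
    return df


l_2d_index_columns = [['idx', 'A', 'B'], ['X', 0, 0.0], ['Y', 1, 0.1], ['Z', 2, 0.2]]

df = rows_to_df(l_2d_index_columns, index_col='idx')
print(df.index.tolist())
# ['X', 'Y', 'Z']
print(df.columns.tolist())
# ['A', 'B']
```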

Convert DataFrame and Series to lists

Convert Series to a list using tolist() or to_list()

Series can be converted to a list using the tolist() or to_list() methods.

s = pd.Series([0, 10, 20])
print(s)
# 0     0
# 1    10
# 2    20
# dtype: int64

print(s.tolist())
# [0, 10, 20]

print(s.to_list())
# [0, 10, 20]
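tolist() and to_list() are aliases of the same method. One practical difference from the underlying NumPy array is that the returned list holds Python built-in scalars (int, float, str) rather than NumPy scalars. The sketch below checks the element types and shows that the list can be passed straight to json.dumps(), which rejects NumPy integer scalars.

```python
import json

import pandas as pd

s = pd.Series([0, 10, 20])

# Elements of the underlying NumPy array are NumPy scalars
print(type(s.to_numpy()[0]))
# <class 'numpy.int64'>

# tolist()/to_list() return Python built-in scalars
lst = s.tolist()
print(type(lst[0]))
# <class 'int'>

# So the list can be serialized to JSON directly
print(json.dumps(lst))
# [0, 10, 20]
```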

Convert DataFrame to a list using values and tolist()

As of pandas version 2.1.4, DataFrame does not have the tolist() or to_list() methods. To convert a DataFrame to a list, first convert it into a NumPy array (ndarray) using the values attribute, and then use the tolist() method of ndarray.

df = pd.DataFrame([[0, 10, 20], [30, 40, 50]])
print(df)
#     0   1   2
# 0   0  10  20
# 1  30  40  50

print(df.values.tolist())
# [[0, 10, 20], [30, 40, 50]]
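The pandas documentation recommends to_numpy() over the values attribute; for this conversion both give the same result. Because either one produces a single NumPy array, a DataFrame with int and float columns is upcast to a common type first, so integers come back as floats. The sketch below shows this, column-wise lists via .T, and one way to keep each column's own type by building rows from itertuples().

```python
import pandas as pd

df = pd.DataFrame([[0, 10, 20], [30, 40, 50]])

# to_numpy() is the recommended replacement for .values
print(df.to_numpy().tolist())
# [[0, 10, 20], [30, 40, 50]]

# Column-wise lists: transpose first
print(df.T.to_numpy().tolist())
# [[0, 30], [10, 40], [20, 50]]

df_mixed = pd.DataFrame({'i': [1, 2], 'f': [0.5, 1.5]})

# int64 and float64 columns share one float64 array, so ints become floats
rows_upcast = df_mixed.to_numpy().tolist()
print(rows_upcast)
# [[1.0, 0.5], [2.0, 1.5]]

# itertuples(index=False, name=None) yields plain tuples with per-column types
rows_kept = [list(row) for row in df_mixed.itertuples(index=False, name=None)]
print(rows_kept)
# [[1, 0.5], [2, 1.5]]
```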

Convert Series and DataFrame to lists including index and columns

To include the index in the list, first call reset_index(), which turns the index into a regular data column, and then convert the result to a list as described above.

s_index = pd.Series([0, 1, 2], index=['X', 'Y', 'Z'])
print(s_index)
# X    0
# Y    1
# Z    2
# dtype: int64

print(s_index.reset_index())
#   index  0
# 0     X  0
# 1     Y  1
# 2     Z  2

print(s_index.reset_index().values.tolist())
# [['X', 0], ['Y', 1], ['Z', 2]]

df_index = pd.DataFrame([[0, 1, 2], [3, 4, 5]], index=['A', 'B'], columns=['X', 'Y', 'Z'])
print(df_index)
#    X  Y  Z
# A  0  1  2
# B  3  4  5

print(df_index.reset_index())
#   index  X  Y  Z
# 0     A  0  1  2
# 1     B  3  4  5

print(df_index.reset_index().values.tolist())
# [['A', 0, 1, 2], ['B', 3, 4, 5]]
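For a Series, another sketch skips reset_index() entirely: items() yields (label, value) pairs, which can be turned into lists. If you do keep the intermediate DataFrame, Series.reset_index() also accepts a name argument that labels the value column.

```python
import pandas as pd

s_index = pd.Series([0, 1, 2], index=['X', 'Y', 'Z'])

# items() yields (index label, value) tuples
pairs = [list(pair) for pair in s_index.items()]
print(pairs)
# [['X', 0], ['Y', 1], ['Z', 2]]

# name= sets the label of the value column produced by reset_index()
print(s_index.reset_index(name='value').columns.tolist())
# ['index', 'value']
```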

As of version 2.1.4, DataFrame has no counterpart of reset_index() for columns. To include both the index and the columns in the list, apply reset_index(), transpose with .T, apply reset_index() again, and transpose back with .T. This works, but more efficient approaches may exist.

print(df_index.reset_index().T.reset_index().T.values.tolist())
# [['index', 'X', 'Y', 'Z'], ['A', 0, 1, 2], ['B', 3, 4, 5]]
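One alternative sketch avoids the double transposition: call reset_index() once, take its column labels as the header row, and prepend that row to the data rows.

```python
import pandas as pd

df_index = pd.DataFrame([[0, 1, 2], [3, 4, 5]], index=['A', 'B'], columns=['X', 'Y', 'Z'])

df_reset = df_index.reset_index()

# Header row from the column labels, followed by the data rows
rows = [df_reset.columns.tolist()] + df_reset.to_numpy().tolist()
print(rows)
# [['index', 'X', 'Y', 'Z'], ['A', 0, 1, 2], ['B', 3, 4, 5]]
```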

Convert index and columns to lists

The index attribute of Series, as well as the index and columns attributes of DataFrame, are all of type Index. They can be converted to lists using the tolist() or to_list() methods.

s_index = pd.Series([0, 1, 2], index=['X', 'Y', 'Z'])
print(s_index)
# X    0
# Y    1
# Z    2
# dtype: int64

print(s_index.index)
# Index(['X', 'Y', 'Z'], dtype='object')

print(s_index.index.tolist())
# ['X', 'Y', 'Z']

df_index = pd.DataFrame([[0, 1, 2], [3, 4, 5]], index=['A', 'B'], columns=['X', 'Y', 'Z'])
print(df_index)
#    X  Y  Z
# A  0  1  2
# B  3  4  5

print(df_index.index)
# Index(['A', 'B'], dtype='object')

print(df_index.index.tolist())
# ['A', 'B']

print(df_index.columns)
# Index(['X', 'Y', 'Z'], dtype='object')

print(df_index.columns.tolist())
# ['X', 'Y', 'Z']

Note that an Index can be iterated over directly in a for loop, and individual elements can be retrieved with [] and ranges with slicing. However, an Index is immutable, so its elements cannot be modified in place. If you only need to read elements, there is no need to convert the Index to a list.

for i in df_index.columns:
    print(i, type(i))
# X <class 'str'>
# Y <class 'str'>
# Z <class 'str'>

print(df_index.columns[0])
# X

print(df_index.columns[:2])
# Index(['X', 'Y'], dtype='object')

# df_index.columns[0] = 'x'
# TypeError: Index does not support mutable operations
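Because an Index cannot be modified in place, changing a label means replacing the whole Index. The sketch below shows two common ways: convert to a list, edit it, and assign it back; or use rename() with a mapping, which returns a new DataFrame.

```python
import pandas as pd

df_index = pd.DataFrame([[0, 1, 2], [3, 4, 5]], index=['A', 'B'], columns=['X', 'Y', 'Z'])

# Convert to a list, modify it, and assign the whole list back
cols = df_index.columns.tolist()
cols[0] = 'x'
df_index.columns = cols
print(df_index.columns.tolist())
# ['x', 'Y', 'Z']

# rename() returns a new DataFrame with the mapped labels replaced
print(df_index.rename(columns={'Y': 'y'}).columns.tolist())
# ['x', 'y', 'Z']
```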