Convert between pandas DataFrame/Series and Python list
This article explains how to convert between pandas DataFrame/Series and Python built-in lists.
Although the term "convert" is used for simplicity, the process actually involves creating a new object of a different type, while the original object remains unchanged.
For conversions between DataFrame/Series and NumPy arrays (ndarray), as well as between DataFrame and Series, refer to the following articles.
- Convert between pandas DataFrame/Series and NumPy array
- pandas: Convert between DataFrame and Series
The pandas version used in this article is as follows. Note that functionality may vary between versions.
import pandas as pd
print(pd.__version__)
# 2.1.4
Convert lists to DataFrame and Series
Convert lists to DataFrame and Series using pd.DataFrame() and pd.Series()
Passing a list as the first argument to the pd.Series() or pd.DataFrame() constructor creates a Series or DataFrame from that list.
l_1d = [0, 10, 20]
print(pd.Series(l_1d))
# 0 0
# 1 10
# 2 20
# dtype: int64
l_2d = [[0, 10, 20], [30, 40, 50]]
print(pd.DataFrame(l_2d))
# 0 1 2
# 0 0 10 20
# 1 30 40 50
Passing a one-dimensional list directly to pd.DataFrame() creates a single-column DataFrame. Wrapping it in another list, as in [one_dimensional_list], creates a single-row DataFrame.
print(pd.DataFrame(l_1d))
# 0
# 0 0
# 1 10
# 2 20
print(pd.DataFrame([l_1d]))
# 0 1 2
# 0 0 10 20
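A quick way to confirm which orientation you got is to check the shape attribute. The following is a minimal sketch; the variable names df_col and df_row are illustrative and do not appear elsewhere in this article.

```python
import pandas as pd

l_1d = [0, 10, 20]

df_col = pd.DataFrame(l_1d)    # flat list: three rows, one column
df_row = pd.DataFrame([l_1d])  # nested list: one row, three columns

print(df_col.shape)
# (3, 1)

print(df_row.shape)
# (1, 3)
```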
To swap rows and columns, transpose the two-dimensional list (list of lists) with zip(*l_2d) before passing it to pd.DataFrame().
print(pd.DataFrame(zip(*l_2d)))
# 0 1
# 0 0 30
# 1 10 40
# 2 20 50
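As a cross-check, transposing the list before construction should give the same values as building the DataFrame first and then transposing it with the .T attribute. This is only a sketch; the names df_zip and df_t are illustrative.

```python
import pandas as pd

l_2d = [[0, 10, 20], [30, 40, 50]]

df_zip = pd.DataFrame(list(zip(*l_2d)))  # transpose the list first
df_t = pd.DataFrame(l_2d).T              # transpose the DataFrame afterwards

print(df_zip.values.tolist())
# [[0, 30], [10, 40], [20, 50]]

print(df_zip.values.tolist() == df_t.values.tolist())
# True
```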
Specify row and column names: index, columns
Row names can be specified with the index argument, and column names with the columns argument.
print(pd.Series(l_1d, index=['X', 'Y', 'Z']))
# X 0
# Y 10
# Z 20
# dtype: int64
print(pd.DataFrame(l_2d, index=['X', 'Y'], columns=['A', 'B', 'C']))
# A B C
# X 0 10 20
# Y 30 40 50
It is also possible to set or change the index and columns after creating a Series or a DataFrame.
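One way to do this, shown here as a minimal sketch rather than the only option, is to assign a new list of labels to the index or columns attribute. The new list must have the same length as the axis it replaces.

```python
import pandas as pd

s = pd.Series([0, 10, 20])
s.index = ['X', 'Y', 'Z']  # replace the default 0, 1, 2 labels

df = pd.DataFrame([[0, 10, 20], [30, 40, 50]])
df.index = ['X', 'Y']
df.columns = ['A', 'B', 'C']

print(s.index.tolist())
# ['X', 'Y', 'Z']

print(df.index.tolist())
# ['X', 'Y']

print(df.columns.tolist())
# ['A', 'B', 'C']
```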
Specify data type: dtype
The data type (dtype) of each column in a DataFrame, as well as that of a Series, is automatically determined based on the values in the list.
For example, if a column contains a mix of integers (int) and floating-point numbers (float), the data type of the column becomes float, and if it contains a mix of numbers and strings, the data type becomes object.
l_2d_multi = [[0, 0.0, 'abc', 123, 'abc'], [10, 0.1, 'xyz', 1.23, 100]]
print(pd.DataFrame(l_2d_multi))
# 0 1 2 3 4
# 0 0 0.0 abc 123.00 abc
# 1 10 0.1 xyz 1.23 100
print(pd.DataFrame(l_2d_multi).dtypes)
# 0 int64
# 1 float64
# 2 object
# 3 float64
# 4 object
# dtype: object
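The same inference rules can be checked on a Series. The sketch below uses three small lists chosen for illustration; the variable names are not from the article.

```python
import pandas as pd

s_int = pd.Series([0, 1])        # integers only
s_float = pd.Series([0, 0.1])    # int mixed with float
s_obj = pd.Series([0, 'abc'])    # number mixed with string

print(s_int.dtype)
# int64

print(s_float.dtype)
# float64

print(s_obj.dtype)
# object
```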
It is also possible to specify the data type using the dtype argument of pd.DataFrame() or pd.Series().
print(pd.DataFrame(l_2d, dtype=float))
# 0 1 2
# 0 0.0 10.0 20.0
# 1 30.0 40.0 50.0
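The dtype argument applies one type to all of the data. If only some columns should be converted, one option (not covered in the text above, so treat it as an aside) is to call the astype() method with a dictionary after construction. A minimal sketch:

```python
import pandas as pd

l_1d = [0, 10, 20]
l_2d = [[0, 10, 20], [30, 40, 50]]

# dtype argument on the Series constructor
s_float = pd.Series(l_1d, dtype=float)
print(s_float.dtype)
# float64

# astype() with a dict converts only the named column (here, column 0)
df_partial = pd.DataFrame(l_2d).astype({0: float})
print([str(t) for t in df_partial.dtypes])
# ['float64', 'int64', 'int64']
```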
For more details on data types (dtype) in pandas, refer to the following article.
For lists containing labels
To create a Series from a list of label-value pairs, first decompose the list into labels and values, and then pass these to pd.Series().
l_1d_index = [['X', 0], ['Y', 1], ['Z', 2]]
index, values = zip(*l_1d_index)
print(index)
# ('X', 'Y', 'Z')
print(values)
# (0, 1, 2)
print(pd.Series(values, index=index))
# X 0
# Y 1
# Z 2
# dtype: int64
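As an alternative sketch, the list of pairs can be turned into a dict first, since pd.Series() uses dict keys as the index. This assumes every label is unique and hashable; duplicate labels would be collapsed by dict(), which the zip-based approach does not do.

```python
import pandas as pd

l_1d_index = [['X', 0], ['Y', 1], ['Z', 2]]

# dict() accepts an iterable of two-element sequences
s_from_dict = pd.Series(dict(l_1d_index))

print(s_from_dict.index.tolist())
# ['X', 'Y', 'Z']

print(s_from_dict.tolist())
# [0, 1, 2]
```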
To create a DataFrame from a list that includes labels and multiple values, first load the entire list into the DataFrame, and then set the index using the set_index() method.
l_2d_index = [['X', 0, 0.0], ['Y', 1, 0.1], ['Z', 2, 0.2]]
df_index = pd.DataFrame(l_2d_index, columns=['idx', 'A', 'B'])
print(df_index)
# idx A B
# 0 X 0 0.0
# 1 Y 1 0.1
# 2 Z 2 0.2
print(df_index.set_index('idx'))
# A B
# idx
# X 0 0.0
# Y 1 0.1
# Z 2 0.2
If the original list also includes column names, use the first row for the columns argument and the rest of the rows (obtained by slicing) as the first argument.
l_2d_index_columns = [['idx', 'A', 'B'], ['X', 0, 0.0], ['Y', 1, 0.1], ['Z', 2, 0.2]]
df_index_columns = pd.DataFrame(l_2d_index_columns[1:], columns=l_2d_index_columns[0])
print(df_index_columns)
# idx A B
# 0 X 0 0.0
# 1 Y 1 0.1
# 2 Z 2 0.2
print(df_index_columns.set_index('idx'))
# A B
# idx
# X 0 0.0
# Y 1 0.1
# Z 2 0.2
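The two steps above (header row to columns, then set_index()) can be wrapped in a small helper. The function name frame_from_rows and its index_col parameter are hypothetical, introduced only for this sketch.

```python
import pandas as pd


def frame_from_rows(rows, index_col=None):
    """Build a DataFrame from a list of lists whose first row holds column names.

    Hypothetical helper: index_col, if given, names the column to use as the index.
    """
    header, *body = rows
    df = pd.DataFrame(body, columns=header)
    if index_col is not None:
        df = df.set_index(index_col)
    return df


rows = [['idx', 'A', 'B'], ['X', 0, 0.0], ['Y', 1, 0.1], ['Z', 2, 0.2]]
df_helper = frame_from_rows(rows, index_col='idx')

print(df_helper.index.tolist())
# ['X', 'Y', 'Z']

print(df_helper.columns.tolist())
# ['A', 'B']
```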
Convert DataFrame and Series to lists
Convert Series to a list using tolist() or to_list()
A Series can be converted to a list using the tolist() or to_list() methods.
- pandas.Series.tolist — pandas 2.1.4 documentation
- pandas.Series.to_list — pandas 2.1.4 documentation
s = pd.Series([0, 10, 20])
print(s)
# 0 0
# 1 10
# 2 20
# dtype: int64
print(s.tolist())
# [0, 10, 20]
print(s.to_list())
# [0, 10, 20]
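One practical detail is that tolist() returns Python built-in scalars, whereas wrapping the underlying NumPy array in list() keeps NumPy scalar types. The sketch below illustrates the difference; the names via_tolist and via_list are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 10, 20])

via_tolist = s.tolist()        # elements are Python int
via_list = list(s.to_numpy())  # elements remain NumPy integer scalars

print(type(via_tolist[0]))
# <class 'int'>

print(isinstance(via_list[0], np.integer))
# True
```

This can matter when the list is later passed to code that expects plain Python types, such as JSON serialization.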
Convert DataFrame to a list using values and tolist()
As of pandas version 2.1.4, DataFrame does not have the tolist() or to_list() methods. To convert a DataFrame to a list, first convert it into a NumPy array (ndarray) using the values attribute, and then use the tolist() method of ndarray.
df = pd.DataFrame([[0, 10, 20], [30, 40, 50]])
print(df)
# 0 1 2
# 0 0 10 20
# 1 30 40 50
print(df.values.tolist())
# [[0, 10, 20], [30, 40, 50]]
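Because an ndarray holds a single dtype, going through values can change element types when columns differ. In the sketch below, an integer column next to a float column comes back as floats. The second part shows one possible workaround, converting each column with Series.tolist() and regrouping by row; the variable names are illustrative.

```python
import pandas as pd

df_mixed = pd.DataFrame({'A': [0, 1], 'B': [0.5, 1.5]})

# The int column 'A' is upcast to float in the combined array
rows_upcast = df_mixed.values.tolist()
print(rows_upcast)
# [[0.0, 0.5], [1.0, 1.5]]

# Convert column by column, then regroup the values into rows
columns = [df_mixed[c].tolist() for c in df_mixed.columns]
rows_typed = [list(row) for row in zip(*columns)]
print(rows_typed)
# [[0, 0.5], [1, 1.5]]
```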
Convert Series and DataFrame to lists including index and columns
To keep the index as part of the list, use the reset_index() method to reset the index and turn it into a data column.
s_index = pd.Series([0, 1, 2], index=['X', 'Y', 'Z'])
print(s_index)
# X 0
# Y 1
# Z 2
# dtype: int64
print(s_index.reset_index())
# index 0
# 0 X 0
# 1 Y 1
# 2 Z 2
print(s_index.reset_index().values.tolist())
# [['X', 0], ['Y', 1], ['Z', 2]]
df_index = pd.DataFrame([[0, 1, 2], [3, 4, 5]], index=['A', 'B'], columns=['X', 'Y', 'Z'])
print(df_index)
# X Y Z
# A 0 1 2
# B 3 4 5
print(df_index.reset_index())
# index X Y Z
# 0 A 0 1 2
# 1 B 3 4 5
print(df_index.reset_index().values.tolist())
# [['A', 0, 1, 2], ['B', 3, 4, 5]]
As of version 2.1.4, DataFrame has no method to reset columns. To include both index and columns in the list, first apply reset_index(), then transpose using .T, apply reset_index() again, and finally revert the transposition with .T. A more efficient method may exist.
print(df_index.reset_index().T.reset_index().T.values.tolist())
# [['index', 'X', 'Y', 'Z'], ['A', 0, 1, 2], ['B', 3, 4, 5]]
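One possible alternative that avoids the double transposition is to build the header row separately and prepend it to the data rows. This is a sketch, not a benchmarked recommendation; it hard-codes 'index' as the first header label to match the column name that reset_index() produces for an unnamed index.

```python
import pandas as pd

df_index = pd.DataFrame([[0, 1, 2], [3, 4, 5]], index=['A', 'B'], columns=['X', 'Y', 'Z'])

# Header row: the label for the index column followed by the column names
header = ['index', *df_index.columns.tolist()]

# Data rows with the index moved into the first position
rows = df_index.reset_index().values.tolist()

table = [header, *rows]
print(table)
# [['index', 'X', 'Y', 'Z'], ['A', 0, 1, 2], ['B', 3, 4, 5]]
```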
Convert index and columns to lists
The index attribute of Series, as well as the index and columns attributes of DataFrame, are all of type Index. They can be converted to lists using the tolist() or to_list() methods.
s_index = pd.Series([0, 1, 2], index=['X', 'Y', 'Z'])
print(s_index)
# X 0
# Y 1
# Z 2
# dtype: int64
print(s_index.index)
# Index(['X', 'Y', 'Z'], dtype='object')
print(s_index.index.tolist())
# ['X', 'Y', 'Z']
df_index = pd.DataFrame([[0, 1, 2], [3, 4, 5]], index=['A', 'B'], columns=['X', 'Y', 'Z'])
print(df_index)
# X Y Z
# A 0 1 2
# B 3 4 5
print(df_index.index)
# Index(['A', 'B'], dtype='object')
print(df_index.index.tolist())
# ['A', 'B']
print(df_index.columns)
# Index(['X', 'Y', 'Z'], dtype='object')
print(df_index.columns.tolist())
# ['X', 'Y', 'Z']
Note that an Index can be iterated over directly in a for loop, and individual elements can be retrieved by position with []. Slicing is also possible, but elements of an Index cannot be modified in place. Thus, conversion to a list is unnecessary if you only need to read elements.
for i in df_index.columns:
    print(i, type(i))
# X <class 'str'>
# Y <class 'str'>
# Z <class 'str'>
print(df_index.columns[0])
# X
print(df_index.columns[:2])
# Index(['X', 'Y'], dtype='object')
# df_index.columns[0] = 'x'
# TypeError: Index does not support mutable operations
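Since single elements of an Index cannot be assigned, changing one label means replacing the labels as a whole. The sketch below shows two ways that are assumed here to be suitable: the rename() method, which returns a new DataFrame, and assigning a complete new list to columns on a copy.

```python
import pandas as pd

df_index = pd.DataFrame([[0, 1, 2], [3, 4, 5]], index=['A', 'B'], columns=['X', 'Y', 'Z'])

# rename() maps old labels to new ones and leaves the original untouched
df_renamed = df_index.rename(columns={'X': 'x'})
print(df_renamed.columns.tolist())
# ['x', 'Y', 'Z']

# Alternatively, assign a full replacement list of labels
df_copy = df_index.copy()
df_copy.columns = ['x', *df_copy.columns[1:]]
print(df_copy.columns.tolist())
# ['x', 'Y', 'Z']

print(df_index.columns.tolist())
# ['X', 'Y', 'Z']
```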