note.nkmk.me

Convert pandas.DataFrame, Series and list to each other

Posted: 2019-11-21 / Tags: Python, pandas, List

pandas.DataFrame, pandas.Series, and Python's built-in list type can be converted to each other.

This post describes the following contents.

  • Convert list to pandas.DataFrame, pandas.Series
    • For data-only list
    • For list containing data and labels (row / column names)
  • Convert pandas.DataFrame, pandas.Series to list
    • Convert data to list
    • Convert data and label (row / column name) to list
    • Convert labels (row / column names) to list

Convert list to pandas.DataFrame, pandas.Series

For data-only list

Passing a list as the first argument of the pandas.DataFrame() or pandas.Series() constructor creates a pandas.DataFrame or pandas.Series from that list.

The following example creates a pandas.Series from a one-dimensional list. You can also specify the labels with the index parameter.

import pandas as pd

l_1d = [0, 1, 2]

s = pd.Series(l_1d)
print(s)
# 0    0
# 1    1
# 2    2
# dtype: int64

s = pd.Series(l_1d, index=['row1', 'row2', 'row3'])
print(s)
# row1    0
# row2    1
# row3    2
# dtype: int64

The following example creates a pandas.DataFrame from a two-dimensional list (list of lists). You can also specify the row names with the index parameter and the column names with the columns parameter.

l_2d = [[0, 1, 2], [3, 4, 5]]

df = pd.DataFrame(l_2d)
print(df)
#    0  1  2
# 0  0  1  2
# 1  3  4  5

df = pd.DataFrame(l_2d,
                  index=['row1', 'row2'],
                  columns=['col1', 'col2', 'col3'])
print(df)
#       col1  col2  col3
# row1     0     1     2
# row2     3     4     5

After generating pandas.DataFrame and pandas.Series, you can set and change the row and column names by updating the index and columns attributes.
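As a minimal sketch of this, the following assigns labels to a pandas.DataFrame that was created without them. The label values are only illustrative, and the assigned list must have the same length as the axis it replaces.

```python
import pandas as pd

df = pd.DataFrame([[0, 1, 2], [3, 4, 5]])

# Assign new labels in place; each list must match the axis length
df.index = ['row1', 'row2']
df.columns = ['col1', 'col2', 'col3']

print(df)
#       col1  col2  col3
# row1     0     1     2
# row2     3     4     5
```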

For list containing data and labels (row / column names)

Here's how to generate pandas.Series from a list of label / value pairs.

Split it into a sequence of labels and a sequence of values, and pass them to pandas.Series(). Here, zip(*l_1d_index) unpacks the list and transposes it, so the labels and the values are collected into separate tuples.

l_1d_index = [['Alice', 0], ['Bob', 1], ['Charlie', 2]]

index, value = zip(*l_1d_index)
print(index)
# ('Alice', 'Bob', 'Charlie')

print(value)
# (0, 1, 2)

s_index = pd.Series(value, index=index)
print(s_index)
# Alice      0
# Bob        1
# Charlie    2
# dtype: int64
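As an alternative sketch, not part of the approach above, the list of pairs can also be converted to a dict first, because pandas.Series() uses dict keys as the index. This assumes the labels are unique and hashable; duplicate labels would be collapsed by dict().

```python
import pandas as pd

l_1d_index = [['Alice', 0], ['Bob', 1], ['Charlie', 2]]

# dict() accepts an iterable of two-element pairs; the keys become the index
s_from_dict = pd.Series(dict(l_1d_index))

print(s_from_dict.index.tolist())
# ['Alice', 'Bob', 'Charlie']

print(s_from_dict.tolist())
# [0, 1, 2]
```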

Here's how to create a pandas.DataFrame from a list of labels and multiple values.

The list can be decomposed as in the above example of pandas.Series, but it is easier to set the index with the set_index() method after reading the whole list.

l_2d_index = [['Alice', 0, 0.0], ['Bob', 1, 0.1], ['Charlie', 2, 0.2]]

df_index = pd.DataFrame(l_2d_index, columns=['name', 'val1', 'val2'])
print(df_index)
#       name  val1  val2
# 0    Alice     0   0.0
# 1      Bob     1   0.1
# 2  Charlie     2   0.2

df_index_set = df_index.set_index('name')
print(df_index_set)
#          val1  val2
# name               
# Alice       0   0.0
# Bob         1   0.1
# Charlie     2   0.2

If the columns hold different kinds of data, as in this example, an appropriate data type (dtype) is inferred for each column automatically.

print(df_index_set.dtypes)
# val1      int64
# val2    float64
# dtype: object

If the original list also contains the column names, pass its first element as columns and the remaining elements as the first argument.

l_2d_index_columns = [['name', 'val1', 'val2'], ['Alice', 0, 0.0], ['Bob', 1, 0.1], ['Charlie', 2, 0.2]]

df_index_columns = pd.DataFrame(l_2d_index_columns[1:], columns=l_2d_index_columns[0])
print(df_index_columns)
#       name  val1  val2
# 0    Alice     0   0.0
# 1      Bob     1   0.1
# 2  Charlie     2   0.2

df_index_columns_set = df_index_columns.set_index('name')
print(df_index_columns_set)
#          val1  val2
# name               
# Alice       0   0.0
# Bob         1   0.1
# Charlie     2   0.2

Convert pandas.DataFrame, pandas.Series to list

Convert data to list

pandas.Series has a tolist() method, but pandas.DataFrame does not. For both, you can get the NumPy array ndarray with the values attribute and then convert it to a list with the ndarray's tolist() method.

s = pd.Series([0, 1, 2])
print(s)
# 0    0
# 1    1
# 2    2
# dtype: int64

l_1d = s.values.tolist()
print(l_1d)
# [0, 1, 2]

df = pd.DataFrame([[0, 1, 2], [3, 4, 5]])
print(df)
#    0  1  2
# 0  0  1  2
# 1  3  4  5

l_2d = df.values.tolist()
print(l_2d)
# [[0, 1, 2], [3, 4, 5]]
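For reference, here is a short sketch of two related calls: pandas.Series also provides tolist() (and its alias to_list()) directly, and to_numpy() can be used in place of the values attribute. These give the same results as the examples above for this data.

```python
import pandas as pd

s = pd.Series([0, 1, 2])
df = pd.DataFrame([[0, 1, 2], [3, 4, 5]])

# Series can be converted without going through values
print(s.tolist())
# [0, 1, 2]

# to_numpy() returns an ndarray, like the values attribute
print(df.to_numpy().tolist())
# [[0, 1, 2], [3, 4, 5]]
```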

The values attribute does not include labels (row / column names).

s_index = pd.Series([0, 1, 2], index=['row1', 'row2', 'row3'])
print(s_index)
# row1    0
# row2    1
# row3    2
# dtype: int64

l_1d = s_index.values.tolist()
print(l_1d)
# [0, 1, 2]

df_index = pd.DataFrame([[0, 1, 2], [3, 4, 5]],
                        index=['row1', 'row2'],
                        columns=['col1', 'col2', 'col3'])
print(df_index)
#       col1  col2  col3
# row1     0     1     2
# row2     3     4     5

l_2d = df_index.values.tolist()
print(l_2d)
# [[0, 1, 2], [3, 4, 5]]
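One more point about the values attribute is worth a small sketch: when a pandas.DataFrame mixes integer and float columns, values returns a single ndarray with one common dtype, so the integers come back as floats. The frame below is a made-up example for illustration.

```python
import pandas as pd

df_mixed = pd.DataFrame([[0, 0.0], [1, 0.1]], columns=['val1', 'val2'])

# The int64 and float64 columns are combined into one float64 array
l_mixed = df_mixed.values.tolist()
print(l_mixed)
# [[0.0, 0.0], [1.0, 0.1]]

# Converting a single column keeps its own dtype
print(df_mixed['val1'].tolist())
# [0, 1]
```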

Convert data and label (row / column name) to list

If you want to keep the labels as part of the list data, first apply the reset_index() method, which moves the index into a regular data column.

l_1d_index = s_index.reset_index().values.tolist()
print(l_1d_index)
# [['row1', 0], ['row2', 1], ['row3', 2]]

There is no method that resets the columns the way reset_index() resets the index. To keep both the row names and the column names of a pandas.DataFrame as list data, apply reset_index(), transpose with .T, apply reset_index() again, and then transpose back with .T.

l_2d_index = df_index.reset_index().values.tolist()
print(l_2d_index)
# [['row1', 0, 1, 2], ['row2', 3, 4, 5]]

l_2d_index_columns = df_index.reset_index().T.reset_index().T.values.tolist()
print(l_2d_index_columns)
# [['index', 'col1', 'col2', 'col3'], ['row1', 0, 1, 2], ['row2', 3, 4, 5]]
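As an alternative sketch that avoids the double transpose, the header row can be built from the columns attribute and prepended with ordinary list concatenation. The 'index' label is hard-coded here as an assumption; it matches the name reset_index() uses when the index has no name.

```python
import pandas as pd

df_index = pd.DataFrame([[0, 1, 2], [3, 4, 5]],
                        index=['row1', 'row2'],
                        columns=['col1', 'col2', 'col3'])

# Header row: a label for the index column followed by the column names
header = ['index'] + df_index.columns.tolist()

# Data rows, each starting with its row name
rows = df_index.reset_index().values.tolist()

l_2d_index_columns = [header] + rows
print(l_2d_index_columns)
# [['index', 'col1', 'col2', 'col3'], ['row1', 0, 1, 2], ['row2', 3, 4, 5]]
```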

Convert labels (row / column names) to list

If you want to convert only the labels to a list, use the index attribute of pandas.Series.

The index attribute is of the Index type (RangeIndex type in the case of the default sequence number) and has a tolist() method.

print(s_index)
# row1    0
# row2    1
# row3    2
# dtype: int64

print(s_index.index)
# Index(['row1', 'row2', 'row3'], dtype='object')

print(type(s_index.index))
# <class 'pandas.core.indexes.base.Index'>

print(s_index.index.tolist())
# ['row1', 'row2', 'row3']

print(type(s_index.index.tolist()))
# <class 'list'>
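The RangeIndex case mentioned above can be checked in the same way. This small sketch uses a pandas.Series created without explicit labels.

```python
import pandas as pd

s_default = pd.Series([0, 1, 2])

# Without explicit labels, the index is a RangeIndex
print(s_default.index)
# RangeIndex(start=0, stop=3, step=1)

# tolist() works the same way and returns the sequence numbers
print(s_default.index.tolist())
# [0, 1, 2]
```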

Similarly for pandas.DataFrame, use the index attribute for row labels and the columns attribute for column labels. Both are of Index type.

print(df_index)
#       col1  col2  col3
# row1     0     1     2
# row2     3     4     5

print(df_index.index)
# Index(['row1', 'row2'], dtype='object')

print(df_index.index.tolist())
# ['row1', 'row2']

print(df_index.columns)
# Index(['col1', 'col2', 'col3'], dtype='object')

print(df_index.columns.tolist())
# ['col1', 'col2', 'col3']

An Index can be iterated directly in a for loop, and its elements can be accessed by position with []. In many cases, there is no need to convert it to a list.

Slices also work, but elements cannot be reassigned because an Index is immutable.

for i in s_index.index:
    print(i, type(i))
# row1 <class 'str'>
# row2 <class 'str'>
# row3 <class 'str'>

print(s_index.index[0])
# row1

print(s_index.index[:2])
# Index(['row1', 'row2'], dtype='object')

# s_index.index[0] = 'ROW1'
# TypeError: Index does not support mutable operations

Use rename() if you want to change individual index or columns labels.
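A minimal sketch of rename() follows, with a dict that maps old labels to new ones. The new label names are only illustrative. rename() returns a new object by default and leaves the original unchanged.

```python
import pandas as pd

s_index = pd.Series([0, 1, 2], index=['row1', 'row2', 'row3'])
df_index = pd.DataFrame([[0, 1, 2], [3, 4, 5]],
                        index=['row1', 'row2'],
                        columns=['col1', 'col2', 'col3'])

# For Series, a dict passed to rename() is applied to the index labels
s_renamed = s_index.rename({'row1': 'ROW1'})
print(s_renamed.index.tolist())
# ['ROW1', 'row2', 'row3']

# For DataFrame, specify index and columns mappings separately
df_renamed = df_index.rename(index={'row1': 'ROW1'}, columns={'col1': 'COL1'})
print(df_renamed.index.tolist())
# ['ROW1', 'row2']

print(df_renamed.columns.tolist())
# ['COL1', 'col2', 'col3']
```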
