Convert between pandas.DataFrame and Series

Modified: 2023-12-17 | Tags: Python, pandas

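This article explains how to convert between pandas DataFrame and Series.

The word "convert" is used for convenience; in practice, the operations are creating a DataFrame from a Series and retrieving a column or row of a DataFrame as a Series.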

Contents

Convert Series to DataFrame
- to_frame()
- pd.DataFrame()
Create a DataFrame from multiple Series
Convert DataFrame to Series
- Get a DataFrame column as a Series
- Get a DataFrame row as a Series
Views and copies (memory sharing)
- When converting Series to DataFrame
- When converting DataFrame to Series

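As explained in the last section, the original object and the created or retrieved object may share memory, so changing an element of one can also change the other.

For converting DataFrame and Series to and from NumPy arrays (ndarray) or Python built-in lists (list), see the following articles.

Related article: Convert between pandas.DataFrame, Series and NumPy array ndarray
Related article: Convert between pandas.DataFrame, Series and Python list

The sample code in this article uses the following version of pandas. Note that behavior may differ between versions.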

import pandas as pd

print(pd.__version__)
# 2.1.4

source: pandas_series_to_dataframe.py

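Convert Series to DataFrame

To convert a Series to a DataFrame, use the Series method to_frame() or the constructor pd.DataFrame().

to_frame()

The to_frame() method returns a DataFrame with the calling Series as its single column. The column name can be specified as the first argument.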

pandas.Series.to_frame — pandas 2.1.4 documentation

s = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
print(s)
# A    0
# B    1
# C    2
# dtype: int64

print(s.to_frame())
#    0
# A  0
# B  1
# C  2

print(s.to_frame('X'))
#    X
# A  0
# B  1
# C  2

source: pandas_series_to_dataframe.py

If the name attribute of the Series is set, it is used as the column name. If the first argument of to_frame() is specified, that argument takes precedence.

s_name = pd.Series([0, 1, 2], index=['A', 'B', 'C'], name='X')
print(s_name)
# A    0
# B    1
# C    2
# Name: X, dtype: int64

print(s_name.to_frame())
#    X
# A  0
# B  1
# C  2

print(s_name.to_frame('Y'))
#    Y
# A  0
# B  1
# C  2

source: pandas_series_to_dataframe.py

pd.DataFrame()

Passing a Series to the constructor pd.DataFrame() creates a DataFrame with the Series as a column; passing a list containing the Series creates a DataFrame with the Series as a row.

pandas.DataFrame — pandas 2.1.4 documentation

s = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
print(s)
# A    0
# B    1
# C    2
# dtype: int64

print(pd.DataFrame(s))
#    0
# A  0
# B  1
# C  2

print(pd.DataFrame([s]))
#    A  B  C
# 0  0  1  2

source: pandas_series_to_dataframe.py

If the name attribute of the Series is set, it is used as the column name or the row name.

s_name = pd.Series([0, 1, 2], index=['A', 'B', 'C'], name='X')
print(s_name)
# A    0
# B    1
# C    2
# Name: X, dtype: int64

print(pd.DataFrame(s_name))
#    X
# A  0
# B  1
# C  2

print(pd.DataFrame([s_name]))
#    A  B  C
# X  0  1  2

source: pandas_series_to_dataframe.py

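Create a DataFrame from multiple Series

You can also create a DataFrame from multiple Series. The examples below use two Series, but three or more can be handled in the same way.

Use the constructor pd.DataFrame() or the pd.concat() function.

When the indexes are the same

Examples with the constructor pd.DataFrame() are shown below. Note that when Series with different data types (dtype) are used as rows, implicit type conversion takes place: each column of the result holds values from every Series, so the integers are converted to float.

Related article: Structure of pandas.DataFrame and how to create it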

s1 = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
s2 = pd.Series([0.0, 0.1, 0.2], index=['A', 'B', 'C'])

print(pd.DataFrame({'col1': s1, 'col2': s2}))
#    col1  col2
# A     0   0.0
# B     1   0.1
# C     2   0.2

print(pd.DataFrame([s1, s2]))
#      A    B    C
# 0  0.0  1.0  2.0
# 1  0.0  0.1  0.2

source: pandas_series_to_dataframe.py
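
The implicit conversion can be confirmed by checking the dtypes attribute. The following is a minimal sketch that reuses the same values as above; the variable names df_cols and df_rows are introduced here only for illustration.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
s2 = pd.Series([0.0, 0.1, 0.2], index=['A', 'B', 'C'])

# Series as columns: each column keeps its own dtype.
df_cols = pd.DataFrame({'col1': s1, 'col2': s2})
print(df_cols.dtypes)

# Series as rows: every column mixes int and float values,
# so all columns end up as float.
df_rows = pd.DataFrame([s1, s2])
print(df_rows.dtypes)
```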

You can also use the pd.concat() function.

Related article: Concatenate pandas.DataFrame and Series with concat

print(pd.concat([s1, s2], axis=1))
#    0    1
# A  0  0.0
# B  1  0.1
# C  2  0.2

source: pandas_series_to_dataframe.py
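
When the Series have no name, the columns are labeled 0, 1, and so on. As a possible alternative to renaming afterwards, the sketch below passes the keys argument of pd.concat() to label each Series at concatenation time. This is an addition to the original examples, and the labels 'col1' and 'col2' are arbitrary.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
s2 = pd.Series([0.0, 0.1, 0.2], index=['A', 'B', 'C'])

# keys supplies one label per concatenated Series; with axis=1 they become column names.
df = pd.concat([s1, s2], axis=1, keys=['col1', 'col2'])
print(df)
#    col1  col2
# A     0   0.0
# B     1   0.1
# C     2   0.2
```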

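When the name attribute is set on the original Series, the results are as follows. Passing a list to the constructor or using pd.concat() picks up the names automatically, but when passing a dictionary to the constructor, the column names must be specified explicitly as keys.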

s1_name = pd.Series([0, 1, 2], index=['A', 'B', 'C'], name='X')
s2_name = pd.Series([0.0, 0.1, 0.2], index=['A', 'B', 'C'], name='Y')

print(pd.DataFrame({s1_name.name: s1_name, s2_name.name: s2_name}))
#    X    Y
# A  0  0.0
# B  1  0.1
# C  2  0.2

print(pd.DataFrame([s1_name, s2_name]))
#      A    B    C
# X  0.0  1.0  2.0
# Y  0.0  0.1  0.2

print(pd.concat([s1_name, s2_name], axis=1))
#    X    Y
# A  0  0.0
# B  1  0.1
# C  2  0.2

source: pandas_series_to_dataframe.py

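When the indexes are different

The DataFrame is built by aligning on the index labels of the Series. If the Series have different indexes, missing values (NaN) appear, and an integer column that receives NaN is converted to float.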

s1 = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
s3 = pd.Series([0.1, 0.2, 0.3], index=['B', 'C', 'D'])

print(pd.DataFrame({'col1': s1, 'col3': s3}))
#    col1  col3
# A   0.0   NaN
# B   1.0   0.1
# C   2.0   0.2
# D   NaN   0.3

print(pd.DataFrame([s1, s3]))
#      A    B    C    D
# 0  0.0  1.0  2.0  NaN
# 1  NaN  0.1  0.2  0.3

print(pd.concat([s1, s3], axis=1))
#      0    1
# A  0.0  NaN
# B  1.0  0.1
# C  2.0  0.2
# D  NaN  0.3

source: pandas_series_to_dataframe.py

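For handling missing values in pandas, see the following article.

Related article: Remove (drop), replace (fill), and extract missing values NaN in pandas

With join='inner' in pd.concat(), only the labels common to all indexes are kept.

Related article: Concatenate pandas.DataFrame and Series with concat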

print(pd.concat([s1, s3], axis=1, join='inner'))
#    0    1
# B  1  0.1
# C  2  0.2

source: pandas_series_to_dataframe.py

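To change the index, use set_axis() or a similar method. This makes it possible to align the indexes. Note that set_axis() relabels by position, so the new index must have the same length as the Series.

Related article: Rename rows and columns of pandas.DataFrame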

print(s3.set_axis(s1.index))
# A    0.1
# B    0.2
# C    0.3
# dtype: float64

print(pd.DataFrame({'col1': s1, 'col3': s3.set_axis(s1.index)}))
#    col1  col3
# A     0   0.1
# B     1   0.2
# C     2   0.3

source: pandas_series_to_dataframe.py

To ignore the index, one approach is to pass the values attribute of each Series, which is a NumPy array (ndarray). Note that this raises an error with pd.concat().

print(s1.values)
# [0 1 2]

print(type(s1.values))
# <class 'numpy.ndarray'>

print(pd.DataFrame({'col1': s1.values, 'col3': s3.values}))
#    col1  col3
# 0     0   0.1
# 1     1   0.2
# 2     2   0.3

print(pd.DataFrame([s1.values, s3.values]))
#      0    1    2
# 0  0.0  1.0  2.0
# 1  0.1  0.2  0.3

# print(pd.concat([s1.values, s3.values], axis=1))
# TypeError: cannot concatenate object of type '<class 'numpy.ndarray'>'; only Series and DataFrame objs are valid

source: pandas_series_to_dataframe.py
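
If pd.concat() is preferred, one possible workaround, not covered in the original examples, is to discard the labels with reset_index(drop=True) so that both Series share the default integer index before concatenation. A minimal sketch:

```python
import pandas as pd

s1 = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
s3 = pd.Series([0.1, 0.2, 0.3], index=['B', 'C', 'D'])

# Replace the labels with 0, 1, 2 so that the Series align by position.
df = pd.concat([s1.reset_index(drop=True), s3.reset_index(drop=True)], axis=1)
print(df)
#    0    1
# 0  0  0.1
# 1  1  0.2
# 2  2  0.3
```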

Arbitrary row and column names can be specified with the index and columns arguments of pd.DataFrame().

print(pd.DataFrame([s1.values, s3.values], index=['X', 'Y'], columns=['A', 'B', 'C']))
#      A    B    C
# X  0.0  1.0  2.0
# Y  0.1  0.2  0.3

source: pandas_series_to_dataframe.py

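When the numbers of elements are different

Even when the Series have different numbers of elements, the DataFrame is built by aligning on the index. Missing entries become NaN.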

s1 = pd.Series([0, 1, 2], index=['A', 'B', 'C'])
s4 = pd.Series([0.1, 0.3], index=['B', 'D'])

print(pd.DataFrame({'col1': s1, 'col4': s4}))
#    col1  col4
# A   0.0   NaN
# B   1.0   0.1
# C   2.0   NaN
# D   NaN   0.3

print(pd.DataFrame([s1, s4]))
#      A    B    C    D
# 0  0.0  1.0  2.0  NaN
# 1  NaN  0.1  NaN  0.3

print(pd.concat([s1, s4], axis=1))
#      0    1
# A  0.0  NaN
# B  1.0  0.1
# C  2.0  NaN
# D  NaN  0.3

print(pd.concat([s1, s4], axis=1, join='inner'))
#    0    1
# B  1  0.1

source: pandas_series_to_dataframe.py

As mentioned above, use set_axis() or a similar method to change the index. Any index can be set.

print(pd.DataFrame({'col1': s1, 'col4': s4.set_axis(['A', 'B'])}))
#    col1  col4
# A     0   0.1
# B     1   0.3
# C     2   NaN

source: pandas_series_to_dataframe.py

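The behavior for the values attribute (ndarray) depends on how it is passed to the constructor: a dictionary of arrays with different lengths raises an error, while a list of arrays fills the missing part of the shorter row with NaN.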

# print(pd.DataFrame({'col1': s1.values, 'col4': s4.values}))
# ValueError: All arrays must be of the same length

print(pd.DataFrame([s1.values, s4.values]))
#      0    1    2
# 0  0.0  1.0  2.0
# 1  0.1  0.3  NaN

source: pandas_series_to_dataframe.py

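Convert DataFrame to Series

A row or column of a DataFrame can be retrieved as a Series with [] indexing or with loc[] and iloc[]. For details on [], loc[], and iloc[], see the following articles.

Related article: Select rows and columns in pandas by indexing
Related article: Get and set values at any position in pandas with at, iat, loc, iloc

Get a DataFrame column as a Series

Specifying a column name with [] or loc[], or a column number with iloc[], as a scalar value returns that column as a Series.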

df = pd.DataFrame({'col0': [0, 1, 2], 'col1': [3, 4, 5], 'col2': [6, 7, 8]},
                  index=['row0', 'row1', 'row2'])
print(df)
#       col0  col1  col2
# row0     0     3     6
# row1     1     4     7
# row2     2     5     8

print(df['col0'])
# row0    0
# row1    1
# row2    2
# Name: col0, dtype: int64

print(df.loc[:, 'col0'])
# row0    0
# row1    1
# row2    2
# Name: col0, dtype: int64

print(df.iloc[:, 0])
# row0    0
# row1    1
# row2    2
# Name: col0, dtype: int64

source: pandas_dataframe_to_series.py

With loc[] and iloc[], you can also select only the elements of specific rows with a list or slice.

print(df.iloc[[0, 2], 0])
# row0    0
# row2    2
# Name: col0, dtype: int64

print(df.iloc[:2, 0])
# row0    0
# row1    1
# Name: col0, dtype: int64

source: pandas_dataframe_to_series.py

Note that selecting a single column with a list or slice returns a one-column DataFrame, not a Series.

print(df.loc[:, ['col0']])
#       col0
# row0     0
# row1     1
# row2     2

print(df.iloc[:, :1])
#       col0
# row0     0
# row1     1
# row2     2

source: pandas_dataframe_to_series.py

Get a DataFrame row as a Series

Specifying a row name with loc[], or a row number with iloc[], as a scalar value returns that row as a Series.

df = pd.DataFrame({'col0': [0, 1, 2], 'col1': [3, 4, 5], 'col2': [6, 7, 8]},
                  index=['row0', 'row1', 'row2'])
print(df)
#       col0  col1  col2
# row0     0     3     6
# row1     1     4     7
# row2     2     5     8

print(df.loc['row0', :])
# col0    0
# col1    3
# col2    6
# Name: row0, dtype: int64

print(df.iloc[0, :])
# col0    0
# col1    3
# col2    6
# Name: row0, dtype: int64

source: pandas_dataframe_to_series.py

When selecting the entire row, the : for the columns can be omitted.

print(df.loc['row0'])
# col0    0
# col1    3
# col2    6
# Name: row0, dtype: int64

print(df.iloc[0])
# col0    0
# col1    3
# col2    6
# Name: row0, dtype: int64

source: pandas_dataframe_to_series.py

You can also select only the elements of specific columns with a list or slice.

print(df.iloc[0, [0, 2]])
# col0    0
# col2    6
# Name: row0, dtype: int64

print(df.iloc[0, :2])
# col0    0
# col1    3
# Name: row0, dtype: int64

source: pandas_dataframe_to_series.py

Note that selecting a single row with a list or slice returns a one-row DataFrame, not a Series.

print(df.loc[['row0']])
#       col0  col1  col2
# row0     0     3     6

print(df.iloc[:1])
#       col0  col1  col2
# row0     0     3     6

source: pandas_dataframe_to_series.py
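
If a one-column or one-row DataFrame has already been obtained this way, it can still be turned into a Series afterwards. The sketch below uses the squeeze() method for this; it is an addition to the original examples, and the variable names s_col and s_row are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col0': [0, 1, 2], 'col1': [3, 4, 5], 'col2': [6, 7, 8]},
                  index=['row0', 'row1', 'row2'])

# One-column DataFrame -> Series (squeeze along the column axis).
s_col = df.loc[:, ['col0']].squeeze('columns')
print(s_col.name, s_col.tolist())
# col0 [0, 1, 2]

# One-row DataFrame -> Series (squeeze along the row axis).
s_row = df.iloc[:1].squeeze('index')
print(s_row.name, s_row.tolist())
# row0 [0, 3, 6]
```

Passing the axis explicitly avoids squeezing further than intended, for example down to a scalar when the DataFrame happens to have a single element.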

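Be careful with data types

A DataFrame holds a data type (dtype) for each column, whereas a Series has a single data type for the whole.

Related article: List of pandas data types (dtype) and conversion (casting) with astype

Take particular care when retrieving a DataFrame row as a Series.

If a row of a DataFrame consisting of an integer (int) column and a floating-point (float) column is retrieved as a Series, the data type becomes float. The elements of the int column are converted to float.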

df_multi = pd.DataFrame({'col0': [0, 1, 2], 'col1': [0.0, 0.1, 0.2]},
                        index=['row0', 'row1', 'row2'])
print(df_multi)
#       col0  col1
# row0     0   0.0
# row1     1   0.1
# row2     2   0.2

s_row = df_multi.loc['row2']
print(s_row)
# col0    2.0
# col1    0.2
# Name: row2, dtype: float64

source: pandas_dataframe_to_series.py
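
If the integer values should not be turned into float, one possible workaround, which is not part of the original examples, is to cast the DataFrame to object before taking the row. The resulting Series has dtype object, so this gives up a numeric dtype in exchange for keeping each value as it was. The name s_row_obj is illustrative.

```python
import pandas as pd

df_multi = pd.DataFrame({'col0': [0, 1, 2], 'col1': [0.0, 0.1, 0.2]},
                        index=['row0', 'row1', 'row2'])

# Casting to object first means the row is not upcast to a common float dtype.
s_row_obj = df_multi.astype(object).loc['row2']
print(s_row_obj.dtype)
# object

# The integer element is still an integer, not 2.0.
print(type(s_row_obj['col0']))
```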

If the DataFrame contains an object column, such as strings, retrieving a row as a Series gives the object data type.

df_multi['col2'] = ['a', 'b', 'c']
print(df_multi)
#       col0  col1 col2
# row0     0   0.0    a
# row1     1   0.1    b
# row2     2   0.2    c

print(df_multi.dtypes)
# col0      int64
# col1    float64
# col2     object
# dtype: object

s_row = df_multi.loc['row2']
print(s_row)
# col0      2
# col1    0.2
# col2      c
# Name: row2, dtype: object

source: pandas_dataframe_to_series.py

Since an object Series can hold elements of various types, each element keeps its original type.

print(type(s_row['col0']))
# <class 'numpy.int64'>

print(type(s_row['col1']))
# <class 'numpy.float64'>

print(type(s_row['col2']))
# <class 'str'>

source: pandas_dataframe_to_series.py

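Views and copies (memory sharing)

When converting between DataFrame and Series, the resulting object is either a view or a copy of the original object. A view shares memory with the original object, so changing one also changes the other.

Related article: Views and copies in pandas.DataFrame

When converting Series to DataFrame

to_frame()

The to_frame() method returns a view whenever possible. To get a copy, call copy().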

s = pd.Series([0, 1], index=['A', 'B'])
df = s.to_frame()

s['A'] = 100
print(df)
#      0
# A  100
# B    1

s = pd.Series([0, 1], index=['A', 'B'])
df_copy = s.copy().to_frame()

s['A'] = 100
print(df_copy)
#    0
# A  0
# B  1

source: pandas_series_to_dataframe.py

pd.DataFrame()

By default, the constructor pd.DataFrame() returns a view whenever possible. Setting the copy argument to True returns a copy.

s = pd.Series([0, 1], index=['A', 'B'])
df = pd.DataFrame(s)

s['A'] = 100
print(df)
#      0
# A  100
# B    1

s = pd.Series([0, 1], index=['A', 'B'])
df_copy = pd.DataFrame(s, copy=True)

s['A'] = 100
print(df_copy)
#    0
# A  0
# B  1

source: pandas_series_to_dataframe.py

pd.concat()

By default, pd.concat() returns a copy. Setting the copy argument to False returns a view whenever possible.

s1 = pd.Series([0, 1], index=['A', 'B'])
s2 = pd.Series([0.0, 0.1], index=['A', 'B'])
df = pd.concat([s1, s2], axis=1)

s1['A'] = 100
print(df)
#    0    1
# A  0  0.0
# B  1  0.1

s1 = pd.Series([0, 1], index=['A', 'B'])
s2 = pd.Series([0.0, 0.1], index=['A', 'B'])
df_copy_false = pd.concat([s1, s2], axis=1, copy=False)

s1['A'] = 100
print(df_copy_false)
#      0    1
# A  100  0.0
# B    1  0.1

source: pandas_series_to_dataframe.py

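Note that for functions and methods that provide a copy argument, such as pd.DataFrame() and pd.concat(), copy=True always creates a copy, whereas copy=False creates a view only when possible.

Even with copy=False, a copy is created when a view cannot be created because of the memory layout. A view is not guaranteed.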
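
Because a view is not guaranteed, it can be useful to check directly whether two objects share memory. The sketch below uses np.shares_memory() on the underlying arrays; this check is an addition to the original examples. The result for the default case may depend on the pandas version and settings, so no expected output is shown for it.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 1], index=['A', 'B'])

df = pd.DataFrame(s)                  # default: a view where possible
df_copy = pd.DataFrame(s, copy=True)  # explicit copy

# True if the DataFrame column refers to the same buffer as the Series.
print(np.shares_memory(s.to_numpy(), df.iloc[:, 0].to_numpy()))

# An explicit copy does not share memory with the original.
print(np.shares_memory(s.to_numpy(), df_copy.iloc[:, 0].to_numpy()))
# False
```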

When converting DataFrame to Series

When a row or column of a DataFrame is retrieved as a Series, the Series is basically a view of the original DataFrame.

df = pd.DataFrame({'col0': [0, 1, 2], 'col1': [3, 4, 5], 'col2': [6, 7, 8]},
                  index=['row0', 'row1', 'row2'])
print(df)
#       col0  col1  col2
# row0     0     3     6
# row1     1     4     7
# row2     2     5     8

s = df['col0']
s['row0'] = 10
print(s)
# row0    10
# row1     1
# row2     2
# Name: col0, dtype: int64

print(df)
#       col0  col1  col2
# row0    10     3     6
# row1     1     4     7
# row2     2     5     8

source: pandas_dataframe_to_series.py

To handle them separately, create a copy with copy().

s_copy = df['col1'].copy()
s_copy['row0'] = 100
print(s_copy)
# row0    100
# row1      4
# row2      5
# Name: col1, dtype: int64

print(df)
#       col0  col1  col2
# row0    10     3     6
# row1     1     4     7
# row2     2     5     8

source: pandas_dataframe_to_series.py

When specific elements are extracted using a list, a copy is created instead of a view.

s_list = df.loc[['row0', 'row2'], 'col2']
s_list['row0'] = 1000
print(s_list)
# row0    1000
# row2       8
# Name: col2, dtype: int64

print(df)
#       col0  col1  col2
# row0    10     3     6
# row1     1     4     7
# row2     2     5     8

source: pandas_dataframe_to_series.py

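When a new DataFrame is created by selecting part of a DataFrame with loc[], iloc[], and so on, whether a view or a copy is created depends on how the selection is specified.

Related article: Views and copies in pandas.DataFrame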
