pandas: Get the first and last rows of a DataFrame or Series with head() and tail()

Modified: 2023-12-10 | Tags: Python, pandas

In pandas, use the head() and tail() methods to get the first and last rows of a DataFrame, or the first and last elements of a Series.

Get the first rows: head()
Get the last rows: tail()
Get rows by position with slices
Get the values of elements in the first or last row

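When checking the contents of a large DataFrame or Series, sample(), which randomly samples rows or columns, is also useful.

Related article: pandas: Random sampling of rows and columns with sample()

The pandas version used for the sample code in this article is shown below. Note that behavior may differ between versions. A DataFrame with 10 rows is used as an example.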

import pandas as pd

print(pd.__version__)
# 2.1.4

df = pd.DataFrame({'col_0': list('ABCDEFGHIJ'), 'col_1': range(9, -1, -1)},
                  index=[f'row_{i}' for i in range(10)])
print(df)
#       col_0  col_1
# row_0     A      9
# row_1     B      8
# row_2     C      7
# row_3     D      6
# row_4     E      5
# row_5     F      4
# row_6     G      3
# row_7     H      2
# row_8     I      1
# row_9     J      0

source: pandas_head_tail.py

The examples use a DataFrame, but Series also provides head() and tail(). The arguments and usage are the same.
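As a minimal sketch of the Series case, the following takes the col_1 column of the example data as a Series and calls the same methods. The variable name s is used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col_0': list('ABCDEFGHIJ'), 'col_1': range(9, -1, -1)},
                  index=[f'row_{i}' for i in range(10)])

# Selecting a single column gives a Series
s = df['col_1']

print(s.head(3))
# row_0    9
# row_1    8
# row_2    7
# Name: col_1, dtype: int64

print(s.tail(3))
# row_7    2
# row_8    1
# row_9    0
# Name: col_1, dtype: int64
```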

Get the first rows: head()

The head() method returns the first rows.

By default, it returns the first 5 rows.

print(df.head())
#       col_0  col_1
# row_0     A      9
# row_1     B      8
# row_2     C      7
# row_3     D      6
# row_4     E      5

source: pandas_head_tail.py

Specify the first argument n to return the first n rows.

print(df.head(3))
#       col_0  col_1
# row_0     A      9
# row_1     B      8
# row_2     C      7

source: pandas_head_tail.py
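n does not have to be a small positive integer. As a sketch based on the documented behavior of head(): a negative n returns all rows except the last |n| rows, and an n larger than the number of rows returns every row without raising an error.

```python
import pandas as pd

df = pd.DataFrame({'col_0': list('ABCDEFGHIJ'), 'col_1': range(9, -1, -1)},
                  index=[f'row_{i}' for i in range(10)])

# Negative n: every row except the last 8
print(df.head(-8))
#       col_0  col_1
# row_0     A      9
# row_1     B      8

# n larger than the number of rows: all 10 rows are returned
print(len(df.head(100)))
# 10
```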

Get the last rows: tail()

The tail() method returns the last rows.

By default, it returns the last 5 rows.

print(df.tail())
#       col_0  col_1
# row_5     F      4
# row_6     G      3
# row_7     H      2
# row_8     I      1
# row_9     J      0

source: pandas_head_tail.py

Specify the first argument n to return the last n rows.

print(df.tail(3))
#       col_0  col_1
# row_7     H      2
# row_8     I      1
# row_9     J      0

source: pandas_head_tail.py
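tail() also accepts values of n other than positive integers. As a sketch based on its documented behavior: a negative n returns all rows except the first |n| rows, and n=0 returns an empty DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'col_0': list('ABCDEFGHIJ'), 'col_1': range(9, -1, -1)},
                  index=[f'row_{i}' for i in range(10)])

# Negative n: every row except the first 8
print(df.tail(-8))
#       col_0  col_1
# row_8     I      1
# row_9     J      0

# n=0: no rows, but the columns are kept
print(len(df.tail(0)))
# 0
```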

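Get rows by position with slices

You can also get rows at arbitrary positions by specifying row numbers (0-based positions) with a slice.

Related article: pandas: Select rows and columns by index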

print(df[3:6])
#       col_0  col_1
# row_3     D      6
# row_4     E      5
# row_5     F      4

source: pandas_head_tail.py

Slices can also do the same thing as head() and tail().

print(df[:5])
#       col_0  col_1
# row_0     A      9
# row_1     B      8
# row_2     C      7
# row_3     D      6
# row_4     E      5

print(df[-5:])
#       col_0  col_1
# row_5     F      4
# row_6     G      3
# row_7     H      2
# row_8     I      1
# row_9     J      0

source: pandas_head_tail.py
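The same selections can also be written with iloc, which makes it explicit that the slice refers to positions rather than labels. The following sketch uses equals() to check that the results match for the example data.

```python
import pandas as pd

df = pd.DataFrame({'col_0': list('ABCDEFGHIJ'), 'col_1': range(9, -1, -1)},
                  index=[f'row_{i}' for i in range(10)])

# iloc slices by position, like the integer slice df[3:6] on this data
print(df.iloc[3:6].equals(df[3:6]))
# True

# head(n) and tail(n) correspond to iloc[:n] and iloc[-n:] for positive n
print(df.head(3).equals(df.iloc[:3]))
# True

print(df.tail(3).equals(df.iloc[-3:]))
# True
```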

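Get the values of elements in the first or last row

Passing 1 to head() or tail() returns the first or last row, but the result is still a DataFrame even though it has only one row.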

print(df.head(1))
#       col_0  col_1
# row_0     A      9

print(type(df.head(1)))
# <class 'pandas.core.frame.DataFrame'>

source: pandas_head_tail.py

Specifying a single row with iloc returns that row as a Series. iloc[0] gets the first row and iloc[-1] gets the last row.

To get the value of an element, use iloc[0]['column_name'] or iloc[-1]['column_name'].

print(df.iloc[0])
# col_0    A
# col_1    9
# Name: row_0, dtype: object

print(type(df.iloc[0]))
# <class 'pandas.core.series.Series'>

print(df.iloc[0]['col_0'])
# A

source: pandas_head_tail.py

print(df.iloc[-1])
# col_0    J
# col_1    0
# Name: row_9, dtype: object

print(type(df.iloc[-1]))
# <class 'pandas.core.series.Series'>

print(df.iloc[-1]['col_0'])
# J

source: pandas_head_tail.py
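When only a single value is needed, the row and column positions can be passed to iloc or iat in one step, which avoids building the intermediate Series. Selecting the column by name first and then applying iloc to that Series is another option. A short sketch:

```python
import pandas as pd

df = pd.DataFrame({'col_0': list('ABCDEFGHIJ'), 'col_1': range(9, -1, -1)},
                  index=[f'row_{i}' for i in range(10)])

# Row and column by position: first row, first column
print(df.iloc[0, 0])
# A

# iat is the scalar-only counterpart of iloc: last row, second column
print(df.iat[-1, 1])
# 0

# Select the column by name, then take the last element by position
print(df['col_0'].iloc[-1])
# J
```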

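Note that assigning a value with the notation above emits a SettingWithCopyWarning, because the value is set on the intermediate object returned by iloc[0], which may be a copy of the original data.

Related article: pandas: How to deal with SettingWithCopyWarning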

df.iloc[0]['col_0'] = 'AAA'
# /var/folders/rf/b7l8_vgj5mdgvghn_326rn_c0000gn/T/ipykernel_48384/183824280.py:1: SettingWithCopyWarning: 
# A value is trying to be set on a copy of a slice from a DataFrame

source: pandas_head_tail.py
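The warning matters because the assignment may not reach the original object. In this example the columns have different dtypes, so df.iloc[0] is built as a new Series and the value is written to that temporary object. The following sketch suppresses the warning only to keep the output short and checks that df is unchanged. Behavior can differ for other data and pandas versions, so treat this as an illustration rather than a general rule.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'col_0': list('ABCDEFGHIJ'), 'col_1': range(9, -1, -1)},
                  index=[f'row_{i}' for i in range(10)])

with warnings.catch_warnings():
    # Silence the chained-assignment warning for this demonstration only
    warnings.simplefilter('ignore')
    df.iloc[0]['col_0'] = 'AAA'

# The original DataFrame still holds the old value
print(df.iloc[0, 0])
# A
```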

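To avoid SettingWithCopyWarning, one option is to get the row label from the index attribute and specify it with at. loc also works, but at is faster for getting and setting a single element.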

df.at[df.index[0], 'col_0'] = 'AAA'
df.at[df.index[-1], 'col_0'] = 'JJJ'

print(df)
#       col_0  col_1
# row_0   AAA      9
# row_1     B      8
# row_2     C      7
# row_3     D      6
# row_4     E      5
# row_5     F      4
# row_6     G      3
# row_7     H      2
# row_8     I      1
# row_9   JJJ      0

source: pandas_head_tail.py
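The same assignment can also be written purely by position. As a sketch, columns.get_loc() converts the column name to its integer position so that it can be combined with the row positions 0 and -1 in iloc. The variable name col_pos is used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col_0': list('ABCDEFGHIJ'), 'col_1': range(9, -1, -1)},
                  index=[f'row_{i}' for i in range(10)])

# Integer position of the column named 'col_0'
col_pos = df.columns.get_loc('col_0')

# A single iloc call with (row, column) assigns to df itself
df.iloc[0, col_pos] = 'AAA'
df.iloc[-1, col_pos] = 'JJJ'

print(df['col_0'].tolist())
# ['AAA', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'JJJ']
```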

See the following article for details on at, iat, loc, and iloc.

Related article: pandas: Get and set values at any position with at, iat, loc, and iloc
