Get the row/column labels of the maximum and minimum values in pandas: idxmax, idxmin

Posted: 2018-07-09 | Tags: Python, pandas

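To get the row or column label of the maximum or minimum element in each column or row of a pandas.DataFrame or pandas.Series, use the idxmax() and idxmin() methods.

Both pandas.DataFrame and pandas.Series provide idxmax() and idxmin().

pandas.Series also has argmax() and argmin(), but they were deprecated in version 0.21.0. In pandas 1.0 and later these two methods exist with a different meaning: they return the integer position of the element rather than its label.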

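This article covers the following:

Getting the maximum and minimum values: max(), min()
Getting the row/column labels of the maximum and minimum values: idxmax(), idxmin()
Handling of missing values (NaN)

Row and column numbers can be obtained from the row and column labels. See the following article.

Related: Get the row and column numbers of a pandas.DataFrame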
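As a minimal sketch of converting a label to a number, the get_loc() method of the index can be applied to the label returned by idxmax(). The variable names here are illustrative, and the sketch assumes the labels are unique (with duplicate labels, get_loc() may return a slice or a boolean mask instead of an integer).

```python
import pandas as pd

df = pd.DataFrame({'col1': [0, 3, 2, 3], 'col2': [4, 0, 2, 1]},
                  index=['a', 'b', 'c', 'd'])

# Label of the first maximum in col1
label = df['col1'].idxmax()

# Convert the row label to a zero-based row number
row_number = df.index.get_loc(label)

# Convert a column label to a zero-based column number
col_number = df.columns.get_loc('col2')

print(label, row_number, col_number)
# b 1 1
```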

The following pandas.DataFrame is used as an example. A single column selected from it serves as the pandas.Series example.

import pandas as pd

df = pd.DataFrame({'col1': [0, 3, 2, 3], 'col2': [4, 0, 2, 1]},
                   index=['a', 'b', 'c', 'd'])

print(df)
#    col1  col2
# a     0     4
# b     3     0
# c     2     2
# d     3     1

print(df['col1'])
# a    0
# b    3
# c    2
# d    3
# Name: col1, dtype: int64

print(type(df['col1']))
# <class 'pandas.core.series.Series'>

source: pandas_idxmax_idxmin.py

Getting the maximum and minimum values: max(), min()

To get the maximum and minimum values themselves, use the max() and min() methods.

pandas.Series

An example with a pandas.Series.

print(df['col1'])
# a    0
# b    3
# c    2
# d    3
# Name: col1, dtype: int64

source: pandas_idxmax_idxmin.py

max() and min() return the maximum and minimum values.

print(df['col1'].max())
# 3

print(df['col1'].min())
# 0

source: pandas_idxmax_idxmin.py

pandas.DataFrame

An example with a pandas.DataFrame.

print(df)
#    col1  col2
# a     0     4
# b     3     0
# c     2     2
# d     3     1

source: pandas_idxmax_idxmin.py

By default, the maximum and minimum of each column are returned.

print(df.max())
# col1    3
# col2    4
# dtype: int64

print(df.min())
# col1    0
# col2    0
# dtype: int64

source: pandas_idxmax_idxmin.py

With the argument axis=1, the maximum and minimum of each row are returned.

print(df.max(axis=1))
# a    4
# b    3
# c    2
# d    3
# dtype: int64

print(df.min(axis=1))
# a    0
# b    0
# c    2
# d    1
# dtype: int64

source: pandas_idxmax_idxmin.py

In both cases, the return value is a pandas.Series.

print(type(df.max()))
# <class 'pandas.core.series.Series'>

source: pandas_idxmax_idxmin.py
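Because the result is a pandas.Series, max() or min() can be called on it once more to reduce it to a single value for the whole DataFrame. The following is a small sketch of that idea; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [0, 3, 2, 3], 'col2': [4, 0, 2, 1]},
                  index=['a', 'b', 'c', 'd'])

# Maximum of each column, returned as a Series
col_max = df.max()

# Reduce the per-column results to one scalar for the whole DataFrame
overall_max = col_max.max()
overall_min = df.min().min()

print(overall_max, overall_min)
# 4 0
```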

Getting the row/column labels of the maximum and minimum values: idxmax(), idxmin()

pandas.Series

An example with a pandas.Series.

print(df['col1'])
# a    0
# b    3
# c    2
# d    3
# Name: col1, dtype: int64

source: pandas_idxmax_idxmin.py

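For a pandas.Series, idxmax() and idxmin() return the index label of the maximum or minimum element. If the maximum or minimum occurs more than once, only the label of the first occurrence is returned.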

print(df['col1'].idxmax())
# b

print(df['col1'].idxmin())
# a

source: pandas_idxmax_idxmin.py
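A common use of the returned label is to look up the corresponding row of the original DataFrame with loc. The sketch below illustrates this with the same example data; it is one possible usage, not the only one.

```python
import pandas as pd

df = pd.DataFrame({'col1': [0, 3, 2, 3], 'col2': [4, 0, 2, 1]},
                  index=['a', 'b', 'c', 'd'])

# Label of the first row where col1 is at its maximum
label = df['col1'].idxmax()

# Select the whole row with that label
row = df.loc[label]
print(row)
# col1    3
# col2    0
# Name: b, dtype: int64

# Or pick a single value from another column of that row
value = df.loc[label, 'col2']
print(value)
# 0
```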

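To get all the labels when the maximum or minimum occurs more than once, extract the elements equal to the maximum or minimum with boolean indexing and take the index attribute of the result.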

print(df['col1'] == df['col1'].max())
# a    False
# b     True
# c    False
# d     True
# Name: col1, dtype: bool

print(df['col1'][df['col1'] == df['col1'].max()])
# b    3
# d    3
# Name: col1, dtype: int64

print(df['col1'][df['col1'] == df['col1'].max()].index)
# Index(['b', 'd'], dtype='object')

source: pandas_idxmax_idxmin.py

The values attribute of the index gives a NumPy array (numpy.ndarray), and list() gives a built-in Python list.

print(df['col1'][df['col1'] == df['col1'].max()].index.values)
# ['b' 'd']

print(type(df['col1'][df['col1'] == df['col1'].max()].index.values))
# <class 'numpy.ndarray'>

print(list(df['col1'][df['col1'] == df['col1'].max()].index))
# ['b', 'd']

print(type(list(df['col1'][df['col1'] == df['col1'].max()].index)))
# <class 'list'>

source: pandas_idxmax_idxmin.py

Applying the same steps to a pandas.Series in which the maximum or minimum occurs only once yields a numpy.ndarray or list with a single element.

print(df['col1'][df['col1'] == df['col1'].min()].index.values)
# ['a']

source: pandas_idxmax_idxmin.py
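The steps above can be wrapped in small helper functions. The names labels_of_max and labels_of_min are hypothetical (they are not part of pandas); this is only a sketch that indexes the Index directly with the boolean mask.

```python
import pandas as pd

def labels_of_max(s):
    """Return every index label whose value equals the maximum of s."""
    mask = (s == s.max()).to_numpy()  # boolean array, True at each maximum
    return s.index[mask].tolist()

def labels_of_min(s):
    """Return every index label whose value equals the minimum of s."""
    mask = (s == s.min()).to_numpy()  # boolean array, True at each minimum
    return s.index[mask].tolist()

df = pd.DataFrame({'col1': [0, 3, 2, 3], 'col2': [4, 0, 2, 1]},
                  index=['a', 'b', 'c', 'd'])

print(labels_of_max(df['col1']))
# ['b', 'd']

print(labels_of_min(df['col1']))
# ['a']
```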

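To extract a row of a pandas.DataFrame as a pandas.Series, use loc or iloc. idxmax() and idxmin() can then be called on the extracted row.

Related: Get and set values at arbitrary positions in pandas with at, iat, loc, iloc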

print(df.loc['a'])
# col1    0
# col2    4
# Name: a, dtype: int64

print(df.loc['a'].idxmax())
# col2

print(df.loc['a'].idxmin())
# col1

source: pandas_idxmax_idxmin.py
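The same works when the row is selected by position with iloc instead of by label. A brief sketch with the same example data:

```python
import pandas as pd

df = pd.DataFrame({'col1': [0, 3, 2, 3], 'col2': [4, 0, 2, 1]},
                  index=['a', 'b', 'c', 'd'])

# First row by position (same row as df.loc['a'])
first_row = df.iloc[0]
print(first_row.idxmax())
# col2

# Last row by position (same row as df.loc['d'])
last_row = df.iloc[-1]
print(last_row.idxmin())
# col2
```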

pandas.DataFrame

An example with a pandas.DataFrame.

print(df)
#    col1  col2
# a     0     4
# b     3     0
# c     2     2
# d     3     1

source: pandas_idxmax_idxmin.py

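By default, the row label of the maximum or minimum element in each column is returned as a pandas.Series. Here too, if the maximum or minimum occurs more than once, only the label of the first occurrence is returned.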

print(df.idxmax())
# col1    b
# col2    a
# dtype: object

print(df.idxmin())
# col1    a
# col2    b
# dtype: object

source: pandas_idxmax_idxmin.py

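Using apply(), which applies a function to each column, with the same steps as for pandas.Series above, the row labels of the maximum or minimum elements in each column can be obtained as a numpy.ndarray or list. An anonymous function (lambda expression) is used here. Note that, depending on the pandas version, apply() may return a DataFrame rather than a Series of lists when every column yields a list of the same length, as in the min() example.

Related: Apply functions to elements, rows, and columns in pandas with map, applymap, apply
Related: How to use lambda expressions (anonymous functions) in Python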

print(df.apply(lambda x: list(x[x == x.max()].index)))
# col1    [b, d]
# col2       [a]
# dtype: object

print(df.apply(lambda x: list(x[x == x.min()].index)))
# col1    [a]
# col2    [b]
# dtype: object

source: pandas_idxmax_idxmin.py

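For how to process a pandas.Series whose elements are numpy.ndarray or list objects, see the following article. For example, the number of elements can be obtained with len().

Related: Store and process lists as elements in pandas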
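As a rough illustration of counting with len(), the sketch below builds a Series of lists explicitly (with a dict comprehension rather than apply(), purely as an assumption to keep the result type predictable) and then maps len() over it. The names ties and counts are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [0, 3, 2, 3], 'col2': [4, 0, 2, 1]},
                  index=['a', 'b', 'c', 'd'])

# For each column, collect every row label where the column is at its maximum
ties = pd.Series({
    col: df.index[(df[col] == df[col].max()).to_numpy()].tolist()
    for col in df.columns
})

# Apply len() to each list to count how many times the maximum occurs
counts = ties.map(len)
print(counts)
# col1    2
# col2    1
# dtype: int64
```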

With the argument axis=1, idxmax() and idxmin() return the column label of the maximum or minimum element in each row as a pandas.Series.

print(df.idxmax(axis=1))
# a    col2
# b    col1
# c    col1
# d    col1
# dtype: object

print(df.idxmin(axis=1))
# a    col1
# b    col2
# c    col1
# d    col2
# dtype: object

source: pandas_idxmax_idxmin.py

apply() also operates on rows when axis=1 is specified.

print(df.apply(lambda x: list(x[x == x.max()].index), axis=1))
# a          [col2]
# b          [col1]
# c    [col1, col2]
# d          [col1]
# dtype: object

print(df.apply(lambda x: list(x[x == x.min()].index), axis=1))
# a          [col1]
# b          [col2]
# c    [col1, col2]
# d          [col2]
# dtype: object

source: pandas_idxmax_idxmin.py
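idxmax() works per column or per row. To locate the single largest value in the whole DataFrame, one possible approach (a sketch that swaps in NumPy, not something idxmax() does by itself) is to find the position in the underlying array with argmax() and numpy.unravel_index(), then translate the position back to labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [0, 3, 2, 3], 'col2': [4, 0, 2, 1]},
                  index=['a', 'b', 'c', 'd'])

values = df.to_numpy()

# argmax() gives the position in the flattened array;
# unravel_index() converts it to (row position, column position)
row_pos, col_pos = np.unravel_index(values.argmax(), values.shape)

# Translate positions back to labels
row_label = df.index[row_pos]
col_label = df.columns[col_pos]

print(row_label, col_label, values[row_pos, col_pos])
# a col2 4
```

As with idxmax(), only the first occurrence is reported if the overall maximum appears more than once.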

Handling of missing values (NaN)

As an example, create a pandas.DataFrame that contains missing values (NaN).

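df_nan = df.astype(float)  # float copy (pd.np is no longer available in recent pandas)
df_nan.loc['b'] = float('nan')  # set every value in row 'b' to NaN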

print(df_nan)
#    col1  col2
# a   0.0   4.0
# b   NaN   NaN
# c   2.0   2.0
# d   3.0   1.0

source: pandas_idxmax_idxmin.py

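By default, idxmax() and idxmin() exclude missing values (NaN). However, the result for a column or row in which every element is NaN is NaN. The outputs in this section are from the pandas version in use when this article was written; newer releases deprecate returning NaN in these cases (all-NaN input, or skipna=False with NaN present), so a warning or an error may appear instead.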

print(df_nan.idxmax())
# col1    d
# col2    a
# dtype: object

print(df_nan.idxmin())
# col1    a
# col2    d
# dtype: object

print(df_nan.idxmax(axis=1))
# a    col2
# b     NaN
# c    col1
# d    col1
# dtype: object

print(df_nan.idxmin(axis=1))
# a    col1
# b     NaN
# c    col1
# d    col2
# dtype: object

source: pandas_idxmax_idxmin.py
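One way to avoid depending on how all-NaN rows are treated is to drop them before calling idxmax() and restore them afterwards with reindex(). This is a sketch under that assumption, not the only possible approach; the variable names are illustrative.

```python
import pandas as pd

nan = float('nan')
df_nan = pd.DataFrame({'col1': [0.0, nan, 2.0, 3.0],
                       'col2': [4.0, nan, 2.0, 1.0]},
                      index=['a', 'b', 'c', 'd'])

# Remove rows in which every value is NaN
valid = df_nan.dropna(how='all')

# Compute the column label of each row's maximum on the remaining rows,
# then restore the original row order; dropped rows become NaN
result = valid.idxmax(axis=1).reindex(df_nan.index)

print(result.to_dict())
# {'a': 'col2', 'b': nan, 'c': 'col1', 'd': 'col1'}
```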

With skipna=False, NaN is not excluded. In this case, the result for any row or column that contains NaN is NaN for both idxmax() and idxmin().

print(df_nan.idxmax(skipna=False))
# col1   NaN
# col2   NaN
# dtype: float64

print(df_nan.idxmin(skipna=False))
# col1   NaN
# col2   NaN
# dtype: float64

print(df_nan.idxmax(axis=1, skipna=False))
# a    col2
# b     NaN
# c    col1
# d    col1
# dtype: object

print(df_nan.idxmin(axis=1, skipna=False))
# a    col1
# b     NaN
# c    col1
# d    col2
# dtype: object

source: pandas_idxmax_idxmin.py

The same applies to pandas.Series.

print(df_nan['col1'].idxmax())
# d

print(df_nan['col1'].idxmin())
# a

print(df_nan['col1'].idxmax(skipna=False))
# nan

print(df_nan['col1'].idxmin(skipna=False))
# nan

source: pandas_idxmax_idxmin.py
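If the intent of skipna=False is "give no answer when any value is missing", an explicit check with isna() expresses that directly. The helper name idxmax_or_none below is hypothetical (it is not a pandas function), and returning None is just one possible convention.

```python
import pandas as pd

def idxmax_or_none(s):
    """Return the label of the maximum, or None if s contains any NaN."""
    if s.isna().any():
        return None
    return s.idxmax()

s_nan = pd.Series([0.0, float('nan'), 2.0, 3.0], index=['a', 'b', 'c', 'd'])

print(idxmax_or_none(s_nan))
# None

print(idxmax_or_none(s_nan.dropna()))
# d
```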
