Select random elements from a list in Python: choice, sample, choices

Modified: 2023-05-11 | Tags: Python, List

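The choice(), sample(), and choices() functions in the random module of the Python standard library select elements at random (random sampling) from sequence objects such as lists, tuples, and strings.

choice() selects a single element; sample() and choices() select multiple elements. sample() samples without replacement (no element is picked twice), while choices() samples with replacement (the same element can be picked more than once).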

Select one element at random: random.choice()
Select multiple elements at random (without duplicates): random.sample()
Select multiple elements at random (with duplicates): random.choices()
Fix the random seed

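To extract elements by a condition rather than at random, see the following article.

Related article: Extract, replace, and convert specific elements of a list in Python

To reorder the elements of a list at random, or to generate random numbers or lists of random numbers, see the following articles.

Related article: Shuffle the elements of a list in Python (random.shuffle, sample)
Related article: Generate random floats and integers in Python (random, randrange, randint, etc.)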

Select one element at random: random.choice()

random.choice() selects one element at random from a list and returns it.

random.choice() - Generate pseudo-random numbers - Python 3.11.3 documentation

import random

l = [0, 1, 2, 3, 4]

print(random.choice(l))
# 1

source: random_choice.py

Tuples and strings work the same way. For a string, a single character is selected.

print(random.choice(('xxx', 'yyy', 'zzz')))
# yyy

print(random.choice('abcde'))
# b

source: random_choice.py

Passing an empty list, tuple, or string raises an IndexError.

# print(random.choice([]))
# IndexError: Cannot choose from an empty sequence

source: random_choice.py
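
When the input may be empty, one option is to check for emptiness before calling random.choice(). The following is a minimal sketch; choice_or_default() is a hypothetical helper written for this illustration, not part of the random module.

```python
import random


def choice_or_default(seq, default=None):
    """Return a random element of seq, or default if seq is empty."""
    # Hypothetical helper: empty lists, tuples, and strings are falsy,
    # so the check avoids the IndexError raised by random.choice().
    return random.choice(seq) if seq else default


print(choice_or_default([]))
# None

print(choice_or_default('', 'n/a'))
# n/a

print(choice_or_default([0, 1, 2]) in [0, 1, 2])
# True
```

Catching IndexError with try/except would work as well; which one reads better depends on how often the empty case is expected.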

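Select multiple elements at random (without duplicates): random.sample()

random.sample() selects multiple elements at random from a list and returns them as a new list. No element is selected more than once (sampling without replacement).

Specify the list as the first argument and the number of elements to retrieve as the second argument.

random.sample() - Generate pseudo-random numbers - Python 3.11.3 documentation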

import random

l = [0, 1, 2, 3, 4]

print(random.sample(l, 3))
# [3, 1, 0]

print(type(random.sample(l, 3)))
# <class 'list'>

source: random_sample.py

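If the second argument is 1, a list with one element is still returned. If it is 0, an empty list is returned. A value larger than the number of elements in the list given as the first argument raises a ValueError.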

print(random.sample(l, 1))
# [1]

print(random.sample(l, 0))
# []

# print(random.sample(l, 10))
# ValueError: Sample larger than population or is negative

source: random_sample.py
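
If the requested count may exceed the length of the list, it can be capped with min() before calling sample(). This is only a sketch of one way to avoid the ValueError, and it assumes the requested count k is not negative (a negative value would still raise).

```python
import random

l = [0, 1, 2, 3, 4]
k = 10  # requested count, larger than len(l)

# Cap the sample size at the population size.
picked = random.sample(l, min(k, len(l)))

print(len(picked))
# 5

print(sorted(picked))
# [0, 1, 2, 3, 4]
```

When the cap applies, every element is returned exactly once in random order, which is equivalent to a shuffled copy of the list.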

Even when the first argument is a tuple or a string, the return value is a list.

print(random.sample(('xxx', 'yyy', 'zzz'), 2))
# ['zzz', 'xxx']

print(random.sample('abcde', 2))
# ['c', 'd']

source: random_sample.py

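To convert the result back to a tuple or a string, use tuple() or join().

Related article: Convert between lists and tuples in Python: list(), tuple()
Related article: Concatenate and join strings in Python (+ operator, join, etc.)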

print(tuple(random.sample(('xxx', 'yyy', 'zzz'), 2)))
# ('zzz', 'yyy')

print(''.join(random.sample('abcde', 2)))
# be

source: random_sample.py

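sample() selects by position and does not compare values, so if the original list or tuple contains elements with the same value, the same value may appear more than once in the result.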

l_dup = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]

print(random.sample(l_dup, 3))
# [2, 0, 0]

source: random_sample.py

To avoid duplicate values, convert the list to a set with set() to keep only the unique elements, and then use sample().

Related article: Remove and extract duplicate elements from a list in Python

print(set(l_dup))
# {0, 1, 2, 3}

print(random.sample(list(set(l_dup)), 3))
# [0, 2, 1]

source: random_sample.py

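As of Python 3.11, passing a set directly as the first argument of sample() raises a TypeError (this usage had been deprecated since Python 3.9). Note that the set must be explicitly converted to a list or another sequence, as with list(set(...)) in the example above.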
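
As an alternative to list(set(...)), sorted() also accepts a set and returns a list. The sketch below assumes the elements can be compared with each other. A sorted population has a well-defined order, which can matter when combined with a fixed seed, because the iteration order of a set (for example, a set of strings) is not guaranteed to be the same between runs.

```python
import random

l_dup = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]

# sorted() returns a list, so the result can be passed to sample().
unique = sorted(set(l_dup))
print(unique)
# [0, 1, 2, 3]

picked = random.sample(unique, 3)

# No value appears twice in the result.
print(len(picked) == len(set(picked)))
# True
```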

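Select multiple elements at random (with duplicates): random.choices()

random.choices() selects multiple elements at random from a list and returns them as a new list. Unlike sample(), the same element can be selected more than once (sampling with replacement).

random.choices() - Generate pseudo-random numbers - Python 3.11.3 documentation

Specify the number of elements to retrieve with the argument k. Because duplicates are allowed, k can be larger than the number of elements in the original list.

k is a keyword-only argument, so it must be specified by keyword, as in k=3.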

import random

l = [0, 1, 2, 3, 4]

print(random.choices(l, k=3))
# [2, 1, 0]

print(random.choices(l, k=10))
# [3, 4, 1, 4, 4, 2, 0, 4, 2, 0]

source: random_choices.py
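
To illustrate why k must be given by keyword: the second positional parameter of choices() is weights, so a call such as random.choices(l, 3) passes 3 as weights rather than as k. A short sketch follows; the exact error message differs between Python versions, so only the exception type is printed.

```python
import random

l = [0, 1, 2, 3, 4]

try:
    # 3 is interpreted as weights, not as k.
    random.choices(l, 3)
except TypeError as e:
    print(type(e).__name__)
# TypeError
```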

The default is k=1. If k is omitted, a list with one element is returned.

print(random.choices(l))
# [1]

source: random_choices.py

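The weights argument specifies the weight (relative probability) with which each element is selected. The elements of the list passed as weights can be either int or float. An element whose weight is 0 is never selected.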

print(random.choices(l, k=3, weights=[1, 1, 1, 10, 1]))
# [0, 2, 3]

print(random.choices(l, k=3, weights=[1, 1, 0, 0, 0]))
# [0, 1, 1]

source: random_choices.py
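
A single call with k=3 says little about the underlying probabilities. One way to check the effect of weights is to draw many samples and count them. The sketch below uses collections.Counter; the seed value and the sample sizes are arbitrary choices for this illustration.

```python
import collections
import random

l = [0, 1, 2, 3, 4]

random.seed(0)  # arbitrary seed, only to make the run repeatable
draws = random.choices(l, k=10000, weights=[1, 1, 1, 10, 1])
counts = collections.Counter(draws)

# The element with weight 10 (out of a total of 14) dominates.
print(counts.most_common(1)[0][0])
# 3

# Elements with weight 0 never appear.
zero_draws = random.choices(l, k=1000, weights=[1, 1, 0, 0, 0])
print(set(zero_draws) <= {0, 1})
# True
```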

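Weights can also be given as cumulative weights with the cum_weights argument. The cum_weights in the following sample code is equivalent to the first weights in the example above.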

print(random.choices(l, k=3, cum_weights=[1, 2, 3, 13, 14]))
# [3, 2, 3]

source: random_choices.py
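
The cumulative values are the running totals of the weights, so they can be derived with itertools.accumulate() instead of being written by hand. The following sketch also compares the two forms under the same seed; the seed value is arbitrary, and the comparison reflects how CPython implements choices().

```python
import itertools
import random

l = [0, 1, 2, 3, 4]
weights = [1, 1, 1, 10, 1]

# Running totals: 1, 1+1, 1+1+1, 1+1+1+10, 1+1+1+10+1
cum_weights = list(itertools.accumulate(weights))
print(cum_weights)
# [1, 2, 3, 13, 14]

random.seed(0)
by_weights = random.choices(l, k=5, weights=weights)

random.seed(0)
by_cum_weights = random.choices(l, k=5, cum_weights=cum_weights)

print(by_weights == by_cum_weights)
# True
```

According to the official documentation, choices() converts relative weights to cumulative weights internally, so supplying cum_weights directly saves that work when the same weights are reused many times.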

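The default for both weights and cum_weights is None, in which case every element is selected with equal probability.

If the length (number of elements) of weights or cum_weights differs from that of the original list, a ValueError is raised.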

# print(random.choices(l, k=3, weights=[1, 1, 1, 10, 1, 1, 1]))
# ValueError: The number of weights does not match the population

source: random_choices.py

Specifying both weights and cum_weights at the same time also raises an error (TypeError).

# print(random.choices(l, k=3, weights=[1, 1, 1, 10, 1], cum_weights=[1, 2, 3, 13, 14]))
# TypeError: Cannot specify both weights and cumulative weights

source: random_choices.py

The sample code so far has used a list as the first argument, but tuples and strings work the same way.
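
A short sketch of choices() with a tuple and a string. As with sample(), the return value is a list, so join() is used here to get a string back. The variable names are chosen only for this illustration, and no output is shown because the results are random.

```python
import random

# Tuple as the population: the result is still a list.
t_picked = random.choices(('xxx', 'yyy', 'zzz'), k=4)
print(t_picked)

# String as the population: each selected element is one character.
s_picked = ''.join(random.choices('abcde', k=8))
print(s_picked)
```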

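Fix the random seed

Passing an arbitrary integer to random.seed() fixes the random seed and initializes the random number generator.

random.seed() - Generate pseudo-random numbers - Python 3.11.3 documentation

After initializing with the same seed, elements are always selected in the same way.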

random.seed(0)
print(random.choice(l))
# 3

random.seed(0)
print(random.choice(l))
# 3

source: random_choice.py
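
The same applies to sample() and choices(). As a sketch of an alternative to seeding the module-level generator, a separate random.Random instance can be created with its own seed, which leaves the global state used by other code untouched. The seed value 0 and the variable names are arbitrary choices for this illustration.

```python
import random

l = [0, 1, 2, 3, 4]

# Two independent generators initialized with the same seed.
rng_a = random.Random(0)
rng_b = random.Random(0)

# Both produce the same sequence of selections.
print(rng_a.sample(l, 3) == rng_b.sample(l, 3))
# True

print(rng_a.choices(l, k=5) == rng_b.choices(l, k=5))
# True
```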
