GROUP BY in Python: itertools.groupby

Modified: 2023-05-17 | Tags: Python, List

In Python, you can group consecutive elements of the same value in an iterable object, such as a list, with itertools.groupby().

itertools.groupby() — Functions creating iterators for efficient looping — Python 3.11.3 documentation

import itertools

l = [0, 0, 0, 1, 1, 2, 0, 0]
print([(k, list(g)) for k, g in itertools.groupby(l)])
# [(0, [0, 0, 0]), (1, [1, 1]), (2, [2]), (0, [0, 0])]

source: itertools_groupby.py

Contents

How to use itertools.groupby()
Specify a function computing a key value for each element: key
Aggregate like GROUP BY in SQL
For tuples and strings

To count the number of elements of the same value, regardless of their order (be it consecutive or non-consecutive), you can use collections.Counter.

Count elements in a list with collections.Counter in Python
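The difference between the two approaches can be sketched as follows. This is a minimal illustration that reuses the sample list from the top of the article; the variable names `totals` and `runs` are chosen here for illustration only.

```python
import collections
import itertools

l = [0, 0, 0, 1, 1, 2, 0, 0]

# Counter tallies every occurrence of a value, wherever it appears.
totals = collections.Counter(l)
print(totals)
# Counter({0: 5, 1: 2, 2: 1})

# groupby() reports each consecutive run separately, so 0 appears twice.
runs = [(k, len(list(g))) for k, g in itertools.groupby(l)]
print(runs)
# [(0, 3), (1, 2), (2, 1), (0, 2)]
```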

How to use `itertools.groupby()`

itertools.groupby() returns an iterator that yields pairs of a key and a group. Passing the iterator itself to print() shows only the object, not the keys and groups it yields.

l = [0, 0, 0, 1, 1, 2, 0, 0]
print(itertools.groupby(l))
# <itertools.groupby object at 0x110ab58b0>


Each returned group is also an iterator, so printing it shows only the object. Convert it with list() to see its elements, as shown below.

According to the official documentation (itertools.groupby(), Python 3.11.3), the group shares the underlying iterable with groupby(). Because the source is shared, advancing the groupby() object makes the previous group unavailable. If you need a group's data later, store it as a list first.

for k, g in itertools.groupby(l):
    print(k, g)
# 0 <itertools._grouper object at 0x110a26940>
# 1 <itertools._grouper object at 0x110a2c400>
# 2 <itertools._grouper object at 0x110aa8f10>
# 0 <itertools._grouper object at 0x110aa8ee0>

for k, g in itertools.groupby(l):
    print(k, list(g))
# 0 [0, 0, 0]
# 1 [1, 1]
# 2 [2]
# 0 [0, 0]

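The caveat about the shared underlying iterable can be seen by reading the groups only after groupby() has finished. The following is a minimal sketch; the names `stale`, `stale_result`, and `stored` are illustrative, and the exact contents of stale groups may differ between Python versions, so only the comparison with the safe result is shown.

```python
import itertools

l = [0, 0, 0, 1, 1, 2, 0, 0]

# Risky: keep the group iterators and read them only after groupby() has moved on.
stale = [(k, g) for k, g in itertools.groupby(l)]
stale_result = [(k, list(g)) for k, g in stale]

# Safe: convert each group to a list while it is still the current group.
stored = [(k, list(g)) for k, g in itertools.groupby(l)]

print(stored)
# [(0, [0, 0, 0]), (1, [1, 1]), (2, [2]), (0, [0, 0])]

# The stale groups have lost their elements, so the results differ.
print(stale_result == stored)
# False
print(stale_result[1])
# (1, [])
```

Converting each group inside the loop or comprehension, as in `stored`, avoids the problem.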

You can use list comprehensions to get a list of keys only, groups only, or both (tuples of key and group).

List comprehensions in Python

print([k for k, g in itertools.groupby(l)])
# [0, 1, 2, 0]

print([list(g) for k, g in itertools.groupby(l)])
# [[0, 0, 0], [1, 1], [2], [0, 0]]

print([(k, list(g)) for k, g in itertools.groupby(l)])
# [(0, [0, 0, 0]), (1, [1, 1]), (2, [2]), (0, [0, 0])]


Specify a function computing a key value for each element: `key`

You can pass a function to the key parameter of itertools.groupby(). It works the same way as the key parameter of sorted(), max(), and min().

How to use the key argument in Python (sorted, max, etc.)

The function (any callable object) specified in key is applied to each element, and consecutive elements that produce the same result are placed in one group. That result is also returned as the group's key. For example, specifying the built-in len() function, which returns the length of a string, groups consecutive strings of the same length.

l = ['aaa', 'bbb', 'ccc', 'a', 'b', 'aa', 'bb']
print([(k, list(g)) for k, g in itertools.groupby(l, len)])
# [(3, ['aaa', 'bbb', 'ccc']), (1, ['a', 'b']), (2, ['aa', 'bb'])]


In the following example, a lambda expression groups consecutive numbers by whether they are even or odd, using the remainder of division by 2 as the key.

Lambda expressions in Python

l = [0, 2, 0, 3, 1, 4, 4, 0]
print([(k, list(g)) for k, g in itertools.groupby(l, lambda x: x % 2)])
# [(0, [0, 2, 0]), (1, [3, 1]), (0, [4, 4, 0])]

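Because the key returned for each group is whatever the key function returns, a named function can make the keys more descriptive. The sketch below assumes a hypothetical helper, `parity()`, which is not part of itertools and is defined here only for illustration.

```python
import itertools


def parity(x):
    """Hypothetical helper: label an integer as 'even' or 'odd'."""
    return 'even' if x % 2 == 0 else 'odd'


l = [0, 2, 0, 3, 1, 4, 4, 0]

# The labels returned by parity() become the group keys.
result = [(k, list(g)) for k, g in itertools.groupby(l, key=parity)]
print(result)
# [('even', [0, 2, 0]), ('odd', [3, 1]), ('even', [4, 4, 0])]
```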

Aggregate like `GROUP BY` in SQL

For two-dimensional data, such as a list of lists, you can use key to group data based on a given column, similar to GROUP BY in SQL.

In the following example, a lambda expression is used to fetch the element at a desired position in the list. operator.itemgetter() can also be used for this purpose.

The operator module in Python (itemgetter, attrgetter, methodcaller)

While a for loop is used here for readability, you can also use list comprehensions, as shown in previous examples.

l = [[0, 'Alice', 0],
     [1, 'Alice', 10],
     [2, 'Bob', 20],
     [3, 'Bob', 30],
     [4, 'Alice', 40]]

for k, g in itertools.groupby(l, lambda x: x[1]):
    print(k, list(g))
# Alice [[0, 'Alice', 0], [1, 'Alice', 10]]
# Bob [[2, 'Bob', 20], [3, 'Bob', 30]]
# Alice [[4, 'Alice', 40]]

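The same grouping can be written with operator.itemgetter() in place of the lambda expression. This is a minimal sketch using the same sample data; the name `by_name` is chosen here for illustration.

```python
import itertools
import operator

l = [[0, 'Alice', 0],
     [1, 'Alice', 10],
     [2, 'Bob', 20],
     [3, 'Bob', 30],
     [4, 'Alice', 40]]

# itemgetter(1) returns a callable equivalent to lambda x: x[1].
by_name = operator.itemgetter(1)

result = [(k, list(g)) for k, g in itertools.groupby(l, key=by_name)]
for k, rows in result:
    print(k, rows)
# Alice [[0, 'Alice', 0], [1, 'Alice', 10]]
# Bob [[2, 'Bob', 20], [3, 'Bob', 30]]
# Alice [[4, 'Alice', 40]]
```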

itertools.groupby() groups only consecutive elements with the same key. To group elements regardless of their position, sort the list by the same key with sorted() before passing it to groupby().

By default, sorted() compares lists element by element, starting with the first element. To sort by the element at a given position, specify the key parameter of sorted().

for k, g in itertools.groupby(sorted(l, key=lambda x: x[1]), lambda x: x[1]):
    print(k, list(g))
# Alice [[0, 'Alice', 0], [1, 'Alice', 10], [4, 'Alice', 40]]
# Bob [[2, 'Bob', 20], [3, 'Bob', 30]]


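To aggregate each group, pass a generator expression over the group to a function such as sum(). The following example totals the third column for each name: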

List comprehensions in Python

for k, g in itertools.groupby(sorted(l, key=lambda x: x[1]), lambda x: x[1]):
    print(k, sum(x[2] for x in g))
# Alice 50
# Bob 50

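When more than one aggregate is needed per group, note that a group can be read only once, so a second pass over the same group iterator would find it empty. One way to handle this, sketched below, is to extract the values into a list first. The names `by_name` and `summary` are illustrative.

```python
import itertools
import operator

l = [[0, 'Alice', 0],
     [1, 'Alice', 10],
     [2, 'Bob', 20],
     [3, 'Bob', 30],
     [4, 'Alice', 40]]

# Use the same key function for sorting and for grouping.
by_name = operator.itemgetter(1)

summary = {}
for name, group in itertools.groupby(sorted(l, key=by_name), key=by_name):
    # Read the group exactly once and keep the values in a list.
    values = [row[2] for row in group]
    summary[name] = (len(values), sum(values), max(values))

print(summary)
# {'Alice': (3, 50, 40), 'Bob': (2, 50, 30)}
```

Each tuple holds the count, sum, and maximum of the third column, roughly what COUNT, SUM, and MAX would return with GROUP BY in SQL.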

Note that the pandas library also offers groupby() for grouping and aggregation, which can be more convenient for handling complex data.

pandas: Grouping data with groupby()

For tuples and strings

You can use itertools.groupby() to handle not only lists but also other iterable objects like tuples and strings.

For tuples:

t = (0, 0, 0, 1, 1, 2, 0, 0)
print([(k, list(g)) for k, g in itertools.groupby(t)])
# [(0, [0, 0, 0]), (1, [1, 1]), (2, [2]), (0, [0, 0])]


To convert a group into a tuple instead of a list, use tuple().

print(tuple((k, tuple(g)) for k, g in itertools.groupby(t)))
# ((0, (0, 0, 0)), (1, (1, 1)), (2, (2,)), (0, (0, 0)))


For strings:

s = 'aaabbcaa'
print([(k, list(g)) for k, g in itertools.groupby(s)])
# [('a', ['a', 'a', 'a']), ('b', ['b', 'b']), ('c', ['c']), ('a', ['a', 'a'])]


To convert a group into a string, use the string method join().

Concatenate strings in Python (+ operator, join, etc.)

print([(k, ''.join(g)) for k, g in itertools.groupby(s)])
# [('a', 'aaa'), ('b', 'bb'), ('c', 'c'), ('a', 'aa')]

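Grouping the characters of a string this way is the basis of a simple run-length encoding. The following is only a sketch of that idea, with the illustrative name `encoded`; it is not a complete compression scheme.

```python
import itertools

s = 'aaabbcaa'

# For each run, emit the character followed by the length of the run.
encoded = ''.join(f'{k}{len(list(g))}' for k, g in itertools.groupby(s))
print(encoded)
# a3b2c1a2
```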

Of course, you can also handle any other iterable object with itertools.groupby().
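As one more illustration, the sketch below groups the values of a range object, which is iterable but not a list, by integer division. The key function and the name `result` are chosen here only for the example.

```python
import itertools

# Consecutive integers share a key while x // 3 stays the same.
result = [(k, list(g)) for k, g in itertools.groupby(range(10), key=lambda x: x // 3)]
for k, values in result:
    print(k, values)
# 0 [0, 1, 2]
# 1 [3, 4, 5]
# 2 [6, 7, 8]
# 3 [9]
```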
