GROUP BY in Python: itertools.groupby

Tags: Python, List

In Python, you can group consecutive elements of the same value in an iterable object, such as a list, with itertools.groupby().

import itertools

l = [0, 0, 0, 1, 1, 2, 0, 0]
print([(k, list(g)) for k, g in itertools.groupby(l)])
# [(0, [0, 0, 0]), (1, [1, 1]), (2, [2]), (0, [0, 0])]

To count the number of elements of the same value, regardless of their order (be it consecutive or non-consecutive), you can use collections.Counter.
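As a minimal sketch of that alternative, the following counts the same sample list with collections.Counter. The variable name c is only illustrative.

```python
from collections import Counter

l = [0, 0, 0, 1, 1, 2, 0, 0]

# Counter tallies every occurrence, whether or not equal values are adjacent.
c = Counter(l)
print(c)
# Counter({0: 5, 1: 2, 2: 1})

print(c[0])
# 5
```

Unlike itertools.groupby(), the two runs of 0 are merged into a single count.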

How to use itertools.groupby()

itertools.groupby() returns an iterator that yields pairs of a key and a group. Printing the iterator itself shows only the object, not the keys and groups it yields.

l = [0, 0, 0, 1, 1, 2, 0, 0]
print(itertools.groupby(l))
# <itertools.groupby object at 0x110ab58b0>

Each returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, the previous group is no longer visible once the groupby() object is advanced. If you need a group's data later, store it as a list using list(). For details, see itertools.groupby() in "Functions creating iterators for efficient looping" (Python 3.11.3 documentation).

Printing a group directly shows only the iterator object, while converting it with list() shows its elements:

for k, g in itertools.groupby(l):
    print(k, g)
# 0 <itertools._grouper object at 0x110a26940>
# 1 <itertools._grouper object at 0x110a2c400>
# 2 <itertools._grouper object at 0x110aa8f10>
# 0 <itertools._grouper object at 0x110aa8ee0>

for k, g in itertools.groupby(l):
    print(k, list(g))
# 0 [0, 0, 0]
# 1 [1, 1]
# 2 [2]
# 0 [0, 0]
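To illustrate why a group should be converted to a list before moving on, here is a small sketch that advances the groupby() object manually with next(). The names it, k1, g1, k2, g2, first, and second are only illustrative, and the empty result shown for the first group is what CPython produces.

```python
import itertools

l = [0, 0, 0, 1, 1, 2, 0, 0]

it = itertools.groupby(l)
k1, g1 = next(it)  # first group (key 0), not yet consumed
k2, g2 = next(it)  # advancing to the second group invalidates g1

first = list(g1)
second = list(g2)

print(k1, first)
# 0 []

print(k2, second)
# 1 [1, 1]
```

The first group's elements were skipped when the groupby() object moved on, so they can no longer be retrieved from g1.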

You can use list comprehensions to get a list of keys only, groups only, or both (tuples of key and group).

print([k for k, g in itertools.groupby(l)])
# [0, 1, 2, 0]

print([list(g) for k, g in itertools.groupby(l)])
# [[0, 0, 0], [1, 1], [2], [0, 0]]

print([(k, list(g)) for k, g in itertools.groupby(l)])
# [(0, [0, 0, 0]), (1, [1, 1]), (2, [2]), (0, [0, 0])]

Specify a function computing a key value for each element: key

You can specify the key parameter for itertools.groupby(). It works the same way as the key parameter of functions such as sorted(), max(), and min().

The function (callable object) specified in key is applied to each element, and consecutive elements whose results are equal are placed in the same group. The result of the function becomes the key of the group. For example, by specifying the built-in len() function, which returns the length of a string, you can group consecutive strings of the same length.

l = ['aaa', 'bbb', 'ccc', 'a', 'b', 'aa', 'bb']
print([(k, list(g)) for k, g in itertools.groupby(l, len)])
# [(3, ['aaa', 'bbb', 'ccc']), (1, ['a', 'b']), (2, ['aa', 'bb'])]

In the following example, a lambda expression is used to group consecutive even or odd numbers.

l = [0, 2, 0, 3, 1, 4, 4, 0]
print([(k, list(g)) for k, g in itertools.groupby(l, lambda x: x % 2)])
# [(0, [0, 2, 0]), (1, [3, 1]), (0, [4, 4, 0])]
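As another sketch of the key parameter, any existing callable can be passed instead of a lambda expression. Here the string method str.isdigit is used as the key to split a string into runs of digits and non-digits. The sample string and the name tokens are only illustrative.

```python
import itertools

s = 'abc123de45'

# str.isdigit returns True or False for each character, so consecutive
# characters with the same result end up in the same group.
tokens = [(k, ''.join(g)) for k, g in itertools.groupby(s, str.isdigit)]
print(tokens)
# [(False, 'abc'), (True, '123'), (False, 'de'), (True, '45')]
```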

Aggregate like GROUP BY in SQL

For two-dimensional data, such as a list of lists, you can use key to group data based on a given column, similar to GROUP BY in SQL.

In the following example, a lambda expression is used to fetch the element at a desired position in the list. operator.itemgetter() can also be used for this purpose.

While a for loop is used here for readability, you can also use list comprehensions, as shown in previous examples.

l = [[0, 'Alice', 0],
     [1, 'Alice', 10],
     [2, 'Bob', 20],
     [3, 'Bob', 30],
     [4, 'Alice', 40]]

for k, g in itertools.groupby(l, lambda x: x[1]):
    print(k, list(g))
# Alice [[0, 'Alice', 0], [1, 'Alice', 10]]
# Bob [[2, 'Bob', 20], [3, 'Bob', 30]]
# Alice [[4, 'Alice', 40]]
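The same grouping can be sketched with operator.itemgetter() in place of the lambda expression. itemgetter(1) returns a callable that fetches the element at index 1. The name groups is only illustrative.

```python
import itertools
import operator

l = [[0, 'Alice', 0],
     [1, 'Alice', 10],
     [2, 'Bob', 20],
     [3, 'Bob', 30],
     [4, 'Alice', 40]]

# operator.itemgetter(1) behaves like lambda x: x[1]
groups = [(k, list(g)) for k, g in itertools.groupby(l, operator.itemgetter(1))]

for k, g in groups:
    print(k, g)
# Alice [[0, 'Alice', 0], [1, 'Alice', 10]]
# Bob [[2, 'Bob', 20], [3, 'Bob', 30]]
# Alice [[4, 'Alice', 40]]
```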

itertools.groupby() groups only consecutive elements with the same key. To group elements regardless of their order, sort the data by the same key first, for example with sorted(), which returns a new sorted list and leaves the original unchanged.

By default, a list of lists is sorted by comparing the inner lists element by element, starting with the first element. To sort by the element at a given position, specify the key parameter of sorted().

for k, g in itertools.groupby(sorted(l, key=lambda x: x[1]), lambda x: x[1]):
    print(k, list(g))
# Alice [[0, 'Alice', 0], [1, 'Alice', 10], [4, 'Alice', 40]]
# Bob [[2, 'Bob', 20], [3, 'Bob', 30]]

To aggregate each group, for example to sum the values in the third column, pass a generator expression to sum():

for k, g in itertools.groupby(sorted(l, key=lambda x: x[1]), lambda x: x[1]):
    print(k, sum(x[2] for x in g))
# Alice 50
# Bob 50
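If several aggregates are needed per group, one possible approach is to store each group's values in a list first, because a group iterator can be consumed only once. The following is a sketch under that assumption. The names rows, values, and result, and the choice of count, sum, and max, are only illustrative.

```python
import itertools
import operator

l = [[0, 'Alice', 0],
     [1, 'Alice', 10],
     [2, 'Bob', 20],
     [3, 'Bob', 30],
     [4, 'Alice', 40]]

name = operator.itemgetter(1)

# Sort by the grouping column so that equal names become consecutive.
rows = sorted(l, key=name)

result = {}
for k, g in itertools.groupby(rows, name):
    # Materialize the third column once, then reuse it for each aggregate.
    values = [row[2] for row in g]
    result[k] = {'count': len(values), 'sum': sum(values), 'max': max(values)}

print(result)
# {'Alice': {'count': 3, 'sum': 50, 'max': 40}, 'Bob': {'count': 2, 'sum': 50, 'max': 30}}
```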

Note that the pandas library also offers groupby() for grouping and aggregation, which can be more convenient for handling complex data.

For tuples and strings

You can use itertools.groupby() to handle not only lists but also other iterable objects like tuples and strings.

For tuples:

t = (0, 0, 0, 1, 1, 2, 0, 0)
print([(k, list(g)) for k, g in itertools.groupby(t)])
# [(0, [0, 0, 0]), (1, [1, 1]), (2, [2]), (0, [0, 0])]

To convert a group into a tuple instead of a list, use tuple().

print(tuple((k, tuple(g)) for k, g in itertools.groupby(t)))
# ((0, (0, 0, 0)), (1, (1, 1)), (2, (2,)), (0, (0, 0)))

For strings:

s = 'aaabbcaa'
print([(k, list(g)) for k, g in itertools.groupby(s)])
# [('a', ['a', 'a', 'a']), ('b', ['b', 'b']), ('c', ['c']), ('a', ['a', 'a'])]

To convert a group into a string, use join().

print([(k, ''.join(g)) for k, g in itertools.groupby(s)])
# [('a', 'aaa'), ('b', 'bb'), ('c', 'c'), ('a', 'aa')]
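Along the same lines, the length of each group can be taken instead of its contents, which gives a simple run-length encoding of the string. This is only a sketch, and the name runs is illustrative.

```python
import itertools

s = 'aaabbcaa'

# Count the characters in each consecutive run.
runs = [(k, len(list(g))) for k, g in itertools.groupby(s)]
print(runs)
# [('a', 3), ('b', 2), ('c', 1), ('a', 2)]
```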

Of course, you can also handle any other iterable object with itertools.groupby().
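For instance, a range object can be passed directly. The sketch below groups the integers 0 to 6 into chunks of three by using integer division as the key. The name chunks is only illustrative.

```python
import itertools

# x // 3 is 0 for 0-2, 1 for 3-5, and 2 for 6.
chunks = [(k, list(g)) for k, g in itertools.groupby(range(7), lambda x: x // 3)]
print(chunks)
# [(0, [0, 1, 2]), (1, [3, 4, 5]), (2, [6])]
```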
