note.nkmk.me

Sort a list of dictionaries by the value of a specific key in Python

Posted: 2021-09-15 / Tags: Python, List, Dictionary

In Python, sorting a list of dictionaries with the sort() method or the sorted() function raises a TypeError by default.

By specifying the key parameter of sort() or sorted(), you can sort a list of dictionaries by the value of a specific key.

This article covers the following topics.

  • Sorting a list of dictionaries raises an error by default
  • Specify lambda expressions for the key parameter
  • Specify operator.itemgetter() for the key parameter
  • Sort by multiple keys
  • max(), min() for a list of dictionaries

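The following sample code uses a list of dictionaries that share the keys 'Name' and 'Age'. Note that the second dictionary has no 'Point' key. The pprint module is used to make the output easier to read; it prints the keys of each dictionary in alphabetical order.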

import pprint

l = [{'Name': 'Alice', 'Age': 40, 'Point': 80},
     {'Name': 'Bob', 'Age': 20},
     {'Name': 'Charlie', 'Age': 30, 'Point': 70}]

Sorting a list of dictionaries raises an error by default

Sorting a list of dictionaries (dict) with the sort() method or the sorted() function raises a TypeError by default.

This is because dictionaries support == and != but not ordering comparisons such as < and >.

# sorted(l)
# TypeError: '<' not supported between instances of 'dict' and 'dict'
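
The error can be observed without stopping a script by catching the exception. This is a minimal sketch; the variable name message is introduced here only for illustration.

```python
l = [{'Name': 'Alice', 'Age': 40, 'Point': 80},
     {'Name': 'Bob', 'Age': 20},
     {'Name': 'Charlie', 'Age': 30, 'Point': 70}]

try:
    sorted(l)  # no key: the dictionaries themselves are compared with <
except TypeError as e:
    message = str(e)  # illustrative name, not part of the original article
    print(message)
# '<' not supported between instances of 'dict' and 'dict'
```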

Specify lambda expressions for the key parameter

To sort a list of dictionaries by the value of a specific key, specify the key parameter of the sort() method or the sorted() function.

The function passed as key is applied to each element of the list, and the list is sorted according to the results.

Here, the function should return the value of a specific key from each dictionary.

You can define such a function with def, but a lambda expression is more convenient in this case.

pprint.pprint(sorted(l, key=lambda x: x['Age']))
# [{'Age': 20, 'Name': 'Bob'},
#  {'Age': 30, 'Name': 'Charlie', 'Point': 70},
#  {'Age': 40, 'Name': 'Alice', 'Point': 80}]

pprint.pprint(sorted(l, key=lambda x: x['Name']))
# [{'Age': 40, 'Name': 'Alice', 'Point': 80},
#  {'Age': 20, 'Name': 'Bob'},
#  {'Age': 30, 'Name': 'Charlie', 'Point': 70}]
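
For comparison, the same sort can be written with a function defined by def. This is a sketch; the function name get_age is hypothetical and chosen only for this example.

```python
import pprint

l = [{'Name': 'Alice', 'Age': 40, 'Point': 80},
     {'Name': 'Bob', 'Age': 20},
     {'Name': 'Charlie', 'Age': 30, 'Point': 70}]

def get_age(d):
    # Return the value that sorted() should compare for one dictionary.
    return d['Age']

result = sorted(l, key=get_age)  # equivalent to key=lambda x: x['Age']
pprint.pprint(result)
# [{'Age': 20, 'Name': 'Bob'},
#  {'Age': 30, 'Name': 'Charlie', 'Point': 70},
#  {'Age': 40, 'Name': 'Alice', 'Point': 80}]
```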

Set reverse=True to sort in descending order. The default, reverse=False, sorts in ascending order.

pprint.pprint(sorted(l, key=lambda x: x['Age'], reverse=True))
# [{'Age': 40, 'Name': 'Alice', 'Point': 80},
#  {'Age': 30, 'Name': 'Charlie', 'Point': 70},
#  {'Age': 20, 'Name': 'Bob'}]

The examples so far use sorted(), but you can specify key and reverse in the same way with the sort() method of list.

The difference is that sort() sorts the original list in place and returns None, whereas sorted() returns a new sorted list and leaves the original unchanged.
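
The difference between the two can be checked with a short sketch. The names l_new and ret are introduced here for illustration.

```python
l = [{'Name': 'Alice', 'Age': 40, 'Point': 80},
     {'Name': 'Bob', 'Age': 20},
     {'Name': 'Charlie', 'Age': 30, 'Point': 70}]

# sorted() returns a new list; the original order of l is untouched.
l_new = sorted(l, key=lambda x: x['Age'])
print([d['Name'] for d in l_new])
# ['Bob', 'Charlie', 'Alice']
print([d['Name'] for d in l])
# ['Alice', 'Bob', 'Charlie']

# sort() reorders l itself and returns None.
ret = l.sort(key=lambda x: x['Age'])
print(ret)
# None
print([d['Name'] for d in l])
# ['Bob', 'Charlie', 'Alice']
```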

When the specified key does not exist

With the approach shown above, a KeyError is raised if the specified key does not exist in one of the dictionaries.

# sorted(l, key=lambda x: x['Point'])
# KeyError: 'Point'

In such a case, you can use the get() method of dict, which returns a default value for non-existent keys.

By default, get() returns None for non-existent keys. Because None cannot be compared with a number or a string using <, a TypeError is raised.

# sorted(l, key=lambda x: x.get('Point'))
# TypeError: '<' not supported between instances of 'int' and 'NoneType'

You can specify the value to use for a missing key as the second argument of get(). Elements that lack the key are then sorted as if they had that value.

pprint.pprint(sorted(l, key=lambda x: x.get('Point', 75)))
# [{'Age': 30, 'Name': 'Charlie', 'Point': 70},
#  {'Age': 20, 'Name': 'Bob'},
#  {'Age': 40, 'Name': 'Alice', 'Point': 80}]

Infinity, float('inf'), compares greater than any finite number, and float('-inf') compares less. You can therefore use inf or -inf as the default to always place elements without the key at the end or at the beginning.

pprint.pprint(sorted(l, key=lambda x: x.get('Point', float('inf'))))
# [{'Age': 30, 'Name': 'Charlie', 'Point': 70},
#  {'Age': 40, 'Name': 'Alice', 'Point': 80},
#  {'Age': 20, 'Name': 'Bob'}]

pprint.pprint(sorted(l, key=lambda x: x.get('Point', -float('inf'))))
# [{'Age': 20, 'Name': 'Bob'},
#  {'Age': 30, 'Name': 'Charlie', 'Point': 70},
#  {'Age': 40, 'Name': 'Alice', 'Point': 80}]
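
The inf approach assumes the values are numbers. As an alternative sketch (not from the original article), the key function can return a tuple whose first element flags a missing key, which should also work for strings if a string placeholder is used. The function name point_or_last is hypothetical.

```python
l = [{'Name': 'Alice', 'Age': 40, 'Point': 80},
     {'Name': 'Bob', 'Age': 20},
     {'Name': 'Charlie', 'Age': 30, 'Point': 70}]

def point_or_last(d):
    # False sorts before True, so dictionaries that have 'Point' come first.
    # The 0 is only a placeholder so that the tuples stay comparable.
    missing = 'Point' not in d
    return (missing, d.get('Point', 0))

result = sorted(l, key=point_or_last)
print([d['Name'] for d in result])
# ['Charlie', 'Alice', 'Bob']
```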

Specify operator.itemgetter() for the key parameter

You can also use itemgetter() from the operator module of the standard library. It is typically somewhat faster than a lambda expression, although the difference is usually small.

import operator

pprint.pprint(sorted(l, key=operator.itemgetter('Age')))
# [{'Age': 20, 'Name': 'Bob'},
#  {'Age': 30, 'Name': 'Charlie', 'Point': 70},
#  {'Age': 40, 'Name': 'Alice', 'Point': 80}]

pprint.pprint(sorted(l, key=operator.itemgetter('Name')))
# [{'Age': 40, 'Name': 'Alice', 'Point': 80},
#  {'Age': 20, 'Name': 'Bob'},
#  {'Age': 30, 'Name': 'Charlie', 'Point': 70}]

If the specified key does not exist in one of the dictionaries, a KeyError is raised.

# sorted(l, key=operator.itemgetter('Point'))
# KeyError: 'Point'
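
itemgetter() has no parameter for a default value. One possible workaround, shown here as a sketch rather than the only option, is to keep only the dictionaries that have the key before sorting. The name with_point is introduced for illustration. If the elements without the key must stay in the result, a lambda expression with get() is the alternative.

```python
import operator

l = [{'Name': 'Alice', 'Age': 40, 'Point': 80},
     {'Name': 'Bob', 'Age': 20},
     {'Name': 'Charlie', 'Age': 30, 'Point': 70}]

# Drop dictionaries that lack 'Point', then sort the rest with itemgetter().
with_point = [d for d in l if 'Point' in d]
result = sorted(with_point, key=operator.itemgetter('Point'))
print([d['Name'] for d in result])
# ['Charlie', 'Alice']
```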

Sort by multiple keys

The following example contains dictionaries with the same value for a common key. Two of the dictionaries have the value 'CA' for the key 'State'.

l_dup = [{'Name': 'Alice', 'Age': 40, 'Point': 80, 'State': 'CA'},
         {'Name': 'Bob', 'Age': 20, 'State': 'NY'},
         {'Name': 'Charlie', 'Age': 30, 'Point': 70, 'State': 'CA'}]

Python's sort is stable, so if the values are equal, the original order is preserved.

pprint.pprint(sorted(l_dup, key=operator.itemgetter('State')))
# [{'Age': 40, 'Name': 'Alice', 'Point': 80, 'State': 'CA'},
#  {'Age': 30, 'Name': 'Charlie', 'Point': 70, 'State': 'CA'},
#  {'Age': 20, 'Name': 'Bob', 'State': 'NY'}]

You can pass multiple arguments to operator.itemgetter(). If the values for the first key are equal, the elements are then compared and sorted by the value of the next key.

pprint.pprint(sorted(l_dup, key=operator.itemgetter('State', 'Age')))
# [{'Age': 30, 'Name': 'Charlie', 'Point': 70, 'State': 'CA'},
#  {'Age': 40, 'Name': 'Alice', 'Point': 80, 'State': 'CA'},
#  {'Age': 20, 'Name': 'Bob', 'State': 'NY'}]
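
This works because itemgetter() with multiple arguments returns a tuple, and tuples are compared element by element from the left. A short sketch, with the names getter, alice and charlie introduced for illustration:

```python
import operator

alice = {'Name': 'Alice', 'Age': 40, 'Point': 80, 'State': 'CA'}
charlie = {'Name': 'Charlie', 'Age': 30, 'Point': 70, 'State': 'CA'}

getter = operator.itemgetter('State', 'Age')
print(getter(alice))
# ('CA', 40)
print(getter(charlie))
# ('CA', 30)

# 'State' is equal, so the comparison is decided by 'Age'.
print(getter(charlie) < getter(alice))
# True
```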

Note that if the order of the arguments is different, the result is also different.

pprint.pprint(sorted(l_dup, key=operator.itemgetter('Age', 'State')))
# [{'Age': 20, 'Name': 'Bob', 'State': 'NY'},
#  {'Age': 30, 'Name': 'Charlie', 'Point': 70, 'State': 'CA'},
#  {'Age': 40, 'Name': 'Alice', 'Point': 80, 'State': 'CA'}]

The same can be done with a lambda expression that returns multiple values as a tuple or a list.

pprint.pprint(sorted(l_dup, key=lambda x: (x['State'], x['Age'])))
# [{'Age': 30, 'Name': 'Charlie', 'Point': 70, 'State': 'CA'},
#  {'Age': 40, 'Name': 'Alice', 'Point': 80, 'State': 'CA'},
#  {'Age': 20, 'Name': 'Bob', 'State': 'NY'}]
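
The reverse parameter applies to the whole key, so it cannot mix directions on its own. The sketch below shows two common ways to sort by 'State' in ascending order and by 'Age' in descending order: negating a numeric value inside the tuple, or sorting twice and relying on the stability of the sort. The names by_negation and by_two_passes are introduced for illustration.

```python
import operator

l_dup = [{'Name': 'Alice', 'Age': 40, 'Point': 80, 'State': 'CA'},
         {'Name': 'Bob', 'Age': 20, 'State': 'NY'},
         {'Name': 'Charlie', 'Age': 30, 'Point': 70, 'State': 'CA'}]

# Option 1: negate the numeric value (only works for numbers).
by_negation = sorted(l_dup, key=lambda x: (x['State'], -x['Age']))

# Option 2: sort by the secondary key first, then by the primary key.
# The second sort keeps the order of elements with equal 'State'.
by_age_desc = sorted(l_dup, key=operator.itemgetter('Age'), reverse=True)
by_two_passes = sorted(by_age_desc, key=operator.itemgetter('State'))

print([d['Name'] for d in by_negation])
# ['Alice', 'Charlie', 'Bob']
print(by_negation == by_two_passes)
# True
```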

max(), min() for a list of dictionaries

As mentioned above, dictionaries (dict) do not support comparison with < or >, so passing a list of dictionaries to max() or min() raises a TypeError.

# max(l)
# TypeError: '>' not supported between instances of 'dict' and 'dict'

As with sorted() and sort(), you can specify the key parameter in max() and min() as well.

print(max(l, key=lambda x: x['Age']))
# {'Name': 'Alice', 'Age': 40, 'Point': 80}

print(min(l, key=lambda x: x['Age']))
# {'Name': 'Bob', 'Age': 20}

The result is the whole dictionary, so if you want only a value, access it by key.

print(max(l, key=lambda x: x['Age'])['Age'])
# 40

Of course, you can also use operator.itemgetter().

print(max(l, key=operator.itemgetter('Age')))
# {'Name': 'Alice', 'Age': 40, 'Point': 80}
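
As with sorting, a key that is missing from some dictionaries raises a KeyError. The get() defaults described for sorted() carry over: a sketch, assuming that elements without 'Point' should never be selected.

```python
l = [{'Name': 'Alice', 'Age': 40, 'Point': 80},
     {'Name': 'Bob', 'Age': 20},
     {'Name': 'Charlie', 'Age': 30, 'Point': 70}]

# -inf can never be the maximum, and inf can never be the minimum,
# as long as at least one dictionary has a finite 'Point'.
highest = max(l, key=lambda x: x.get('Point', float('-inf')))
lowest = min(l, key=lambda x: x.get('Point', float('inf')))

print(highest)
# {'Name': 'Alice', 'Age': 40, 'Point': 80}
print(lowest)
# {'Name': 'Charlie', 'Age': 30, 'Point': 70}
```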