Sort a List of Numeric Strings in Python
In Python, you can sort a list with the sort() method or the sorted() function.
This article explains how to sort a list of numeric strings and a list of strings containing numbers.
sort() and sorted()
sort() is a list method that sorts the original list in place.
l = [10, 1, 5]
l.sort()
print(l)
# [1, 5, 10]
sorted() is a built-in function that creates a new sorted list. The original list remains unchanged.
l = [10, 1, 5]
print(sorted(l))
# [1, 5, 10]
print(l)
# [10, 1, 5]
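Because sort() modifies the list in place, it returns None rather than the sorted list. The short sketch below shows this, along with the fact that sorted() accepts any iterable (here a tuple) and always returns a new list. The variable name result is just for illustration.

```python
l = [10, 1, 5]
result = l.sort()  # sorts l in place; the return value is None
print(result)
# None
print(l)
# [1, 5, 10]

# sorted() takes any iterable, such as a tuple, and returns a new list
print(sorted((10, 1, 5)))
# [1, 5, 10]
```

Assigning the result of sort() back to the list variable (l = l.sort()) therefore replaces the list with None.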
By default, sorting is done in ascending order. To sort in descending order, set the reverse parameter to True. The example uses sorted(), but the same applies to sort().
print(sorted(l, reverse=True))
# [10, 5, 1]
Sort a list of numeric strings
Notes on numeric strings that are not zero-padded
A list of zero-padded numeric strings is sorted correctly as is. The following sample code uses sorted(), but sort() works the same way.
l = ['10', '01', '05']
print(sorted(l))
# ['01', '05', '10']
A list of numeric strings that are not zero-padded is sorted lexicographically (character by character), not numerically. For example, '10' is considered smaller than '5' because its first character '1' comes before '5'.
l = ['10', '1', '5']
print(sorted(l))
# ['1', '10', '5']
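The comparison below makes the character-by-character rule visible. It also sketches why zero-padding fixes the order: padding every string to the same width with str.zfill() as a key restores numeric order. The width 2 is an assumption that fits this sample data; the trick only works for non-negative integer strings without signs or decimal points.

```python
l = ['10', '1', '5']

# Strings are compared character by character: '1' < '5', so '10' < '5'
print('10' < '5')
# True

# Padding each string to width 2 ('1' -> '01') makes lexicographic order match numeric order
print(sorted(l, key=lambda s: s.zfill(2)))
# ['1', '5', '10']
```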
Specify int() or float() as the key argument
In both sort() and sorted(), specifying a function in the key argument sorts the list based on the result of that function.
Specifying int() or float() as the key argument sorts the strings numerically.
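When passing a function as the key argument, write only its name without (), as in key=int. Writing key=int() calls the function immediately and passes its return value instead of the function, which raises an error.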
l = ['10', '1', '5']
print(sorted(l, key=int))
# ['1', '5', '10']
print(sorted(l, key=float))
# ['1', '5', '10']
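To see what goes wrong with the parentheses, the sketch below passes key=int(). Calling int() with no arguments returns 0, so sorted() tries to call 0 on each element and raises a TypeError. The variable name error is only there to keep the exception for inspection.

```python
l = ['10', '1', '5']

try:
    # int() runs right away and returns 0, so key receives 0 instead of a function
    sorted(l, key=int())
except TypeError as e:
    error = e
    print(e)
# 'int' object is not callable
```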
Strings representing integers can be converted with either int() or float(), but strings with a decimal point require float(), because int() rejects them.
l = ['10.0', '1.0', '5.0']
print(sorted(l, key=float))
# ['1.0', '5.0', '10.0']
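For comparison, the sketch below uses int() as the key for the same decimal strings. The key is computed for each element in turn, and the first element '10.0' already fails with a ValueError, so no sorted result is produced.

```python
l = ['10.0', '1.0', '5.0']

try:
    # int() does not accept strings containing a decimal point
    print(sorted(l, key=int))
except ValueError as e:
    error = e
    print(e)
# invalid literal for int() with base 10: '10.0'
```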
The key argument can also be specified in sort().
l = ['10', '1', '5']
l.sort(key=int)
print(l)
# ['1', '5', '10']
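The key and reverse parameters can be combined. The sketch below sorts the same numeric strings in descending numeric order, first with sorted() and then in place with sort().

```python
l = ['10', '1', '5']

# Numeric order (key=int), largest first (reverse=True)
print(sorted(l, key=int, reverse=True))
# ['10', '5', '1']

l.sort(key=int, reverse=True)
print(l)
# ['10', '5', '1']
```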
The function specified in key is used only to compute the values compared during sorting; the elements themselves are not converted, only reordered, so they remain strings.
To get the results as int or float values, convert the elements with a list comprehension and sort the converted list.
l = ['10', '1', '5']
print([int(s) for s in l])
# [10, 1, 5]
print(sorted([int(s) for s in l]))
# [1, 5, 10]
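As an alternative to the list comprehension, map() can apply int() to each element; sorted() consumes the resulting iterator and returns a new list. This is an equivalent variant, not a different technique.

```python
l = ['10', '1', '5']

# map() converts each string lazily; sorted() collects and sorts the ints
print(sorted(map(int, l)))
# [1, 5, 10]
```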
Sort a list of strings containing numbers
Extract numbers from strings using regular expressions (regex)
For purely numeric strings, you can simply specify int() or float() as the key argument.
However, for strings that contain numbers alongside other characters, you first need to extract the numeric part, which you can do with the regular expression (regex) module re.
l = ['file10.txt', 'file1.txt', 'file5.txt']
Only one number in the string
Use search() to obtain a match object, and extract the matched part as a string with the group() method.
Use the regex pattern \d+, where \d represents a digit and + indicates one or more occurrences, so the pattern matches a run of consecutive digits.
import re
s = 'file5.txt'
print(re.search(r'\d+', s).group())
# 5
A raw string (r'...') is used here so that the backslash \ can be written as is, without escaping it.
Since a string is returned, use int() or float() to convert it to a number.
print(type(re.search(r'\d+', s).group()))
# <class 'str'>
print(type(int(re.search(r'\d+', s).group())))
# <class 'int'>
You can pass this process as the key parameter of sort() or sorted() using a lambda expression.
l = ['file10.txt', 'file1.txt', 'file5.txt']
print(sorted(l))
# ['file1.txt', 'file10.txt', 'file5.txt']
print(sorted(l, key=lambda s: int(re.search(r'\d+', s).group())))
# ['file1.txt', 'file5.txt', 'file10.txt']
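If the lambda expression becomes hard to read, a regular function works the same way as a key. In the sketch below, num_key is a hypothetical name; it returns the first run of digits as an int and is combined with reverse=True for descending order.

```python
import re


def num_key(s):
    # Return the first run of digits in s as an int (assumes every string has one)
    return int(re.search(r'\d+', s).group())


l = ['file10.txt', 'file1.txt', 'file5.txt']
print(sorted(l, key=num_key))
# ['file1.txt', 'file5.txt', 'file10.txt']
print(sorted(l, key=num_key, reverse=True))
# ['file10.txt', 'file5.txt', 'file1.txt']
```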
For a small list this hardly matters, but compiling the pattern once with re.compile() and reusing the resulting regex object can be more efficient.
p = re.compile(r'\d+')
print(sorted(l, key=lambda s: int(p.search(s).group())))
# ['file1.txt', 'file5.txt', 'file10.txt']
Multiple numbers in the string
search() returns only the first matched part.
s = '100file5.txt'
print(re.search(r'\d+', s).group())
# 100
findall() returns all matching parts as a list.
print(re.findall(r'\d+', s))
# ['100', '5']
print(re.findall(r'\d+', s)[1])
# 5
Enclosing part of the pattern in () defines a capturing group, and the groups() method returns the captured parts.
For example, the pattern file(\d+) extracts the digits that follow 'file', as in 'file123'. Note that groups() returns a tuple even if there is only one group.
print(re.search(r'file(\d+)', s).groups())
# ('5',)
print(re.search(r'file(\d+)', s).groups()[0])
# 5
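Instead of indexing the tuple from groups(), a match object can return a single group directly: group(1) gives the first captured group, while group(0) (or group() with no argument) gives the entire match.

```python
import re

s = '100file5.txt'
m = re.search(r'file(\d+)', s)

# group(1) is the first captured group, the same value as groups()[0]
print(m.group(1))
# 5
# group(0) is the whole matched text
print(m.group(0))
# file5
```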
The pattern (\d+)\. extracts the digits immediately before a period, as in '123.'. The period must be escaped with a backslash, because an unescaped . matches any character.
print(re.search(r'(\d+)\.', s).groups()[0])
# 5
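The sketch below shows why the escape matters. Without the backslash, . matches any character, so the search stops at the first run of digits followed by anything ('100' followed by 'f') instead of the digits before '.txt'.

```python
import re

s = '100file5.txt'

# Unescaped: '.' matches any character, so '100' followed by 'f' already matches
print(re.search(r'(\d+).', s).groups()[0])
# 100

# Escaped: '\.' matches only a literal period
print(re.search(r'(\d+)\.', s).groups()[0])
# 5
```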
Examples:
l = ['100file10.txt', '100file1.txt', '100file5.txt']
print(sorted(l, key=lambda s: int(re.findall(r'\d+', s)[1])))
# ['100file1.txt', '100file5.txt', '100file10.txt']
print(sorted(l, key=lambda s: int(re.search(r'file(\d+)', s).groups()[0])))
# ['100file1.txt', '100file5.txt', '100file10.txt']
print(sorted(l, key=lambda s: int(re.search(r'(\d+)\.', s).groups()[0])))
# ['100file1.txt', '100file5.txt', '100file10.txt']
p = re.compile(r'file(\d+)')
print(sorted(l, key=lambda s: int(p.search(s).groups()[0])))
# ['100file1.txt', '100file5.txt', '100file10.txt']
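If every number in the string should take part in the ordering, one option is to use the whole findall() result, converted to ints, as the key. Lists compare element by element, so the first number decides, then the second, and so on. This is a sketch under the assumption that all strings share the same layout; strings with a different number of numbers compare in ways that may not be meaningful.

```python
import re

l = ['2file10.txt', '10file1.txt', '2file5.txt']

# Keys are [2, 10], [10, 1], [2, 5]; lists of ints compare element by element
print(sorted(l, key=lambda s: [int(x) for x in re.findall(r'\d+', s)]))
# ['2file5.txt', '2file10.txt', '10file1.txt']
```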
Some strings do not contain numbers
If some strings contain no number, search() returns None for them, and calling group() on None raises an AttributeError. Such cases need to be handled explicitly.
l = ['file10.txt', 'file1.txt', 'file5.txt', 'file.txt']
# print(sorted(l, key=lambda s: int(re.search(r'\d+', s).group())))
# AttributeError: 'NoneType' object has no attribute 'group'
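The root cause is easy to check directly: when the pattern finds nothing, search() returns None rather than a match object, and None has no group() method.

```python
import re

m = re.search(r'\d+', 'file.txt')

# No digits in the string, so search() returns None
print(m)
# None
```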
For example, define a function like the following. The first argument is the string to process, the second is a compiled regex pattern, and the third is the value returned when there is no match.
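def extract_num(s, p, ret=0):
    # Search s with the compiled pattern p
    search = p.search(s)
    if search:
        # Convert the first captured group to an int
        return int(search.groups()[0])
    else:
        # No match: return the fallback value
        return ret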
The results are as follows. The pattern must contain () to define a capturing group, because the function retrieves the number with the groups() method.
p = re.compile(r'(\d+)')
print(extract_num('file10.txt', p))
# 10
print(extract_num('file.txt', p))
# 0
print(extract_num('file.txt', p, 100))
# 100
The third argument is optional.
Specify this function for the key argument in sort() or sorted().
print(sorted(l, key=lambda s: extract_num(s, p)))
# ['file.txt', 'file1.txt', 'file5.txt', 'file10.txt']
print(sorted(l, key=lambda s: extract_num(s, p, float('inf'))))
# ['file1.txt', 'file5.txt', 'file10.txt', 'file.txt']
To place strings without numbers at the end when sorting in ascending order, use infinity (float('inf')) as the return value for non-matches.
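Another option, sketched below, is a tuple key: the first item is False for strings with a number and True for strings without one, so the latter sort last, and the string itself acts as a tie-breaker. The function name sort_key is hypothetical, and ordering the number-less strings alphabetically is an assumption about what you want.

```python
import re

p = re.compile(r'(\d+)')


def sort_key(s):
    m = p.search(s)
    if m:
        # Strings with a number: (False, number, string)
        return (False, int(m.group(1)), s)
    # Strings without a number: (True, 0, string); False < True puts these last
    return (True, 0, s)


l = ['file10.txt', 'file1.txt', 'b.txt', 'file5.txt', 'a.txt']
print(sorted(l, key=sort_key))
# ['file1.txt', 'file5.txt', 'file10.txt', 'a.txt', 'b.txt']
```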
For cases with multiple numbers, use the appropriate regex pattern.
l = ['100file10.txt', '100file1.txt', '100file5.txt', '100file.txt']
p = re.compile(r'file(\d+)')
print(sorted(l, key=lambda s: extract_num(s, p)))
# ['100file.txt', '100file1.txt', '100file5.txt', '100file10.txt']
print(sorted(l, key=lambda s: extract_num(s, p, float('inf'))))
# ['100file1.txt', '100file5.txt', '100file10.txt', '100file.txt']
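Instead of a lambda expression, functools.partial() can fix the pattern and fallback value as keyword arguments, leaving the string as the only argument that sorted() supplies. The extract_num function is repeated here so the sketch runs on its own.

```python
import functools
import re


def extract_num(s, p, ret=0):
    search = p.search(s)
    if search:
        return int(search.groups()[0])
    else:
        return ret


p = re.compile(r'file(\d+)')
l = ['100file10.txt', '100file1.txt', '100file5.txt', '100file.txt']

# key(s) calls extract_num(s, p=p, ret=float('inf'))
key = functools.partial(extract_num, p=p, ret=float('inf'))
print(sorted(l, key=key))
# ['100file1.txt', '100file5.txt', '100file10.txt', '100file.txt']
```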