Extract and replace elements that meet the conditions of a list of strings in Python

Modified: | Tags: Python, String, List

In Python, list comprehensions allow you to create a new list from an existing list of strings by extracting, replacing, or transforming elements that satisfy certain conditions.

List comprehensions

List comprehensions offer a simpler alternative to the traditional for loop when creating new lists.

[expression for variable_name in iterable if condition]

To extract elements that meet condition, you don't need to process them with expression; just use variable_name.

[variable_name for variable_name in iterable if condition]

If you change if condition to if not condition, you can extract elements that do not satisfy condition, i.e., exclude elements that satisfy condition.
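
As a rough illustration of the correspondence described above, the sketch below builds the same list twice, once with a for loop and once with a list comprehension. The names l_loop and l_comp, and the choice of upper() and 'XXX' in s as the expression and condition, are placeholders picked for this example only.

```python
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# for-loop version: append each processed element that meets the condition
l_loop = []
for s in l:
    if 'XXX' in s:
        l_loop.append(s.upper())

# list comprehension version: [expression for variable_name in iterable if condition]
l_comp = [s.upper() for s in l if 'XXX' in s]

print(l_loop)
# ['ONEXXXAAA', 'TWOXXXBBB']

print(l_loop == l_comp)
# True
```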

Extract strings that contain or do not contain a specific substring

You can use the in operator to check if a string contains a specific substring.

The syntax specific_string in target_string evaluates to True if the target_string contains the specific_string. For negation, you can use not in.

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

l_in = [s for s in l if 'XXX' in s]
print(l_in)
# ['oneXXXaaa', 'twoXXXbbb']

l_in_not = [s for s in l if 'XXX' not in s]
print(l_in_not)
# ['three999aaa', '000111222']

Replace specific strings in a list

To replace a substring in each element of a list, call the replace() method inside a list comprehension. If an element does not contain the substring, replace() returns it unchanged, so you don't need to filter elements with if condition.

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

l_replace = [s.replace('XXX', 'ZZZ') for s in l]
print(l_replace)
# ['oneZZZaaa', 'twoZZZbbb', 'three999aaa', '000111222']

To replace an entire element that contains a specific substring, test it with the in operator inside a conditional expression (ternary operator), written as X if condition else Y.

Use conditional expressions for the expression part of list comprehensions.

l_replace_all = ['ZZZ' if 'XXX' in s else s for s in l]
print(l_replace_all)
# ['ZZZ', 'ZZZ', 'three999aaa', '000111222']

Parentheses can make the code easier to read and help avoid mistakes, although they are syntactically optional.

[('ZZZ' if ('XXX' in s) else s) for s in l]
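
The position of the if matters. The following sketch, using illustrative variable names, contrasts a conditional expression in the expression part (every element is kept) with an if at the end of the comprehension (elements that fail the condition are dropped).

```python
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# Conditional expression in the expression part: the list keeps its length
l_ternary = ['ZZZ' if 'XXX' in s else s for s in l]
print(len(l_ternary))
# 4

# if at the end: only elements that satisfy the condition remain
l_filter = ['ZZZ' for s in l if 'XXX' in s]
print(l_filter)
# ['ZZZ', 'ZZZ']
```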

Extract strings that begin or do not begin with a specific string

The startswith() method returns True if the string starts with the specific string.

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

l_start = [s for s in l if s.startswith('t')]
print(l_start)
# ['twoXXXbbb', 'three999aaa']

l_start_not = [s for s in l if not s.startswith('t')]
print(l_start_not)
# ['oneXXXaaa', '000111222']
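
If you need to check several prefixes at once, startswith() also accepts a tuple of strings and returns True if the string starts with any of them. A minimal sketch, with the prefixes 'one' and 'th' chosen only for illustration:

```python
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# Pass a tuple to match any of several prefixes
l_start_tuple = [s for s in l if s.startswith(('one', 'th'))]
print(l_start_tuple)
# ['oneXXXaaa', 'three999aaa']
```

The same form works for endswith() with a tuple of suffixes.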

Extract strings that end or do not end with a specific string

The endswith() method returns True if the string ends with the specific string.

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

l_end = [s for s in l if s.endswith('aaa')]
print(l_end)
# ['oneXXXaaa', 'three999aaa']

l_end_not = [s for s in l if not s.endswith('aaa')]
print(l_end_not)
# ['twoXXXbbb', '000111222']

Extract strings by case sensitivity

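You can use the isupper() and islower() methods to check if all cased characters in a string are uppercase or lowercase. Characters without case, such as digits, are ignored, but the string must contain at least one cased character for either method to return True.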

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

l_lower = [s for s in l if s.islower()]
print(l_lower)
# ['three999aaa']
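
None of the strings in the list above is entirely uppercase, so the sketch below uses a separate, made-up list (l_case) to show isupper() next to islower(), including a string with no cased characters.

```python
l_case = ['ABC', 'abc', 'Abc', 'ABC123', '123']

# Digits are ignored, so 'ABC123' counts as uppercase
l_upper = [s for s in l_case if s.isupper()]
print(l_upper)
# ['ABC', 'ABC123']

l_lower = [s for s in l_case if s.islower()]
print(l_lower)
# ['abc']

# Mixed-case strings and strings without cased characters are neither
l_neither = [s for s in l_case if not s.isupper() and not s.islower()]
print(l_neither)
# ['Abc', '123']
```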

Convert case of strings

To convert all characters of a string to uppercase or lowercase, use the upper() or lower() method. Python also provides other methods, such as capitalize(), which makes the first character uppercase and the rest lowercase, and swapcase(), which inverts the case of every character in a string.

Use conditional expressions to convert only those elements that satisfy the conditions.

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

l_upper_all = [s.upper() for s in l]
print(l_upper_all)
# ['ONEXXXAAA', 'TWOXXXBBB', 'THREE999AAA', '000111222']

l_lower_to_upper = [s.upper() if s.islower() else s for s in l]
print(l_lower_to_upper)
# ['oneXXXaaa', 'twoXXXbbb', 'THREE999AAA', '000111222']
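
capitalize() and swapcase() can be applied in the same way. A short sketch with illustrative variable names; note that capitalize() also lowercases everything after the first character.

```python
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# First character uppercase, the rest lowercase
l_capitalize = [s.capitalize() for s in l]
print(l_capitalize)
# ['Onexxxaaa', 'Twoxxxbbb', 'Three999aaa', '000111222']

# Uppercase becomes lowercase and vice versa
l_swapcase = [s.swapcase() for s in l]
print(l_swapcase)
# ['ONExxxAAA', 'TWOxxxBBB', 'THREE999AAA', '000111222']
```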

Extract alphabetic or numeric strings

You can use the isalpha() and isnumeric() methods to check if a string is all alphabetic or all numeric.

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

l_isalpha = [s for s in l if s.isalpha()]
print(l_isalpha)
# ['oneXXXaaa', 'twoXXXbbb']

l_isnumeric = [s for s in l if s.isnumeric()]
print(l_isnumeric)
# ['000111222']

Multiple conditions

In the condition part of the list comprehension, you can specify multiple conditions using and, or, and not.

When you combine three or more conditions, it is safer to enclose each group in (). Python evaluates not first, then and, then or, so a condition that mixes and with or may be grouped differently from what you intended.

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

l_multi = [s for s in l if s.isalpha() and not s.startswith('t')]
print(l_multi)
# ['oneXXXaaa']

l_multi_or = [s for s in l if (s.isalpha() and not s.startswith('t')) or ('bbb' in s)]
print(l_multi_or)
# ['oneXXXaaa', 'twoXXXbbb']
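
To see why grouping matters, the sketch below evaluates the same three conditions with and without parentheses. The particular combination of conditions is made up for illustration; and binds more tightly than or, so the two versions select different elements.

```python
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# No parentheses: parsed as
# s.isalpha() or (s.endswith('aaa') and not s.startswith('t'))
l_default = [s for s in l if s.isalpha() or s.endswith('aaa') and not s.startswith('t')]
print(l_default)
# ['oneXXXaaa', 'twoXXXbbb']

# Explicit grouping changes the result
l_grouped = [s for s in l if (s.isalpha() or s.endswith('aaa')) and not s.startswith('t')]
print(l_grouped)
# ['oneXXXaaa']
```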

Regular expression (regex)

You can use regular expressions (regex) for more flexible pattern matching and manipulation.

The re.match() function returns a match object if the pattern matches at the beginning of the string and None if not. Because it only checks from the start, the pattern in the example below begins with .* so that XXX can appear anywhere in the string.

Since a match object is treated as true and None as false, you can extract elements that match a regex pattern by using re.match() in the condition part of a list comprehension, as in the previous examples.

import re

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

l_re_match = [s for s in l if re.match('.*XXX.*', s)]
print(l_re_match)
# ['oneXXXaaa', 'twoXXXbbb']
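
re.match() only looks for a match at the beginning of the string, which is why the pattern above starts with .*. As a sketch of the alternative, re.search() scans the whole string, so the pattern can be written without the surrounding .*. The variable names below are illustrative.

```python
import re

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# No element starts with 'XXX', so re.match() finds nothing
l_match_plain = [s for s in l if re.match('XXX', s)]
print(l_match_plain)
# []

# re.search() finds 'XXX' anywhere in the string
l_search = [s for s in l if re.search('XXX', s)]
print(l_search)
# ['oneXXXaaa', 'twoXXXbbb']

# Digits at the start only vs. digits anywhere
l_match_digits = [s for s in l if re.match(r'\d+', s)]
print(l_match_digits)
# ['000111222']

l_search_digits = [s for s in l if re.search(r'\d+', s)]
print(l_search_digits)
# ['three999aaa', '000111222']
```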

You can also use re.sub() to replace parts that match a regex pattern. If you want to extract and replace only matched elements, add if condition.

l_re_sub_all = [re.sub('(.*)XXX(.*)', r'\2---\1', s) for s in l]
print(l_re_sub_all)
# ['aaa---one', 'bbb---two', 'three999aaa', '000111222']

l_re_sub = [re.sub('(.*)XXX(.*)', r'\2---\1', s) for s in l if re.match('.*XXX.*', s)]
print(l_re_sub)
# ['aaa---one', 'bbb---two']
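
The last example writes the pattern twice in slightly different forms. One possible variation, shown here only as a sketch, is to compile the pattern once with re.compile() and reuse the resulting object for both the condition and the replacement. The name pattern is an assumption of this example.

```python
import re

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# Compile once, then reuse for both filtering and substitution
pattern = re.compile('(.*)XXX(.*)')

l_re_sub_compiled = [pattern.sub(r'\2---\1', s) for s in l if pattern.match(s)]
print(l_re_sub_compiled)
# ['aaa---one', 'bbb---two']
```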
