Extract and replace elements of a list of strings that meet conditions in Python
In Python, list comprehensions allow you to create a new list from an existing list of strings by extracting, replacing, or transforming elements that satisfy certain conditions.
- List comprehensions
- Extract strings that contain or do not contain a specific substring
- Replace specific strings in a list
- Extract strings that begin or do not begin with a specific string
- Extract strings that end or do not end with a specific string
- Extract strings by case sensitivity
- Convert case of strings
- Extract strings by alphabetic or numeric
- Multiple conditions
- Regular expression (regex)
See the following articles for more information on how to extract and replace strings.
- Extract a substring from a string in Python (position, regex)
- Replace strings in Python (replace, translate, re.sub, re.subn)
List comprehensions
List comprehensions offer a simpler alternative to the traditional for loop when creating new lists.
[expression for variable_name in iterable if condition]
To extract elements that meet condition, you don't need to process them with expression; just use variable_name.
[variable_name for variable_name in iterable if condition]
If you change if condition to if not condition, you can extract elements that do not satisfy condition, i.e., exclude elements that satisfy condition.
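As a minimal sketch of the equivalence described above, the following compares a for loop with the corresponding list comprehension. The list `words` and the length threshold are hypothetical values chosen only for illustration.

```python
words = ['apple', 'kiwi', 'banana', 'fig']  # hypothetical sample data

# Traditional for loop: build the list step by step
long_words_loop = []
for w in words:
    if len(w) > 4:
        long_words_loop.append(w.upper())

# Equivalent list comprehension: [expression for variable_name in iterable if condition]
long_words = [w.upper() for w in words if len(w) > 4]
print(long_words)
# ['APPLE', 'BANANA']

# Extraction only: use the variable itself as the expression
long_only = [w for w in words if len(w) > 4]
print(long_only)
# ['apple', 'banana']

# Negated condition: keep the elements that do NOT satisfy it
short_only = [w for w in words if not len(w) > 4]
print(short_only)
# ['kiwi', 'fig']
```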
For more information about extracting, replacing, and converting list elements using list comprehensions, please refer to the following article.
Extract strings that contain or do not contain a specific substring
You can use the in operator to check if a string contains a specific substring.
The syntax specific_string in target_string evaluates to True if target_string contains specific_string. For negation, you can use not in.
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']
l_in = [s for s in l if 'XXX' in s]
print(l_in)
# ['oneXXXaaa', 'twoXXXbbb']
l_in_not = [s for s in l if 'XXX' not in s]
print(l_in_not)
# ['three999aaa', '000111222']
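The same idea may extend to several substrings or to case-insensitive checks. The sketch below assumes a hypothetical list `keywords` and combines the in operator with the built-in any() and the lower() method; it is one possible approach, not the only one.

```python
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']
keywords = ['XXX', '999']  # hypothetical substrings to look for

# Keep elements that contain at least one of the keywords
l_any = [s for s in l if any(k in s for k in keywords)]
print(l_any)
# ['oneXXXaaa', 'twoXXXbbb', 'three999aaa']

# Keep elements that contain none of the keywords
l_none = [s for s in l if not any(k in s for k in keywords)]
print(l_none)
# ['000111222']

# Case-insensitive check: lowercase the target before testing
l_ci = [s for s in l if 'xxx' in s.lower()]
print(l_ci)
# ['oneXXXaaa', 'twoXXXbbb']
```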
Replace specific strings in a list
To replace a substring within each element of a list, use the replace() method in a list comprehension. If an element contains no matching substring, replace() returns it unchanged. Hence, you don't need to filter elements with if condition.
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']
l_replace = [s.replace('XXX', 'ZZZ') for s in l]
print(l_replace)
# ['oneZZZaaa', 'twoZZZbbb', 'three999aaa', '000111222']
To replace the whole element containing a specific string, use the in operator to test for it and apply a conditional expression (ternary operator), written as X if condition else Y.
Use the conditional expression as the expression part of the list comprehension.
l_replace_all = ['ZZZ' if 'XXX' in s else s for s in l]
print(l_replace_all)
# ['ZZZ', 'ZZZ', 'three999aaa', '000111222']
Parentheses can improve readability and reduce the chance of errors, although they are syntactically optional.
[('ZZZ' if ('XXX' in s) else s) for s in l]
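Where the if appears changes the result, which is easy to overlook. The following sketch contrasts a conditional expression in the expression part (every element is kept) with a trailing if (only matching elements are kept). The variable names are chosen here for illustration.

```python
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# Conditional expression in the expression part:
# all elements are kept, matching ones are replaced
l_replaced = ['ZZZ' if 'XXX' in s else s for s in l]
print(l_replaced)
# ['ZZZ', 'ZZZ', 'three999aaa', '000111222']

# Trailing if: non-matching elements are dropped entirely
l_filtered = ['ZZZ' for s in l if 'XXX' in s]
print(l_filtered)
# ['ZZZ', 'ZZZ']
```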
Extract strings that begin or do not begin with a specific string
The startswith() method returns True if the string starts with the specific string.
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']
l_start = [s for s in l if s.startswith('t')]
print(l_start)
# ['twoXXXbbb', 'three999aaa']
l_start_not = [s for s in l if not s.startswith('t')]
print(l_start_not)
# ['oneXXXaaa', '000111222']
Extract strings that end or do not end with a specific string
The endswith() method returns True if the string ends with the specific string.
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']
l_end = [s for s in l if s.endswith('aaa')]
print(l_end)
# ['oneXXXaaa', 'three999aaa']
l_end_not = [s for s in l if not s.endswith('aaa')]
print(l_end_not)
# ['twoXXXbbb', '000111222']
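Both startswith() and endswith() also accept a tuple of strings and return True if any of them matches. A short sketch, with the prefixes and suffixes picked arbitrarily for illustration:

```python
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# A tuple of prefixes: True if the string starts with any of them
l_start_tuple = [s for s in l if s.startswith(('one', 'two'))]
print(l_start_tuple)
# ['oneXXXaaa', 'twoXXXbbb']

# A tuple of suffixes: True if the string ends with any of them
l_end_tuple = [s for s in l if s.endswith(('bbb', '222'))]
print(l_end_tuple)
# ['twoXXXbbb', '000111222']
```

Note that the argument must be a tuple; passing a list raises TypeError.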
Extract strings by case sensitivity
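You can use the isupper() and islower() methods to check if all cased characters in a string are uppercase or lowercase. Characters without case, such as digits, are ignored, and both methods return False if the string contains no cased characters at all.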
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']
l_lower = [s for s in l if s.islower()]
print(l_lower)
# ['three999aaa']
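The example above uses only islower(). As a complementary sketch, the following applies isupper() and shows how strings without cased characters behave. The element 'ABC123' is a hypothetical addition to the sample list so that isupper() has something to match.

```python
# 'ABC123' is added here for illustration only
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222', 'ABC123']

l_upper = [s for s in l if s.isupper()]
print(l_upper)
# ['ABC123']

# A string with no cased characters is neither lower nor upper
print('000111222'.islower(), '000111222'.isupper())
# False False

# Elements that are neither all-lowercase nor all-uppercase
l_neither = [s for s in l if not s.islower() and not s.isupper()]
print(l_neither)
# ['oneXXXaaa', 'twoXXXbbb', '000111222']
```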
Convert case of strings
To convert all characters of a string to either uppercase or lowercase, use the upper() or lower() methods. Python also provides other methods, such as capitalize(), which makes the first character uppercase and the rest lowercase, and swapcase(), which inverts the case of all characters in a string.
Use conditional expressions to convert only those elements that satisfy the conditions.
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']
l_upper_all = [s.upper() for s in l]
print(l_upper_all)
# ['ONEXXXAAA', 'TWOXXXBBB', 'THREE999AAA', '000111222']
l_lower_to_upper = [s.upper() if s.islower() else s for s in l]
print(l_lower_to_upper)
# ['oneXXXaaa', 'twoXXXbbb', 'THREE999AAA', '000111222']
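capitalize() and swapcase() are mentioned above without an example. A brief sketch on the same list shows their effect; note that capitalize() also lowercases everything after the first character.

```python
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# First character uppercase, the rest lowercase
l_capitalize = [s.capitalize() for s in l]
print(l_capitalize)
# ['Onexxxaaa', 'Twoxxxbbb', 'Three999aaa', '000111222']

# Uppercase becomes lowercase and vice versa
l_swapcase = [s.swapcase() for s in l]
print(l_swapcase)
# ['ONExxxAAA', 'TWOxxxBBB', 'THREE999AAA', '000111222']
```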
Extract strings by alphabetic or numeric
You can use the isalpha() and isnumeric() methods to check if a string is all alphabetic or all numeric.
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']
l_isalpha = [s for s in l if s.isalpha()]
print(l_isalpha)
# ['oneXXXaaa', 'twoXXXbbb']
l_isnumeric = [s for s in l if s.isnumeric()]
print(l_isnumeric)
# ['000111222']
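A few edge cases may be worth checking before relying on these methods. The sketch below uses a hypothetical list `samples`; it also uses isalnum(), which is not covered in this article but is a standard str method that accepts strings made only of letters and digits.

```python
# Hypothetical sample data chosen to show edge cases
samples = ['abc', '123', 'abc123', '', '-1', '1.5']

s_alpha = [s for s in samples if s.isalpha()]
print(s_alpha)
# ['abc']

# The empty string, signs, and decimal points are not numeric characters
s_numeric = [s for s in samples if s.isnumeric()]
print(s_numeric)
# ['123']

# isalnum(): letters and digits may be mixed
s_alnum = [s for s in samples if s.isalnum()]
print(s_alnum)
# ['abc', '123', 'abc123']
```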
Multiple conditions
In the condition part of the list comprehension, you can specify multiple conditions using and, or, and not.
When combining several conditions, it's safer to enclose each group in () because operator precedence affects the result: not is evaluated first, then and, then or.
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']
l_multi = [s for s in l if s.isalpha() and not s.startswith('t')]
print(l_multi)
# ['oneXXXaaa']
l_multi_or = [s for s in l if (s.isalpha() and not s.startswith('t')) or ('bbb' in s)]
print(l_multi_or)
# ['oneXXXaaa', 'twoXXXbbb']
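To illustrate why grouping matters, the following sketch evaluates the same three conditions with and without explicit parentheses. Since and binds more tightly than or, the two groupings select different elements.

```python
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# No parentheses: read as  'bbb' in s or (s.isalpha() and not s.startswith('t'))
l_default = [s for s in l if 'bbb' in s or s.isalpha() and not s.startswith('t')]
print(l_default)
# ['oneXXXaaa', 'twoXXXbbb']

# Parentheses around the or change the grouping and the result
l_grouped = [s for s in l if ('bbb' in s or s.isalpha()) and not s.startswith('t')]
print(l_grouped)
# ['oneXXXaaa']
```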
Regular expression (regex)
You can use regular expressions (regex) for more flexible pattern matching and manipulation.
The re.match() function returns a match object if the pattern matches at the beginning of the string and None if not.
Since match objects are evaluated as True and None as False, you can extract elements that match a regex pattern by using re.match() in the condition part of a list comprehension, as in the previous examples.
import re
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']
l_re_match = [s for s in l if re.match('.*XXX.*', s)]
print(l_re_match)
# ['oneXXXaaa', 'twoXXXbbb']
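Because re.match() only matches at the beginning of the string, the example above needs the leading `.*`. As a sketch of the alternatives in the standard re module, re.search() looks for the pattern anywhere in the string and re.fullmatch() requires the whole string to match.

```python
import re

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# re.match() anchors at the start, so a bare 'XXX' matches nothing here
l_match = [s for s in l if re.match(r'XXX', s)]
print(l_match)
# []

# re.search() finds the pattern anywhere in the string
l_search = [s for s in l if re.search(r'XXX', s)]
print(l_search)
# ['oneXXXaaa', 'twoXXXbbb']

# re.fullmatch() requires the entire string to match the pattern
l_fullmatch = [s for s in l if re.fullmatch(r'\d+', s)]
print(l_fullmatch)
# ['000111222']
```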
You can also use re.sub() to replace parts that match a regex pattern. If you want to extract and replace only matched elements, add if condition.
l_re_sub_all = [re.sub('(.*)XXX(.*)', r'\2---\1', s) for s in l]
print(l_re_sub_all)
# ['aaa---one', 'bbb---two', 'three999aaa', '000111222']
l_re_sub = [re.sub('(.*)XXX(.*)', r'\2---\1', s) for s in l if re.match('.*XXX.*', s)]
print(l_re_sub)
# ['aaa---one', 'bbb---two']
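When the same pattern is used for both filtering and replacing, one option is to compile it once with re.compile() and reuse it. The sketch below also shows a variant based on subn(), which returns the new string together with the number of substitutions, so each element is scanned only once. The names `pattern` and `l_subn` are chosen here for illustration.

```python
import re

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# Compile the pattern once and reuse it for matching and substitution
pattern = re.compile(r'(.*)XXX(.*)')

l_sub = [pattern.sub(r'\2---\1', s) for s in l if pattern.match(s)]
print(l_sub)
# ['aaa---one', 'bbb---two']

# Variant: subn() returns (new_string, count); keep results with count > 0
l_subn = [new for new, count in (pattern.subn(r'\2---\1', s) for s in l) if count]
print(l_subn)
# ['aaa---one', 'bbb---two']
```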