Extract and Replace Elements That Meet the Conditions of a List of Strings in Python

Modified: 2023-05-19 | Tags: Python, String, List

In Python, list comprehensions allow you to create a new list from an existing list of strings by extracting, replacing, or transforming elements that satisfy certain conditions.

Contents

List comprehensions
Extract strings that contain or do not contain a specific substring
Replace specific strings in a list
Extract strings that begin or do not begin with a specific string
Extract strings that end or do not end with a specific string
Extract strings by case sensitivity
Convert case of strings
Extract strings by alphabetic or numeric
Multiple conditions
Regular expression (regex)


List comprehensions

List comprehensions offer a simpler alternative to the traditional for loop when creating new lists.

List comprehensions in Python

[expression for variable_name in iterable if condition]

To extract elements that meet condition, you don't need to process them with expression; just use variable_name.

[variable_name for variable_name in iterable if condition]

If you change if condition to if not condition, you can extract elements that do not satisfy condition, i.e., exclude elements that satisfy condition.
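As a minimal sketch of the syntax above, the following compares a traditional for loop with the equivalent list comprehension. The list `words` and the length threshold are hypothetical values chosen only for illustration.

```python
# Hypothetical sample data, used only for this illustration.
words = ['apple', 'banana', 'avocado', 'cherry']

# Traditional for loop: collect elements longer than 5 characters.
long_words_loop = []
for w in words:
    if len(w) > 5:
        long_words_loop.append(w)

# Equivalent list comprehension.
long_words = [w for w in words if len(w) > 5]

# Negated condition: keep only the elements that do NOT satisfy it.
short_words = [w for w in words if not len(w) > 5]

print(long_words_loop)
# ['banana', 'avocado', 'cherry']
print(long_words)
# ['banana', 'avocado', 'cherry']
print(short_words)
# ['apple']
```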

For more information about extracting, replacing, and converting list elements using list comprehensions, please refer to the following article.

Extract, replace, convert elements of a list in Python

Extract strings that contain or do not contain a specific substring

You can use the in operator to check if a string contains a specific substring.

The in operator in Python (for list, string, dictionary, etc.)

The expression specific_string in target_string evaluates to True if target_string contains specific_string. The comparison is case-sensitive. For negation, use not in.

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

l_in = [s for s in l if 'XXX' in s]
print(l_in)
# ['oneXXXaaa', 'twoXXXbbb']

l_in_not = [s for s in l if 'XXX' not in s]
print(l_in_not)
# ['three999aaa', '000111222']

source: list_str_select_replace.py
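Two variations that may be useful in practice are sketched below: a case-insensitive check, and a check against several substrings at once using the built-in any(). The `keywords` list is a hypothetical example, not part of the original sample code.

```python
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# Case-insensitive check: lowercase the element before testing.
l_in_ci = [s for s in l if 'xxx' in s.lower()]
print(l_in_ci)
# ['oneXXXaaa', 'twoXXXbbb']

# Keep elements that contain at least one of several substrings.
keywords = ['999', '111']  # hypothetical list of substrings
l_in_any = [s for s in l if any(k in s for k in keywords)]
print(l_in_any)
# ['three999aaa', '000111222']
```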

Replace specific strings in a list

To replace a substring within each element of a list, call the replace() method inside a list comprehension. If an element does not contain the substring, replace() returns it unchanged, so you don't need to filter elements with if condition.

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

l_replace = [s.replace('XXX', 'ZZZ') for s in l]
print(l_replace)
# ['oneZZZaaa', 'twoZZZbbb', 'three999aaa', '000111222']

source: list_str_select_replace.py
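replace() also accepts an optional third argument that limits the number of replacements. The following sketch uses a hypothetical list in which one element contains the substring twice, to show the difference between replacing the first occurrence and deleting all occurrences.

```python
# Hypothetical sample data with a repeated substring.
l = ['aXXXbXXXc', 'XXX', 'abc']

# Replace only the first occurrence in each element (count=1).
l_first_only = [s.replace('XXX', 'Z', 1) for s in l]
print(l_first_only)
# ['aZbXXXc', 'Z', 'abc']

# Replacing with an empty string removes every occurrence.
l_removed = [s.replace('XXX', '') for s in l]
print(l_removed)
# ['abc', '', 'abc']
```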

To replace a whole element that contains a specific string, use the in operator as the condition of a conditional expression (ternary operator), written as X if condition else Y.

Conditional expressions in Python

Use conditional expressions for the expression part of list comprehensions.

Extract, replace, convert elements of a list in Python

l_replace_all = ['ZZZ' if 'XXX' in s else s for s in l]
print(l_replace_all)
# ['ZZZ', 'ZZZ', 'three999aaa', '000111222']

source: list_str_select_replace.py

Parentheses can make the code easier to read and less error-prone, although they are syntactically optional.

[('ZZZ' if ('XXX' in s) else s) for s in l]

Extract strings that begin or do not begin with a specific string

The startswith() method returns True if the string starts with the specific string.

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

l_start = [s for s in l if s.startswith('t')]
print(l_start)
# ['twoXXXbbb', 'three999aaa']

l_start_not = [s for s in l if not s.startswith('t')]
print(l_start_not)
# ['oneXXXaaa', '000111222']

source: list_str_select_replace.py
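startswith() also accepts a tuple of prefixes and returns True if the string starts with any of them. A short sketch using the same sample list, with prefixes chosen only for illustration:

```python
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# A tuple of prefixes matches if the string starts with any one of them.
l_start_tuple = [s for s in l if s.startswith(('one', 'three'))]
print(l_start_tuple)
# ['oneXXXaaa', 'three999aaa']

# The check is case-sensitive, so an uppercase 'T' matches nothing here.
l_start_upper = [s for s in l if s.startswith('T')]
print(l_start_upper)
# []
```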

Extract strings that end or do not end with a specific string

The endswith() method returns True if the string ends with the specific string.

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

l_end = [s for s in l if s.endswith('aaa')]
print(l_end)
# ['oneXXXaaa', 'three999aaa']

l_end_not = [s for s in l if not s.endswith('aaa')]
print(l_end_not)
# ['twoXXXbbb', '000111222']

source: list_str_select_replace.py
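A common use of endswith() is filtering file names by extension. The sketch below assumes a hypothetical list of file names. Like startswith(), endswith() accepts a tuple of candidates.

```python
# Hypothetical file names, used only for this illustration.
files = ['report.csv', 'image.PNG', 'notes.txt', 'data.tsv']

# A tuple of suffixes matches if the string ends with any one of them.
l_tabular = [f for f in files if f.endswith(('.csv', '.tsv'))]
print(l_tabular)
# ['report.csv', 'data.tsv']

# Lowercase the name first to ignore the case of the extension.
l_png = [f for f in files if f.lower().endswith('.png')]
print(l_png)
# ['image.PNG']
```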

Extract strings by case sensitivity

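You can use the isupper() and islower() methods to check whether all cased characters in a string are uppercase or lowercase. Characters without case, such as digits and symbols, are ignored, but the string must contain at least one cased character; this is why 'three999aaa' passes islower() in the example below while '000111222' does not.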

Convert and determine uppercase and lowercase strings in Python

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

l_lower = [s for s in l if s.islower()]
print(l_lower)
# ['three999aaa']

source: list_str_select_replace.py
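The following sketch shows how isupper() and islower() treat mixed-case strings, digits, and the empty string. The list is a hypothetical example built to cover those cases.

```python
# Hypothetical sample data covering several edge cases.
l = ['ABC', 'abc', 'Abc', 'ABC123', '123', '']

# Digits are ignored, so 'ABC123' counts as uppercase.
l_upper = [s for s in l if s.isupper()]
print(l_upper)
# ['ABC', 'ABC123']

l_lower = [s for s in l if s.islower()]
print(l_lower)
# ['abc']

# Mixed case, digits only, and the empty string are neither.
l_neither = [s for s in l if not s.isupper() and not s.islower()]
print(l_neither)
# ['Abc', '123', '']
```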

Convert case of strings

To convert all characters of a string to uppercase or lowercase, use the upper() or lower() method. Python also provides other methods, such as capitalize(), which makes the first character uppercase and the rest lowercase, and swapcase(), which inverts the case of every character in a string.

Use conditional expressions to convert only those elements that satisfy the conditions.

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

l_upper_all = [s.upper() for s in l]
print(l_upper_all)
# ['ONEXXXAAA', 'TWOXXXBBB', 'THREE999AAA', '000111222']

l_lower_to_upper = [s.upper() if s.islower() else s for s in l]
print(l_lower_to_upper)
# ['oneXXXaaa', 'twoXXXbbb', 'THREE999AAA', '000111222']

source: list_str_select_replace.py
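For comparison, here is a short sketch applying capitalize() and swapcase() to the same list. Note that capitalize() also lowercases everything after the first character.

```python
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# First character uppercase, all remaining characters lowercase.
l_capitalize = [s.capitalize() for s in l]
print(l_capitalize)
# ['Onexxxaaa', 'Twoxxxbbb', 'Three999aaa', '000111222']

# Uppercase becomes lowercase and vice versa; digits are unchanged.
l_swapcase = [s.swapcase() for s in l]
print(l_swapcase)
# ['ONExxxAAA', 'TWOxxxBBB', 'THREE999AAA', '000111222']
```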

Extract strings by alphabetic or numeric

You can use the isalpha() and isnumeric() methods to check if a string is all alphabetic or all numeric.

Check if a string is numeric, alphabetic, alphanumeric, or ASCII

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

l_isalpha = [s for s in l if s.isalpha()]
print(l_isalpha)
# ['oneXXXaaa', 'twoXXXbbb']

l_isnumeric = [s for s in l if s.isnumeric()]
print(l_isnumeric)
# ['000111222']

source: list_str_select_replace.py
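These methods test every character, so strings with a decimal point or a sign are not considered numeric, and the empty string fails every check. The sketch below uses a hypothetical list to illustrate this, along with isalnum() for strings made up of letters and digits.

```python
# Hypothetical sample data covering several edge cases.
l = ['abc', '123', 'abc123', '12.5', '-3', '']

l_isalpha = [s for s in l if s.isalpha()]
print(l_isalpha)
# ['abc']

# '.' and '-' are not numeric characters, so '12.5' and '-3' are excluded.
l_isnumeric = [s for s in l if s.isnumeric()]
print(l_isnumeric)
# ['123']

# isalnum() accepts letters, digits, or a mix of both.
l_isalnum = [s for s in l if s.isalnum()]
print(l_isalnum)
# ['abc', '123', 'abc123']
```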

Multiple conditions

In the condition part of the list comprehension, you can specify multiple conditions using and, or, and not.

When combining three or more conditions, it is safer to enclose each group in parentheses. Operator precedence is not, then and, then or, so mixing and with or without parentheses may not group the conditions the way you intend.

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

l_multi = [s for s in l if s.isalpha() and not s.startswith('t')]
print(l_multi)
# ['oneXXXaaa']

l_multi_or = [s for s in l if (s.isalpha() and not s.startswith('t')) or ('bbb' in s)]
print(l_multi_or)
# ['oneXXXaaa', 'twoXXXbbb']

source: list_str_select_replace.py
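The effect of grouping can be seen in the following sketch, which uses the same sample list. The two comprehensions contain the same three conditions and differ only in where the parentheses are placed.

```python
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# Without parentheses, `and` binds more tightly than `or`, so this is
# evaluated as: s.isalpha() or (s.startswith('t') and 'aaa' in s)
l_default = [s for s in l if s.isalpha() or s.startswith('t') and 'aaa' in s]
print(l_default)
# ['oneXXXaaa', 'twoXXXbbb', 'three999aaa']

# Explicit parentheses change the grouping and therefore the result.
l_grouped = [s for s in l if (s.isalpha() or s.startswith('t')) and 'aaa' in s]
print(l_grouped)
# ['oneXXXaaa', 'three999aaa']
```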

Regular expression (regex)

You can use regular expressions (regex) for more flexible pattern matching and manipulation.

Regular expressions with the re module in Python

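The re.match() function returns a match object if the pattern matches at the beginning of the string, and None otherwise. This is why the pattern in the example below starts with .* to allow any leading characters.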

Since match objects are evaluated as True and None as False, if you want to extract elements that match a regex pattern, you should apply re.match() to the condition part of the list comprehensions as in the previous examples.

import re

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

l_re_match = [s for s in l if re.match('.*XXX.*', s)]
print(l_re_match)
# ['oneXXXaaa', 'twoXXXbbb']

source: list_str_re.py

You can also use re.sub() to replace parts that match a regex pattern. If you want to extract and replace only matched elements, add if condition.

l_re_sub_all = [re.sub('(.*)XXX(.*)', r'\2---\1', s) for s in l]
print(l_re_sub_all)
# ['aaa---one', 'bbb---two', 'three999aaa', '000111222']

l_re_sub = [re.sub('(.*)XXX(.*)', r'\2---\1', s) for s in l if re.match('.*XXX.*', s)]
print(l_re_sub)
# ['aaa---one', 'bbb---two']

source: list_str_re.py
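As an alternative sketch, re.search() looks for a match anywhere in the string, so the leading and trailing .* are not needed, and re.fullmatch() requires the whole string to match the pattern. Both functions are part of the standard re module; the patterns below are chosen only for illustration.

```python
import re

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# re.search() finds the pattern at any position in the string.
l_re_search = [s for s in l if re.search('XXX', s)]
print(l_re_search)
# ['oneXXXaaa', 'twoXXXbbb']

# re.match() only matches at the beginning, so nothing is extracted here.
l_re_match_start = [s for s in l if re.match('XXX', s)]
print(l_re_match_start)
# []

# re.fullmatch() requires the entire string to match (digits only here).
l_re_fullmatch = [s for s in l if re.fullmatch(r'\d+', s)]
print(l_re_fullmatch)
# ['000111222']
```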
