note.nkmk.me

Extract and replace elements that meet the conditions of a list of strings in Python

Posted: 2021-09-05 / Tags: Python, String, List

In Python, to generate a new list from a list of strings by extracting, replacing or transforming elements that satisfy certain conditions, use list comprehensions.

This article briefly explains list comprehensions and then covers the following topics with sample code.

  • Extract strings that contain or do not contain a specific string
  • Replace a specific string in a list
  • Extract strings that begin or do not begin with a specific string
  • Extract strings that end or do not end with a specific string
  • Extract strings by uppercase or lowercase
  • Convert uppercase and lowercase
  • Extract strings by alphabetic or numeric
  • Multiple conditions
  • Regular expressions

See the following article for more information on how to replace strings.

List comprehensions

To generate a new list from an existing list, you can use a list comprehension, which is more concise than a for loop.

[expression for variable_name in iterable if condition]

If you only want to extract the elements that satisfy condition, there is no need to process them with expression; just use variable_name.

[variable_name for variable_name in iterable if condition]

If you change if condition to if not condition, you can extract elements that do not satisfy condition.
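
As a rough illustration of the syntax above, the following sketch builds the same list with a for loop and with a list comprehension. The list words and the length condition are made up for this example.

```python
words = ['apple', 'fig', 'banana', 'kiwi']

# for loop version: filter, transform, append
long_words_loop = []
for w in words:
    if len(w) > 4:
        long_words_loop.append(w.upper())

# list comprehension version: same filter and transform in one line
long_words = [w.upper() for w in words if len(w) > 4]
print(long_words)
# ['APPLE', 'BANANA']

print(long_words == long_words_loop)
# True
```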

For more information about extracting, replacing, and converting list elements using list comprehensions, please refer to the following article.

Extract strings that contain or do not contain a specific string

specific_string in target_string returns True if target_string contains specific_string. Use not in for negation.

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

l_in = [s for s in l if 'XXX' in s]
print(l_in)
# ['oneXXXaaa', 'twoXXXbbb']

l_in_not = [s for s in l if 'XXX' not in s]
print(l_in_not)
# ['three999aaa', '000111222']

Replace a specific string in a list

To replace a substring in every element of a list, apply the string method replace() to each element in a list comprehension.

If an element does not contain the string to be replaced, replace() returns it unchanged, so you don't need to filter elements with if condition.

l_replace = [s.replace('XXX', 'ZZZ') for s in l]
print(l_replace)
# ['oneZZZaaa', 'twoZZZbbb', 'three999aaa', '000111222']

To replace an entire element that contains a specific string, test for it with in and use a conditional expression (ternary operator), X if condition else Y.

Use the conditional expression as the expression part of the list comprehension.

l_replace_all = ['ZZZ' if 'XXX' in s else s for s in l]
print(l_replace_all)
# ['ZZZ', 'ZZZ', 'three999aaa', '000111222']

Adding parentheses may make the expression easier to read and help avoid mistakes. Syntactically, the parentheses are optional.

[('ZZZ' if ('XXX' in s) else s) for s in l]

Extract strings that begin or do not begin with a specific string

The string method startswith() returns True if the string starts with the specific string.

l_start = [s for s in l if s.startswith('t')]
print(l_start)
# ['twoXXXbbb', 'three999aaa']

l_start_not = [s for s in l if not s.startswith('t')]
print(l_start_not)
# ['oneXXXaaa', '000111222']

Extract strings that end or do not end with a specific string

The string method endswith() returns True if the string ends with the specific string.

l_end = [s for s in l if s.endswith('aaa')]
print(l_end)
# ['oneXXXaaa', 'three999aaa']

l_end_not = [s for s in l if not s.endswith('aaa')]
print(l_end_not)
# ['twoXXXbbb', '000111222']
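
Both startswith() and endswith() also accept a tuple of strings and return True if any of them matches. The sketch below reuses the sample list l from this article; the prefixes and suffixes are chosen only for illustration.

```python
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# True if the string starts with any prefix in the tuple
l_start_any = [s for s in l if s.startswith(('one', 'two'))]
print(l_start_any)
# ['oneXXXaaa', 'twoXXXbbb']

# True if the string ends with any suffix in the tuple
l_end_any = [s for s in l if s.endswith(('aaa', '222'))]
print(l_end_any)
# ['oneXXXaaa', 'three999aaa', '000111222']
```

Note that the argument must be a tuple; passing a list raises TypeError.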
Extract strings by uppercase or lowercase

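You can use the string methods isupper() and islower() to determine whether all cased characters in a string are uppercase or lowercase. Characters without case, such as digits, are ignored, but the string must contain at least one cased character.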

l_lower = [s for s in l if s.islower()]
print(l_lower)
# ['three999aaa']
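
None of the strings in l is all uppercase, so the following sketch uses a separate, made-up list l_case to show how isupper() and islower() treat mixed-case strings and strings without any letters.

```python
l_case = ['ABC', 'abc', 'Abc', 'ABC123', '123']

l_is_upper = [s for s in l_case if s.isupper()]
print(l_is_upper)
# ['ABC', 'ABC123']

l_is_lower = [s for s in l_case if s.islower()]
print(l_is_lower)
# ['abc']

# Mixed-case strings and strings with no cased characters satisfy neither
l_neither = [s for s in l_case if not s.isupper() and not s.islower()]
print(l_neither)
# ['Abc', '123']
```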

Convert uppercase and lowercase

If you want to convert all letters to uppercase or lowercase, use the string methods upper() or lower(). Other methods are also provided, such as capitalize(), which capitalizes the first character and lowercases the rest, and swapcase(), which swaps upper and lower case.

Use conditional expressions to convert only those elements that satisfy the conditions.

l_upper_all = [s.upper() for s in l]
print(l_upper_all)
# ['ONEXXXAAA', 'TWOXXXBBB', 'THREE999AAA', '000111222']

l_lower_to_upper = [s.upper() if s.islower() else s for s in l]
print(l_lower_to_upper)
# ['oneXXXaaa', 'twoXXXbbb', 'THREE999AAA', '000111222']
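
For reference, here is a small sketch applying capitalize() and swapcase() to the same sample list. Note that capitalize() also lowercases every character after the first.

```python
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

l_capitalize = [s.capitalize() for s in l]
print(l_capitalize)
# ['Onexxxaaa', 'Twoxxxbbb', 'Three999aaa', '000111222']

l_swapcase = [s.swapcase() for s in l]
print(l_swapcase)
# ['ONExxxAAA', 'TWOxxxBBB', 'THREE999AAA', '000111222']
```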

Extract strings by alphabetic or numeric

You can use the string methods isalpha() and isnumeric() to determine whether a string is all alphabetic or all numeric.

l_isalpha = [s for s in l if s.isalpha()]
print(l_isalpha)
# ['oneXXXaaa', 'twoXXXbbb']

l_isnumeric = [s for s in l if s.isnumeric()]
print(l_isnumeric)
# ['000111222']
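
A string that mixes letters and digits, such as 'three999aaa', satisfies neither isalpha() nor isnumeric(). The sketch below uses a made-up list l_mix and additionally shows the string method isalnum(), which is not covered above; it returns True when every character is alphabetic or numeric. All three methods return False for an empty string.

```python
l_mix = ['abc', '123', 'abc123', 'abc-123', '']

l_mix_alpha = [s for s in l_mix if s.isalpha()]
print(l_mix_alpha)
# ['abc']

l_mix_numeric = [s for s in l_mix if s.isnumeric()]
print(l_mix_numeric)
# ['123']

# isalnum() accepts letters, digits, or a mix of both
l_mix_alnum = [s for s in l_mix if s.isalnum()]
print(l_mix_alnum)
# ['abc', '123', 'abc123']
```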

Multiple conditions

You can also specify multiple conditions using and and or in the condition part of a list comprehension. Negation with not is also available.

If you combine three or more conditions, it is safer to enclose each group in (), because not binds more tightly than and, and and binds more tightly than or, so the result depends on how the conditions are grouped.

l_multi = [s for s in l if s.isalpha() and not s.startswith('t')]
print(l_multi)
# ['oneXXXaaa']

l_multi_or = [s for s in l if (s.isalpha() and not s.startswith('t')) or ('bbb' in s)]
print(l_multi_or)
# ['oneXXXaaa', 'twoXXXbbb']
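
To see how grouping changes the result, the following sketch evaluates the same three conditions with and without parentheses. Without parentheses, and is evaluated before or. The conditions themselves are chosen only for illustration.

```python
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# Interpreted as: A or (B and C)
l_no_paren = [s for s in l
              if s.startswith('t') or s.startswith('o') and s.endswith('bbb')]
print(l_no_paren)
# ['twoXXXbbb', 'three999aaa']

# Explicit grouping: (A or B) and C
l_paren = [s for s in l
           if (s.startswith('t') or s.startswith('o')) and s.endswith('bbb')]
print(l_paren)
# ['twoXXXbbb']
```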

Regular expressions

Regular expressions can be used for more flexible processing.

re.match() returns a match object if the pattern matches at the beginning of the string, or None if it does not.

Since a match object is evaluated as True and None as False, you can use re.match() in the condition part of a list comprehension, as in the previous examples, to extract only the elements that match a regular expression.

import re

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

l_re_match = [s for s in l if re.match('.*XXX.*', s)]
print(l_re_match)
# ['oneXXXaaa', 'twoXXXbbb']
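
Because re.match() only matches at the beginning of the string, the pattern above needs the leading .* to find XXX in the middle. As a sketch of the alternatives in the standard re module, re.search() looks for a match anywhere in the string, and re.fullmatch() requires the whole string to match.

```python
import re

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# re.match() anchors at the start, so a bare 'XXX' matches nothing here
l_match = [s for s in l if re.match('XXX', s)]
print(l_match)
# []

# re.search() finds the pattern anywhere in the string
l_search = [s for s in l if re.search('XXX', s)]
print(l_search)
# ['oneXXXaaa', 'twoXXXbbb']

# re.fullmatch() requires the entire string to match
l_fullmatch = [s for s in l if re.fullmatch(r'\d+', s)]
print(l_fullmatch)
# ['000111222']
```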

You can also use re.sub() to replace the part that matches a regular expression. To extract and replace only the matched elements, add if condition.

l_re_sub_all = [re.sub('(.*)XXX(.*)', r'\2---\1', s) for s in l]
print(l_re_sub_all)
# ['aaa---one', 'bbb---two', 'three999aaa', '000111222']

l_re_sub = [re.sub('(.*)XXX(.*)', r'\2---\1', s) for s in l if re.match('.*XXX.*', s)]
print(l_re_sub)
# ['aaa---one', 'bbb---two']
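
The last example writes nearly the same pattern twice. One possible variation, shown here only as a sketch, is to compile the pattern once with re.compile() and reuse the resulting pattern object for both the condition and the replacement. The name p is arbitrary.

```python
import re

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# Compile once, then use the same pattern for filtering and substitution
p = re.compile('(.*)XXX(.*)')

l_re_sub_compiled = [p.sub(r'\2---\1', s) for s in l if p.match(s)]
print(l_re_sub_compiled)
# ['aaa---one', 'bbb---two']
```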