pandas: Handle strings (replace, strip, case conversion, etc.)

Posted: 2022-04-23 | Tags: Python, pandas

You can use the methods of the string accessor (str.xxx()) to process (replace, strip, etc.) the strings in a pandas.Series (= a column or row of a pandas.DataFrame).

Series - String handling — pandas 1.4.0 documentation

For example, the following methods are available. They correspond to the methods of standard Python strings (str) and are applied to every element of the pandas.Series.

Replace each string in pandas.Series
- str.replace()
Strip each string in pandas.Series
- str.strip()
- str.lstrip()
- str.rstrip()
Convert the case of each string in pandas.Series
- str.lower()
- str.upper()
- str.capitalize()
- str.title()
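As a rough illustration of the element-wise behavior described above, the following sketch compares a built-in str method with its str accessor counterpart. The sample values are made up for this example. It also includes one missing value, which the accessor leaves as missing instead of raising an error.

```python
import pandas as pd

# Sample data for illustration only; the second element is missing.
s = pd.Series(['Hello', None, 'World'])

# A built-in str method works on a single string.
print('Hello'.upper())

# The .str accessor applies the same operation to every element.
# Missing values are skipped and stay missing in the result.
upper = s.str.upper()
print(upper)
```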


Replace each string in `pandas.Series`

`str.replace()`

import pandas as pd

s = pd.Series([' a-a-x ', ' b-x-b ', ' x-c-c '])
print(s)
# 0     a-a-x 
# 1     b-x-b 
# 2     x-c-c 
# dtype: object

s_new = s.str.replace('x', 'z')
print(s_new)
# 0     a-a-z 
# 1     b-z-b 
# 2     z-c-c 
# dtype: object

source: pandas_str_replace_strip_etc.py
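str.replace() also accepts a regex argument that controls whether the pattern is treated as a literal string or as a regular expression. The default has differed between pandas versions, so it may be safer to pass it explicitly. A minimal sketch, reusing the example Series from above:

```python
import pandas as pd

s = pd.Series([' a-a-x ', ' b-x-b ', ' x-c-c '])

# regex=False: the pattern is treated as a plain string.
literal = s.str.replace('-', '+', regex=False)
print(literal.tolist())
# [' a+a+x ', ' b+x+b ', ' x+c+c ']

# regex=True: the pattern is a regular expression.
# '[ax]' matches either 'a' or 'x'.
pattern = s.str.replace('[ax]', 'z', regex=True)
print(pattern.tolist())
# [' z-z-z ', ' b-z-b ', ' z-c-c ']
```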

To update a column in a pandas.DataFrame, assign the result back to the original column. The same applies to the other methods.

df = pd.DataFrame([[' a-a-x-1 ', ' a-a-x-2 '],
                   [' b-x-b-1 ', ' b-x-b-2 '],
                   [' x-c-c-1 ', ' x-c-c-2 ']],
                  columns=['col1', 'col2'])
print(df)
#         col1       col2
# 0   a-a-x-1    a-a-x-2 
# 1   b-x-b-1    b-x-b-2 
# 2   x-c-c-1    x-c-c-2 

df['col1'] = df['col1'].str.replace('x', 'z')
print(df)
#         col1       col2
# 0   a-a-z-1    a-a-x-2 
# 1   b-z-b-1    b-x-b-2 
# 2   z-c-c-1    x-c-c-2

source: pandas_str_replace_strip_etc.py
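The str accessor belongs to pandas.Series, so it cannot be called on a whole DataFrame. One possible way to update several columns at once is to apply the method to each column with apply(). This is only a sketch, and the cols list is a name chosen for this example.

```python
import pandas as pd

df = pd.DataFrame([[' a-a-x-1 ', ' a-a-x-2 '],
                   [' b-x-b-1 ', ' b-x-b-2 '],
                   [' x-c-c-1 ', ' x-c-c-2 ']],
                  columns=['col1', 'col2'])

# Columns to update (hypothetical selection for this example).
cols = ['col1', 'col2']

# apply() passes each selected column to the function as a Series,
# so the str accessor is available inside the lambda.
df[cols] = df[cols].apply(lambda col: col.str.replace('x', 'z', regex=False))
print(df)
```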

To replace whole elements rather than substrings, use the replace() method of pandas.DataFrame or pandas.Series.

pandas: Replace values in DataFrame and Series with replace()
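The difference between the two kinds of replacement can be seen in a small sketch. The sample values are made up for this example.

```python
import pandas as pd

s = pd.Series(['x', 'xx', 'a-x'])

# Series.replace() replaces elements that are exactly equal to 'x'.
by_element = s.replace('x', 'z')
print(by_element.tolist())
# ['z', 'xx', 'a-x']

# Series.str.replace() replaces every occurrence of 'x' inside each string.
by_substring = s.str.replace('x', 'z', regex=False)
print(by_substring.tolist())
# ['z', 'zz', 'a-z']
```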

Strip each string in `pandas.Series`

`str.strip()`

By default, whitespace characters at the left and right ends (= leading and trailing whitespace characters) are removed.

s_new = s.str.strip()
print(s_new)
# 0    a-a-x
# 1    b-x-b
# 2    x-c-c
# dtype: object

source: pandas_str_replace_strip_etc.py

You can specify the characters to remove as a string. The argument is treated as a set of characters, not as a substring: any leading or trailing character contained in it is removed. The same applies to str.lstrip() and str.rstrip().

s_new = s.str.strip(' x')
print(s_new)
# 0     a-a-
# 1    b-x-b
# 2     -c-c
# dtype: object

source: pandas_str_replace_strip_etc.py
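The following sketch, using made-up sample values, illustrates that the argument acts as a set of characters: the order of the characters does not matter, and characters in the middle of a string are left untouched.

```python
import pandas as pd

s_chars = pd.Series(['xyxhello', 'helloyyx', 'a-x-a'])

# 'xy' and 'yx' describe the same set of characters,
# so both calls give the same result.
stripped_xy = s_chars.str.strip('xy')
stripped_yx = s_chars.str.strip('yx')
print(stripped_xy.tolist())
# ['hello', 'hello', 'a-x-a']

print(stripped_xy.equals(stripped_yx))
# True
```

In the last element, the x is not at either end of the string, so it is kept.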

For pandas.DataFrame:

df['col1'] = df['col1'].str.strip()
print(df)
#       col1       col2
# 0  a-a-z-1   a-a-x-2 
# 1  b-z-b-1   b-x-b-2 
# 2  z-c-c-1   x-c-c-2

source: pandas_str_replace_strip_etc.py

`str.lstrip()`

str.lstrip() removes only leading characters (those at the left end).

s_new = s.str.lstrip()
print(s_new)
# 0    a-a-x 
# 1    b-x-b 
# 2    x-c-c 
# dtype: object

source: pandas_str_replace_strip_etc.py

`str.rstrip()`

str.rstrip() removes only trailing characters (those at the right end).

s_new = s.str.rstrip()
print(s_new)
# 0     a-a-x
# 1     b-x-b
# 2     x-c-c
# dtype: object

source: pandas_str_replace_strip_etc.py
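As with str.strip(), both str.lstrip() and str.rstrip() accept a string of characters to remove. A short sketch using the same example Series:

```python
import pandas as pd

s = pd.Series([' a-a-x ', ' b-x-b ', ' x-c-c '])

# Remove spaces and 'x' from the left end only.
left = s.str.lstrip(' x')
print(left.tolist())
# ['a-a-x ', 'b-x-b ', '-c-c ']

# Remove spaces and 'x' from the right end only.
right = s.str.rstrip(' x')
print(right.tolist())
# [' a-a-', ' b-x-b', ' x-c-c']
```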

Convert the case of each string in `pandas.Series`

The following pandas.Series is used as an example.

s = pd.Series(['Hello World', 'hello world', 'HELLO WORLD'])
print(s)
# 0    Hello World
# 1    hello world
# 2    HELLO WORLD
# dtype: object

source: pandas_str_replace_strip_etc.py

`str.lower()`

s_new = s.str.lower()
print(s_new)
# 0    hello world
# 1    hello world
# 2    hello world
# dtype: object

source: pandas_str_replace_strip_etc.py

`str.upper()`

s_new = s.str.upper()
print(s_new)
# 0    HELLO WORLD
# 1    HELLO WORLD
# 2    HELLO WORLD
# dtype: object

source: pandas_str_replace_strip_etc.py

`str.capitalize()`

s_new = s.str.capitalize()
print(s_new)
# 0    Hello world
# 1    Hello world
# 2    Hello world
# dtype: object

source: pandas_str_replace_strip_etc.py

`str.title()`

s_new = s.str.title()
print(s_new)
# 0    Hello World
# 1    Hello World
# 2    Hello World
# dtype: object

source: pandas_str_replace_strip_etc.py
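Because each str method returns a new pandas.Series, the methods covered in this article can be chained. One common use, sketched below with made-up sample values, is to normalize whitespace and case before comparing strings.

```python
import pandas as pd

s_raw = pd.Series([' Hello World ', 'hello world', 'HELLO WORLD '])

# Strip surrounding whitespace, then convert to lowercase.
normalized = s_raw.str.strip().str.lower()
print(normalized.tolist())
# ['hello world', 'hello world', 'hello world']

# After normalization, all three elements are the same string.
print(normalized.nunique())
# 1
```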
