pandas: Handle strings (replace, strip, case conversion, etc.)
You can use the methods of the string accessor (str.xxx()) to handle (replace, strip, etc.) the strings in a pandas.Series (= a column or row of a pandas.DataFrame).
For example, the following methods are available. They apply the same operations as the methods of standard Python strings (str) to every element of a pandas.Series.
- Replace each string in pandas.Series
  - str.replace()
- Strip each string in pandas.Series
  - str.strip()
  - str.lstrip()
  - str.rstrip()
- Convert the case of each string in pandas.Series
  - str.lower()
  - str.upper()
  - str.capitalize()
  - str.title()
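As a rough illustration of how the accessor corresponds to Python's built-in string methods, the sketch below compares a call through str with the same method called on each element in a list comprehension. The sample data and variable names are illustrative only.

```python
import pandas as pd

s = pd.Series(['abc', 'Def', 'gHi'])

# Vectorized call through the string accessor
via_accessor = s.str.upper()

# Element-by-element call of the built-in str method
via_python = [x.upper() for x in s]

print(via_accessor.tolist())
# ['ABC', 'DEF', 'GHI']
print(via_python)
# ['ABC', 'DEF', 'GHI']
```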
See also the following articles for other string methods.
- pandas: Extract rows that contain specific strings from a DataFrame
- pandas: Split string columns by delimiters or regular expressions
- pandas: Slice substrings from each element in columns
Replace each string in pandas.Series
str.replace()
import pandas as pd
s = pd.Series([' a-a-x ', ' b-x-b ', ' x-c-c '])
print(s)
# 0 a-a-x
# 1 b-x-b
# 2 x-c-c
# dtype: object
s_new = s.str.replace('x', 'z')
print(s_new)
# 0 a-a-z
# 1 b-z-b
# 2 z-c-c
# dtype: object
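str.replace() also accepts a regex argument. In pandas 2.0 and later its default is False (literal matching), while older versions defaulted to True, so passing it explicitly may be the safer choice when the pattern contains regex metacharacters. A minimal sketch with made-up sample data:

```python
import pandas as pd

s = pd.Series(['a.b', 'a-b', 'axb'])

# regex=False: '.' is treated as a literal period
literal = s.str.replace('.', '_', regex=False)
print(literal.tolist())
# ['a_b', 'a-b', 'axb']

# regex=True: '.' matches any single character
pattern = s.str.replace('a.b', 'z', regex=True)
print(pattern.tolist())
# ['z', 'z', 'z']
```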
To update a column in a pandas.DataFrame, assign the result back to the original column. The same applies to the other methods.
df = pd.DataFrame([[' a-a-x-1 ', ' a-a-x-2 '],
                   [' b-x-b-1 ', ' b-x-b-2 '],
                   [' x-c-c-1 ', ' x-c-c-2 ']],
                  columns=['col1', 'col2'])
print(df)
# col1 col2
# 0 a-a-x-1 a-a-x-2
# 1 b-x-b-1 b-x-b-2
# 2 x-c-c-1 x-c-c-2
df['col1'] = df['col1'].str.replace('x', 'z')
print(df)
# col1 col2
# 0 a-a-z-1 a-a-x-2
# 1 b-z-b-1 b-x-b-2
# 2 z-c-c-1 x-c-c-2
To replace a whole element rather than a substring, use the replace() method of pandas.DataFrame or pandas.Series.
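The difference between the two can be sketched as follows, using a small made-up Series: str.replace() rewrites substrings inside each element, whereas Series.replace() (with its default settings) only replaces elements that equal the given value as a whole.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'ab'])

# str.replace() rewrites every occurrence of the substring 'a'
by_substring = s.str.replace('a', 'z')
print(by_substring.tolist())
# ['z', 'b', 'zb']

# Series.replace() only replaces elements that are exactly 'a'
by_element = s.replace('a', 'z')
print(by_element.tolist())
# ['z', 'b', 'ab']
```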
Strip each string in pandas.Series
str.strip()
By default, whitespace characters at the left and right ends (= leading and trailing whitespace characters) are removed.
s_new = s.str.strip()
print(s_new)
# 0 a-a-x
# 1 b-x-b
# 2 x-c-c
# dtype: object
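You can specify the characters to remove as a string. Every character contained in that string is stripped from both ends, so the argument acts as a set of characters rather than as a substring. The same applies to str.lstrip() and str.rstrip().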
s_new = s.str.strip(' x')
print(s_new)
# 0 a-a-
# 1 b-x-b
# 2 -c-c
# dtype: object
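To make the "set of characters" behavior concrete, here is a small sketch with made-up data. Stripping stops at the first character that is not in the given string, so occurrences in the middle of an element are left alone.

```python
import pandas as pd

s = pd.Series(['xyxa-bxy', 'yxa-xy-by'])

# 'x' and 'y' are removed from both ends in any order and any number;
# the 'xy' in the middle of the second element is untouched
stripped = s.str.strip('xy')
print(stripped.tolist())
# ['a-b', 'a-xy-b']
```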
For pandas.DataFrame:
df['col1'] = df['col1'].str.strip()
print(df)
# col1 col2
# 0 a-a-z-1 a-a-x-2
# 1 b-z-b-1 b-x-b-2
# 2 z-c-c-1 x-c-c-2
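The str accessor belongs to pandas.Series, not pandas.DataFrame, so stripping several columns at once needs a per-column call. One possible approach, sketched here with a hypothetical DataFrame and column list, is to use apply() on the selected string columns:

```python
import pandas as pd

df = pd.DataFrame({'col1': [' a ', ' b '],
                   'col2': [' c ', ' d '],
                   'num': [1, 2]})

# Hypothetical list of the string columns to clean
cols = ['col1', 'col2']

# apply() passes each selected column to the lambda as a Series
df[cols] = df[cols].apply(lambda col: col.str.strip())

print(df['col1'].tolist())
# ['a', 'b']
print(df['col2'].tolist())
# ['c', 'd']
```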
str.lstrip()
str.lstrip() strips only the characters on the left side.
s_new = s.str.lstrip()
print(s_new)
# 0 a-a-x
# 1 b-x-b
# 2 x-c-c
# dtype: object
str.rstrip()
str.rstrip() strips only the characters on the right side.
s_new = s.str.rstrip()
print(s_new)
# 0 a-a-x
# 1 b-x-b
# 2 x-c-c
# dtype: object
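Leading and trailing whitespace is hard to see in printed output, so the results of str.lstrip() and str.rstrip() can look the same. A quick way to check the difference, sketched below, is to convert the result to a list so that each string is shown with quotes.

```python
import pandas as pd

s = pd.Series([' a-a-x ', ' b-x-b ', ' x-c-c '])

# Only the leading space is removed
left = s.str.lstrip().tolist()
print(left)
# ['a-a-x ', 'b-x-b ', 'x-c-c ']

# Only the trailing space is removed
right = s.str.rstrip().tolist()
print(right)
# [' a-a-x', ' b-x-b', ' x-c-c']
```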
Convert the case of each string in pandas.Series
The following pandas.Series is used as an example.
s = pd.Series(['Hello World', 'hello world', 'HELLO WORLD'])
print(s)
# 0 Hello World
# 1 hello world
# 2 HELLO WORLD
# dtype: object
str.lower()
s_new = s.str.lower()
print(s_new)
# 0 hello world
# 1 hello world
# 2 hello world
# dtype: object
str.upper()
s_new = s.str.upper()
print(s_new)
# 0 HELLO WORLD
# 1 HELLO WORLD
# 2 HELLO WORLD
# dtype: object
str.capitalize()
s_new = s.str.capitalize()
print(s_new)
# 0 Hello world
# 1 Hello world
# 2 Hello world
# dtype: object
str.title()
s_new = s.str.title()
print(s_new)
# 0 Hello World
# 1 Hello World
# 2 Hello World
# dtype: object
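Because each of these methods returns a new pandas.Series, they can be chained by calling the str accessor again on each result. The following sketch, with made-up input, combines stripping, case conversion, and replacement to normalize inconsistent strings:

```python
import pandas as pd

s = pd.Series(['  Hello World ', 'hello-world', 'HELLO WORLD  '])

# Each step returns a Series, so .str must be repeated before each method
normalized = (
    s.str.strip()
     .str.lower()
     .str.replace('-', ' ', regex=False)
)

print(normalized.tolist())
# ['hello world', 'hello world', 'hello world']
```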