pandas: Slice substrings from each element in columns

Tags: Python, pandas

You can apply Python string (str) methods to a pandas.DataFrame column (a pandas.Series) through the .str accessor.

This article describes how to slice substrings of any length from any position to generate a new column.

  • Slice substrings from each element in pandas.Series
    • Extract a head of a string
    • Extract a tail of a string
    • Specify step
    • Extract a single character with index
    • Add as a new column to pandas.DataFrame
  • Convert numeric values to strings and slice

The following pandas.DataFrame is used as an example.

import pandas as pd

df = pd.DataFrame({'a': ['abcde', 'fghij', 'klmno'],
                   'b': [123, 456, 789]})

print(df)
#        a    b
# 0  abcde  123
# 1  fghij  456
# 2  klmno  789

print(df.dtypes)
# a    object
# b     int64
# dtype: object

Slice substrings from each element in pandas.Series

For a column of strings, you can slice every element at once with .str[].

Extract a head of a string

print(df['a'].str[:2])
# 0    ab
# 1    fg
# 2    kl
# Name: a, dtype: object

Extract a tail of a string

A negative value specifies a position counted from the end of the string.

print(df['a'].str[-2:])
# 0    de
# 1    ij
# 2    no
# Name: a, dtype: object

Specify step

You can specify a step in the form start:stop:step.

print(df['a'].str[::2])
# 0    ace
# 1    fhj
# 2    kmo
# Name: a, dtype: object
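
The start, stop, and step values can also be combined or passed to the str.slice() method, which takes the same three arguments. The following is a minimal sketch; the variable names (s, middle, same_middle, every_other) are chosen here for illustration only.

```python
import pandas as pd

s = pd.Series(['abcde', 'fghij', 'klmno'], name='a')

# Slice from index 1 up to (but not including) index 4.
middle = s.str[1:4]
print(middle.tolist())
# ['bcd', 'ghi', 'lmn']

# str.slice() accepts the same start, stop, and step as arguments.
same_middle = s.str.slice(1, 4)
every_other = s.str.slice(step=2)
print(every_other.tolist())
# ['ace', 'fhj', 'kmo']
```

The bracket form is shorter, while str.slice() can be convenient when the positions are held in variables.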

Extract a single character with index

In addition to slicing, a single character can be extracted by index.

print(df['a'].str[2])
# 0    c
# 1    h
# 2    m
# Name: a, dtype: object

print(df['a'].str[0])
# 0    a
# 1    f
# 2    k
# Name: a, dtype: object

print(df['a'].str[-1])
# 0    e
# 1    j
# 2    o
# Name: a, dtype: object
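
When the strings differ in length, an index may fall outside some of them. The sketch below uses a hypothetical Series with one short element to show how indexing and slicing behave in that case: an out-of-range index yields a missing value, while a slice returns whatever characters are available.

```python
import pandas as pd

s = pd.Series(['abcde', 'ab'])

# Index 3 does not exist in 'ab', so that element becomes a missing value.
third = s.str[3]
print(third.isna().tolist())
# [False, True]

# A slice does not fail; it returns the characters that exist in the range.
sliced = s.str[1:4]
print(sliced.tolist())
# ['bcd', 'b']
```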

Add as a new column to pandas.DataFrame

You can assign the extracted pandas.Series to the pandas.DataFrame as a new column.

df['a_head'] = df['a'].str[:2]
print(df)
#        a    b a_head
# 0  abcde  123     ab
# 1  fghij  456     fg
# 2  klmno  789     kl
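
If you would rather leave the original object unchanged, the assign() method is one alternative: it returns a new pandas.DataFrame with the extra columns. This is a sketch; the column names a_head and a_tail and the variable df_new are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': ['abcde', 'fghij', 'klmno'],
                   'b': [123, 456, 789]})

# assign() returns a new DataFrame; df itself keeps only 'a' and 'b'.
df_new = df.assign(a_head=df['a'].str[:2], a_tail=df['a'].str[-2:])

print(df_new.columns.tolist())
# ['a', 'b', 'a_head', 'a_tail']

print(df.columns.tolist())
# ['a', 'b']
```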

Convert numeric values to strings and slice

Using the str accessor (.str) on a non-string column raises an AttributeError.

# print(df['b'].str[:2])
# AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

Use the astype() method to convert the column to strings (str) first.

print(df['b'].astype(str).str[:2])
# 0    12
# 1    45
# 2    78
# Name: b, dtype: object

If you want to treat the result as a number, apply astype() again to convert it back.

print(df['b'].astype(str).str[:2].astype(int))
# 0    12
# 1    45
# 2    78
# Name: b, dtype: int64
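
Note that slicing works on the characters of the string representation, not on digits as such. The sketch below uses a hypothetical Series containing a negative number and a one-digit number to illustrate that the minus sign counts as a character and that short values are returned as they are.

```python
import pandas as pd

s = pd.Series([123, -456, 7])

as_text = s.astype(str)
print(as_text.tolist())
# ['123', '-456', '7']

# The first two characters, including the minus sign of -456.
head = as_text.str[:2]
print(head.tolist())
# ['12', '-4', '7']

print(head.astype(int).tolist())
# [12, -4, 7]
```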
