Method Chaining Across Multiple Lines in Python

Modified: | Tags: Python, pandas, NumPy, Pillow

Some Python libraries, like pandas, NumPy, and Pillow (PIL), are designed to enable method chaining, where methods can be linked sequentially.

Although method chaining tends to increase the length of a line of code, using parentheses allows for appropriate line breaks.

This article will first explain how to write method chaining with line breaks using pandas as an example, and then introduce examples using NumPy and Pillow (PIL).

Note that the Python style guide, PEP8, includes sections on indentation but does not specify guidelines for method chaining.

The stricter code formatter Black recommends using parentheses for multi-line method chaining, as mentioned later.

Method chaining in pandas

In pandas, many methods of DataFrame or Series return a DataFrame or Series, allowing method chaining.

First, consider the following example without method chaining.

Read a CSV file with pd.read_csv().

import pandas as pd

df = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0)

print(df)
#          age state  point
# name                     
# Alice     24    NY     64
# Bob       42    CA     92
# Charlie   18    CA     70
# Dave      68    TX     70
# Ellen     24    CA     88
# Frank     30    NY     57

To this DataFrame, add a new column, remove unnecessary columns, sort the data, and extract only the first three rows.

df = df.assign(point_ratio=df['point'] / 100)
df = df.drop(columns='state')
df = df.sort_values('age')
df = df.head(3)

print(df)
#          age  point  point_ratio
# name                            
# Charlie   18     70         0.70
# Alice     24     64         0.64
# Ellen     24     88         0.88

The same process can be written with method chaining as follows. It is intentionally written in a single line.

df_mc = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0).assign(point_ratio=df['point'] / 100).drop(columns='state').sort_values('age').head(3)

print(df_mc)
#          age  point  point_ratio
# name                            
# Charlie   18     70         0.70
# Alice     24     64         0.64
# Ellen     24     88         0.88

While method chaining is convenient and straightforward, chaining many methods without a good understanding of them can lead to unexpected results. If you are unfamiliar with the methods, it might be safer to apply them one at a time and check the results.

Additionally, depending on the editor, there could be disadvantages like autocomplete not working for the second and subsequent methods.

Breaking lines inside parentheses

Python allows line breaks inside parentheses, as shown below. The results are the same in all cases, so they are omitted.

df_mc_break = pd.read_csv(
    'data/src/sample_pandas_normal.csv',
    index_col=0
).assign(
    point_ratio=df['point'] / 100
).drop(
    columns='state'
).sort_values(
    'age'
).head(
    3
)

However, it is important to note that breaking lines inside string literals can cause errors.

# df_mc_break = pd.read_csv(
#     'data/src/sample_
#     pandas_normal.csv',
#     index_col=0
# ).assign(
#     point_ratio=df['point'] / 100
# ).drop(
#     columns='state'
# ).sort_values(
#     'age'
# ).head(
#     3
# )
# SyntaxError: unterminated string literal (detected at line 2)

Of course, there is no need to break lines everywhere; you may choose to do so only in longer segments if preferred.

df_mc_break = pd.read_csv(
    'data/src/sample_pandas_normal.csv', index_col=0
).assign(
    point_ratio=df['point'] / 100
).drop(columns='state').sort_values('age').head(3)

Use backslashes

In Python, a backslash (\) acts as a continuation character, allowing a line to continue beyond a line break.

df_mc_backslash = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0) \
                    .assign(point_ratio=df['point'] / 100) \
                    .drop(columns='state') \
                    .sort_values('age') \
                    .head(3)

Enclose the method chain in parentheses ()

Leveraging Python's rule that allows for free line breaks within parentheses, enclosing the entire method chain in () allows flexible breaking.

df_mc_parens = (
    pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0)
    .assign(point_ratio=df['point'] / 100)
    .drop(columns='state')
    .sort_values('age')
    .head(3)
)

In this case, you are also free to use line breaks or not, so you can write as follows.

df_mc_parens = (pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0)
                .assign(point_ratio=df['point'] / 100)
                .drop(columns='state')
                .sort_values('age')
                .head(3))

Placing a dot (.) at the end of a line is not erroneous but may obscure the presence of method chaining. It is best to avoid this practice.

df_mc_parens = (
    pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0).
    assign(point_ratio=df['point'] / 100).
    drop(columns='state').
    sort_values('age').
    head(3)
)

Similarly, you can write a long string using parentheses to break lines in the code. See the following article.

Code formatter Black recommends using parentheses

While the Python style guide, PEP8, does not recommend specific guidelines for method chaining, the stricter code formatter Black recommends using parentheses and breaking lines before dots.

df_mc_parens = (
    pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0)
    .assign(point_ratio=df['point'] / 100)
    .drop(columns='state')
    .sort_values('age')
    .head(3)
)

Method chaining in NumPy

In NumPy, several methods of ndarray return ndarray.

Example without method chaining:

import numpy as np

a = np.arange(12)
a = a.reshape(3, 4)
a = a.clip(2, 9)

print(a)
# [[2 2 2 3]
#  [4 5 6 7]
#  [8 9 9 9]]

Example with method chaining:

a_mc = np.arange(12).reshape(3, 4).clip(2, 9)

print(a_mc)
# [[2 2 2 3]
#  [4 5 6 7]
#  [8 9 9 9]]

Wrap in parentheses and break lines:

a_mc_parens = (
    np.arange(12)
    .reshape(3, 4)
    .clip(2, 9)
)

print(a_mc_parens)
# [[2 2 2 3]
#  [4 5 6 7]
#  [8 9 9 9]]

Note that in NumPy, many operations are defined as functions that take an ndarray as an argument, rather than methods of ndarray itself. Thus, not everything can be done through method chaining like in pandas.

Method chaining in Pillow (PIL)

The image processing library Pillow (PIL) represents images with the Image type. Some methods of Image return an Image, allowing method chaining.

Example without method chaining:

from PIL import Image, ImageFilter

im = Image.open('data/src/lena_square.png')
im = im.convert('L')
im = im.rotate(90)
im = im.filter(ImageFilter.GaussianBlur())
im.save('data/temp/lena_square_pillow.jpg', quality=95)

Example with method chaining:

Image.open('data/src/lena_square.png').convert('L').rotate(90).filter(ImageFilter.GaussianBlur()).save('data/temp/lena_square_pillow.jpg', quality=95)

Wrap in parentheses and break lines:

(
    Image.open('data/src/lena_square.png')
    .convert('L')
    .rotate(90)
    .filter(ImageFilter.GaussianBlur())
    .save('data/temp/lena_square_pillow.jpg', quality=95)
)

As this example shows, performing operations from reading to saving in a single chain without assigning return values to variables can look a bit odd when the entire thing is wrapped in parentheses.

For example, the code formatter Black suggests not using parentheses in such cases, proposing the following style instead.

Image.open('data/src/lena_square.png').convert('L').rotate(90).filter(
    ImageFilter.GaussianBlur()
).save('data/temp/lena_square_pillow.jpg', quality=95)

Related Categories

Related Articles