Method Chaining Across Multiple Lines in Python
Some Python libraries, like pandas, NumPy, and Pillow (PIL), are designed to enable method chaining, where methods can be linked sequentially.
Although method chaining tends to increase the length of a line of code, using parentheses allows for appropriate line breaks.
This article will first explain how to write method chaining with line breaks using pandas as an example, and then introduce examples using NumPy and Pillow (PIL).
Note that the Python style guide, PEP 8, has a section on indentation but gives no specific guidance on method chaining.
The stricter code formatter Black recommends using parentheses for multi-line method chaining, as mentioned later.
Method chaining in pandas
In pandas, many methods of DataFrame or Series return a DataFrame or Series, allowing method chaining.
First, consider the following example without method chaining.
Read a CSV file with pd.read_csv().
import pandas as pd
df = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0)
print(df)
# age state point
# name
# Alice 24 NY 64
# Bob 42 CA 92
# Charlie 18 CA 70
# Dave 68 TX 70
# Ellen 24 CA 88
# Frank 30 NY 57
To this DataFrame, add a new column, remove unnecessary columns, sort the data, and extract only the first three rows.
- pandas: Add rows/columns to DataFrame with assign(), insert()
- pandas: Delete rows/columns from DataFrame with drop()
- pandas: Sort DataFrame/Series with sort_values(), sort_index()
- pandas: Get first/last n rows of DataFrame with head() and tail()
df = df.assign(point_ratio=df['point'] / 100)
df = df.drop(columns='state')
df = df.sort_values('age')
df = df.head(3)
print(df)
# age point point_ratio
# name
# Charlie 18 70 0.70
# Alice 24 64 0.64
# Ellen 24 88 0.88
The same process can be written with method chaining as follows. It is intentionally written in a single line.
df_mc = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0).assign(point_ratio=df['point'] / 100).drop(columns='state').sort_values('age').head(3)
print(df_mc)
# age point point_ratio
# name
# Charlie 18 70 0.70
# Alice 24 64 0.64
# Ellen 24 88 0.88
While method chaining is convenient and straightforward, chaining many methods without a good understanding of them can lead to unexpected results. If you are unfamiliar with the methods, it might be safer to apply them one at a time and check the results.
Additionally, depending on the editor, there could be disadvantages like autocomplete not working for the second and subsequent methods.
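One subtle point in the one-line chain above is that `df['point']` refers to the previously assigned variable `df`, not to the DataFrame flowing through the chain. The following is a minimal sketch of one way to avoid that dependency: `assign()` also accepts a callable, which is called with the DataFrame being assigned to. The inline DataFrame `df_src` is a hypothetical stand-in for the CSV file, used here only to keep the example self-contained.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data (not part of the original example).
df_src = pd.DataFrame(
    {'age': [24, 42, 18], 'state': ['NY', 'CA', 'CA'], 'point': [64, 92, 70]},
    index=pd.Index(['Alice', 'Bob', 'Charlie'], name='name'),
)

# The lambda receives the DataFrame at that point in the chain,
# so no previously defined variable is needed inside assign().
df_lambda = df_src.assign(point_ratio=lambda d: d['point'] / 100).drop(columns='state')

print(df_lambda['point_ratio'].tolist())
# [0.64, 0.92, 0.7]
```

With this form, the chain can be copied or reordered without relying on what `df` happens to contain at that moment.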
Breaking lines inside parentheses
Python allows line breaks inside parentheses, as shown below. The results are the same in all cases, so they are omitted.
df_mc_break = pd.read_csv(
    'data/src/sample_pandas_normal.csv',
    index_col=0
).assign(
    point_ratio=df['point'] / 100
).drop(
    columns='state'
).sort_values(
    'age'
).head(
    3
)
However, it is important to note that breaking lines inside string literals can cause errors.
# df_mc_break = pd.read_csv(
#     'data/src/sample_
#     pandas_normal.csv',
#     index_col=0
# ).assign(
#     point_ratio=df['point'] / 100
# ).drop(
#     columns='state'
# ).sort_values(
#     'age'
# ).head(
#     3
# )
# SyntaxError: unterminated string literal (detected at line 2)
Of course, there is no need to break lines everywhere; you may choose to do so only in longer segments if preferred.
df_mc_break = pd.read_csv(
    'data/src/sample_pandas_normal.csv', index_col=0
).assign(
    point_ratio=df['point'] / 100
).drop(columns='state').sort_values('age').head(3)
Use backslashes
In Python, a backslash (\) acts as a continuation character, allowing a line to continue beyond a line break.
df_mc_backslash = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0) \
    .assign(point_ratio=df['point'] / 100) \
    .drop(columns='state') \
    .sort_values('age') \
    .head(3)
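One thing to watch for with backslashes is that the backslash must be the very last character on the line; even a single trailing space after it is a syntax error. The sketch below illustrates this by compiling two small source strings with the built-in `compile()`. The variable names (`good_src`, `bad_src`, `namespace`) are only illustrative.

```python
# Valid: the backslash is immediately followed by the newline.
good_src = "total = 1 + \\\n    2\n"
namespace = {}
exec(compile(good_src, '<good>', 'exec'), namespace)
print(namespace['total'])
# 3

# Invalid: a space sits between the backslash and the newline.
bad_src = "total = 1 + \\ \n    2\n"
try:
    compile(bad_src, '<bad>', 'exec')
    error_message = None
except SyntaxError as exc:
    error_message = exc.msg

print(error_message is not None)
# True
```

Because trailing whitespace is invisible in most editors, this kind of error can be hard to spot, which is one reason the parenthesized form is often preferred.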
Enclose the method chain in parentheses ()
Leveraging Python's rule that allows for free line breaks within parentheses, enclosing the entire method chain in () allows flexible breaking.
df_mc_parens = (
    pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0)
    .assign(point_ratio=df['point'] / 100)
    .drop(columns='state')
    .sort_values('age')
    .head(3)
)
In this case, you are also free to use line breaks or not, so you can write as follows.
df_mc_parens = (pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0)
                .assign(point_ratio=df['point'] / 100)
                .drop(columns='state')
                .sort_values('age')
                .head(3))
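A further practical difference from the backslash style is that comments can be placed between the chained calls when the chain is enclosed in parentheses, whereas nothing may follow a continuation backslash on the same line. The following sketch uses only built-in string methods so that it runs without any files; the sample string `raw` is made up for illustration.

```python
raw = '  Alice,Bob,,Charlie  '

names = (
    raw
    .strip()              # remove surrounding whitespace
    .replace(',,', ',')   # collapse the empty field
    .lower()
    .split(',')           # str.split() returns a list, ending the string chain
)
print(names)
# ['alice', 'bob', 'charlie']
```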
Placing a dot (.) at the end of a line is not a syntax error, but it can make the method chain harder to notice. It is best to avoid this practice.
df_mc_parens = (
    pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0).
    assign(point_ratio=df['point'] / 100).
    drop(columns='state').
    sort_values('age').
    head(3)
)
Similarly, a long string can be split across several lines of code by enclosing it in parentheses.
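As a brief sketch of the string case: adjacent string literals are concatenated automatically, so a long string can be split into several literals inside parentheses. The path below is the one used in the pandas examples; the variable name `csv_path` is only illustrative.

```python
# Each piece is a complete string literal; Python joins them into one string.
csv_path = (
    'data/src/'
    'sample_pandas_normal.csv'
)
print(csv_path)
# data/src/sample_pandas_normal.csv
```

Unlike a line break inside a single string literal, this form does not raise a SyntaxError, because every literal is closed on its own line.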
Code formatter Black recommends using parentheses
While the Python style guide, PEP 8, gives no specific guidelines for method chaining, the stricter code formatter Black recommends using parentheses and breaking lines before dots.
- psf/black: The uncompromising Python code formatter
- The Black code style - Call chains - Black 23.9.1 documentation
df_mc_parens = (
    pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0)
    .assign(point_ratio=df['point'] / 100)
    .drop(columns='state')
    .sort_values('age')
    .head(3)
)
Method chaining in NumPy
In NumPy, several methods of ndarray return ndarray.
Example without method chaining:
- NumPy: arange() and linspace() to generate evenly spaced values
- NumPy: reshape() to change the shape of an array
- NumPy: clip() to limit array values to min and max
import numpy as np
a = np.arange(12)
a = a.reshape(3, 4)
a = a.clip(2, 9)
print(a)
# [[2 2 2 3]
# [4 5 6 7]
# [8 9 9 9]]
Example with method chaining:
a_mc = np.arange(12).reshape(3, 4).clip(2, 9)
print(a_mc)
# [[2 2 2 3]
# [4 5 6 7]
# [8 9 9 9]]
Wrap in parentheses and break lines:
a_mc_parens = (
    np.arange(12)
    .reshape(3, 4)
    .clip(2, 9)
)
print(a_mc_parens)
# [[2 2 2 3]
# [4 5 6 7]
# [8 9 9 9]]
Note that in NumPy, many operations are defined as functions that take an ndarray as an argument, rather than methods of ndarray itself. Thus, not everything can be done through method chaining like in pandas.
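As a small illustration of this limitation, `np.flip()` exists as a function but not as an ndarray method, so it has to wrap the chain rather than extend it. The sketch below reuses the same chain as above; the variable name `a_flipped` is only illustrative.

```python
import numpy as np

# np.flip() is a function, so the method chain becomes its first argument.
a_flipped = np.flip(
    np.arange(12)
    .reshape(3, 4)
    .clip(2, 9),
    axis=1,
)
print(a_flipped)
# [[3 2 2 2]
#  [7 6 5 4]
#  [9 9 9 8]]
```

Line breaks are still allowed here because the chain sits inside the parentheses of the function call.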
Method chaining in Pillow (PIL)
The image processing library Pillow (PIL) represents images with the Image type. Some methods of Image return an Image, allowing method chaining.
Example without method chaining:
from PIL import Image, ImageFilter
im = Image.open('data/src/lena_square.png')
im = im.convert('L')
im = im.rotate(90)
im = im.filter(ImageFilter.GaussianBlur())
im.save('data/temp/lena_square_pillow.jpg', quality=95)
Example with method chaining:
Image.open('data/src/lena_square.png').convert('L').rotate(90).filter(ImageFilter.GaussianBlur()).save('data/temp/lena_square_pillow.jpg', quality=95)
Wrap in parentheses and break lines:
(
    Image.open('data/src/lena_square.png')
    .convert('L')
    .rotate(90)
    .filter(ImageFilter.GaussianBlur())
    .save('data/temp/lena_square_pillow.jpg', quality=95)
)
As this example shows, performing operations from reading to saving in a single chain without assigning return values to variables can look a bit odd when the entire thing is wrapped in parentheses.
For example, the code formatter Black suggests not using parentheses in such cases, proposing the following style instead.
Image.open('data/src/lena_square.png').convert('L').rotate(90).filter(
    ImageFilter.GaussianBlur()
).save('data/temp/lena_square_pillow.jpg', quality=95)
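Another option, shown here only as a sketch, is to assign the image-returning part of the chain to a variable and call `save()` separately, so the parentheses wrap an ordinary assignment. To keep the example runnable without the sample file, it assumes a blank image created with `Image.new()` in place of `Image.open()` and saves to an in-memory `BytesIO` buffer instead of a path; both substitutions are assumptions for illustration.

```python
from io import BytesIO

from PIL import Image, ImageFilter

im_processed = (
    Image.new('RGB', (64, 32))  # stand-in for Image.open('data/src/lena_square.png')
    .convert('L')
    .rotate(90)
    .filter(ImageFilter.GaussianBlur())
)

# save() returns None, so it is kept out of the chain that is assigned.
buffer = BytesIO()
im_processed.save(buffer, format='JPEG', quality=95)

print(im_processed.mode, im_processed.size)
# L (64, 32)
```

Keeping the processed image in a variable also makes it possible to inspect or reuse it before saving.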