note.nkmk.me

Method chains with line breaks in Python

Posted: 2021-10-03 / Tags: Python, pandas, NumPy, Pillow

Some Python libraries, such as pandas, NumPy, and Pillow (PIL), are designed so that methods can be chained together and processed in order (= method chaining).

Method chaining is not a special syntax, as it is just repeating the process of calling a method directly from the return value.

The number of characters per line tends to be long when using method chains, but you can use parentheses to break lines as needed.

First, this article describes the following basics using pandas as an example.

  • Method chaining in pandas
  • Line breaks within parentheses
  • Use backslashes
  • Enclose in parentheses to break lines

Next, this article presents examples of NumPy and Pillow (PIL).

  • Method chaining in NumPy
  • Method chaining in Pillow(PIL)

PEP8, the Python style guide (coding conventions), includes a section on indentation, but does not specifically mention method chaining.

The following sample codes are confirmed not to raises any warnings by the flake8 coding check.

Sponsored Link

Method chaining in pandas

Many methods of pandas.DataFrame and pandas.Series return pandas.DataFrame and pandas.Series, and the methods can be chained.

If you don't use a method chain, you can write, for example, the following:

Read the file with read_csv().

import pandas as pd

df = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0)

print(df)
#          age state  point
# name                     
# Alice     24    NY     64
# Bob       42    CA     92
# Charlie   18    CA     70
# Dave      68    TX     70
# Ellen     24    CA     88
# Frank     30    NY     57

Add new columns to this pandas.DataFrame, delete unnecessary columns, sort, and extract only the first three rows.

df = df.assign(point_ratio=df['point'] / 100)
df = df.drop(columns='state')
df = df.sort_values('age')
df = df.head(3)

print(df)
#          age  point  point_ratio
# name                            
# Charlie   18     70         0.70
# Alice     24     64         0.64
# Ellen     24     88         0.88

The same process can be written by connecting methods as follows.

df_mc = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0).assign(point_ratio=df['point'] / 100).drop(columns='state').sort_values('age').head(3)

print(df_mc)
#          age  point  point_ratio
# name                            
# Charlie   18     70         0.70
# Alice     24     64         0.64
# Ellen     24     88         0.88

Although method chaining is convenient and simple to write, it can lead to unexpected results if you connect a large number of methods that you do not understand well. If you are not familiar with them, it may be safer to apply methods one by one and check the results.

There are also some disadvantages, such as the lack of completion for the second and subsequent methods in some editors.

Line breaks within parentheses

In Python, you can freely break lines inside parentheses, so you can write like the following.

df_mc_break = pd.read_csv(
    'data/src/sample_pandas_normal.csv',
    index_col=0
).assign(
    point_ratio=df['point'] / 100
).drop(
    columns='state'
).sort_values(
    'age'
).head(
    3
)

Note that even if you can freely use line breaks, an error raises if you break a line in a string literal.

# df_mc_break = pd.read_csv(
#     'data/src/sample_
#     pandas_normal.csv',
#     index_col=0
# ).assign(
#     point_ratio=df['point'] / 100
# ).drop(
#     columns='state'
# ).sort_values(
#     'age'
# ).head(
#     3
# )
# SyntaxError: EOL while scanning string literal

Of course, you can break lines only where there are many characters.

dfdf_mc_break_mc = pd.read_csv(
    'data/src/sample_pandas_normal.csv', index_col=0
).assign(
    point_ratio=df['point'] / 100
).drop(columns='state').sort_values('age').head(3)

Use backslashes

In Python, the backslash (\) is a continuation character, and when placed at the end of a line it ignores subsequent line breaks and considers the line to be continuous.

Using this, you can write the following.

df_mc_break_backslash = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0) \
                          .assign(point_ratio=df['point'] / 100) \
                          .drop(columns='state') \
                          .sort_values('age') \
                          .head(3)
Sponsored Link

Enclose in parentheses to break lines

You can also use the rule that you can freely break lines within parentheses, and enclose the whole code in parentheses ().

df_mc_break_parens = (
    pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0)
    .assign(point_ratio=df['point'] / 100)
    .drop(columns='state')
    .sort_values('age')
    .head(3)
)

In this case, too, you are free to use line breaks or not, so you can write the following.

df_mc_break_parens = (pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0)
                      .assign(point_ratio=df['point'] / 100)
                      .drop(columns='state')
                      .sort_values('age')
                      .head(3))

Putting a dot (. ) at the end of a line dose not cause an error. However, in this case, it may be difficult to see that it is a method chain, so you should avoid it.

df_mc_break_parens = (
    pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0).
    assign(point_ratio=df['point'] / 100).
    drop(columns='state').
    sort_values('age').
    head(3)
)

Similarly, you can write a long string using parentheses to break lines in the code. See the following article.

Method chaining in NumPy

There are several methods of NumPy array ndarray that return ndarray.

Example without method chaining:

import numpy as np

a = np.arange(12)
a = a.reshape(3, 4)
a = a.clip(2, 9)

print(a)
# [[2 2 2 3]
#  [4 5 6 7]
#  [8 9 9 9]]

Example with method chaining:

a_mc = np.arange(12).reshape(3, 4).clip(2, 9)

print(a_mc)
# [[2 2 2 3]
#  [4 5 6 7]
#  [8 9 9 9]]

To break a line by enclosing it in parentheses.

a_mc_break_parens = (
    np.arange(12)
    .reshape(3, 4)
    .clip(2, 9)
)

print(a_mc_break_parens)
# [[2 2 2 3]
#  [4 5 6 7]
#  [8 9 9 9]]

Note that in NumPy, many operations are defined as functions with ndarray as an argument, rather than as methods of ndarray, so you can't do everything with a method chain as you can with pandas.

Method chaining in Pillow(PIL)

In the image processing library Pillow(PIL), images are represented by the Image type. Some methods of Image also return the processed Image.

Example without method chaining:

An image file is loaded, various processes are performed, and finally it is saved as another file.

from PIL import Image, ImageFilter

im = Image.open('data/src/lena_square.png')
im = im.convert('L')
im = im.rotate(90)
im = im.filter(ImageFilter.GaussianBlur())
im.save('data/temp/lena_square_pillow.jpg', quality=95)

Example with method chaining:

Image.open('data/src/lena_square.png').convert('L').rotate(90).filter(ImageFilter.GaussianBlur()).save('data/temp/lena_square_pillow.jpg', quality=95)

To break a line by enclosing it in parentheses.

This example may look a little strange because if you do everything from loading to saving at once, you can complete the process without assigning the return value to a variable.

(
    Image.open('data/src/lena_square.png')
    .convert('L')
    .rotate(90)
    .filter(ImageFilter.GaussianBlur())
    .save('data/temp/lena_square_pillow.jpg', quality=95)
)
Sponsored Link
Share

Related Categories

Related Articles