pandas: Iterate DataFrame with for loop (iterrows, itertuples, items)

Modified: 2024-01-27 | Tags: Python, pandas

This article explains how to iterate over a pandas.DataFrame with a for loop.

When you simply iterate over a DataFrame, it returns the column names; however, you can iterate over its columns or rows using methods like items() (formerly iteritems()), iterrows(), and itertuples().

Essential basic functionality - Iteration — pandas 2.1.4 documentation

The latter part of this article also discusses approaches for processing a DataFrame without a for loop.

Contents

Iterate over a DataFrame
Iterate over columns of a DataFrame: items() (formerly iteritems())
Iterate over rows of a DataFrame: iterrows(), itertuples()
- iterrows()
- itertuples()
Iterate over a specific column (= Series) of a DataFrame
Update values within a for loop
Process a DataFrame without a for loop
Processing speed comparison

For more details on for loops in Python, see the following article.

Python for loop (with range, enumerate, zip, and more)

The pandas version used in this article is as follows. Note that functionality may vary between versions. The following DataFrame is used as an example.

import pandas as pd

print(pd.__version__)
# 2.1.4

df = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])
print(df)
#        age state  point
# Alice   24    NY     64
# Bob     42    CA     92

source: pandas_for_iteration.py

Iterate over a `DataFrame`

Iterating directly over a DataFrame with a for loop extracts the column names sequentially.

for column_name in df:
    print(column_name)
# age
# state
# point

source: pandas_for_iteration.py
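As a small illustration of the behavior above, the sketch below checks that iterating over the DataFrame itself visits the same labels as its columns attribute. The variable names names_from_df and names_from_columns are introduced here only for the example.

```python
import pandas as pd

df = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

# Iterating over the DataFrame walks its column labels,
# so both lists below hold the same names in the same order.
names_from_df = list(df)
names_from_columns = list(df.columns)

print(names_from_df)
# ['age', 'state', 'point']

print(names_from_df == names_from_columns)
# True
```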

Iterate over columns of a `DataFrame`: `items()` (formerly `iteritems()`)

The items() method iterates over the columns of a DataFrame as (column_name, Series) pairs.

pandas.DataFrame.items — pandas 2.1.4 documentation

You can extract each value by specifying the label in the Series.

for column_name, item in df.items():
    print(column_name)
    print(type(item))
    print(item['Alice'], item['Bob'])
    print('======')
# age
# <class 'pandas.core.series.Series'>
# 24 42
# ======
# state
# <class 'pandas.core.series.Series'>
# NY CA
# ======
# point
# <class 'pandas.core.series.Series'>
# 64 92
# ======

source: pandas_for_iteration.py

Note that this method was previously named iteritems(). iteritems() was deprecated in pandas 1.5 and removed in pandas 2.0, so use items() instead.

Iterate over rows of a `DataFrame`: `iterrows()`, `itertuples()`

You can use the iterrows() and itertuples() methods to iterate over rows of a DataFrame. itertuples() is faster than iterrows().

If you only need the values of a specific column, it is even faster to iterate over that column individually, as described next. The results of an experiment on processing speed are shown at the end.

`iterrows()`

The iterrows() method iterates over the rows of a DataFrame as (index, Series) pairs.

pandas.DataFrame.iterrows — pandas 2.1.4 documentation

for index, row in df.iterrows():
    print(index)
    print(type(row))
    print(row['age'], row['state'], row['point'])
    print('======')
# Alice
# <class 'pandas.core.series.Series'>
# 24 NY 64
# ======
# Bob
# <class 'pandas.core.series.Series'>
# 42 CA 92
# ======

source: pandas_for_iteration.py
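Because iterrows() packs each row into a single Series, dtypes are not preserved across a row when the columns have different numeric types. The following is a minimal sketch of that effect; df_mixed and its column names visits and ratio are hypothetical and are not part of the article's example data.

```python
import pandas as pd

# Hypothetical frame with one integer column and one float column.
df_mixed = pd.DataFrame({
    'visits': pd.Series([1, 2], dtype='int64'),
    'ratio': pd.Series([0.5, 1.5], dtype='float64'),
})
print(df_mixed['visits'].dtype)
# int64

# iterrows() builds one Series per row, so the values are upcast
# to a common dtype (float64 here).
index, first_row = next(df_mixed.iterrows())
print(first_row.dtype)
# float64
print(first_row['visits'])
# 1.0

# itertuples() keeps each column's own type.
first_tuple = next(df_mixed.itertuples(index=False))
print(first_tuple)
# Pandas(visits=1, ratio=0.5)
```

If the integer values must stay integers inside the loop, itertuples() or column-specific iteration is usually the safer choice.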

`itertuples()`

The itertuples() method iterates over the rows of a DataFrame, returning each as a namedtuple.

pandas.DataFrame.itertuples — pandas 2.1.4 documentation

By default, it returns a namedtuple named Pandas, with the first element representing the index (row name). You can access each value either by position with [] or by field name with the . (dot) notation.

collections.namedtuple() — Container datatypes — Python 3.12.1 documentation

for row in df.itertuples():
    print(type(row))
    print(row)
    print(row[0], row[1], row[2], row[3])
    print(row.Index, row.age, row.state, row.point)
    print('======')
# <class 'pandas.core.frame.Pandas'>
# Pandas(Index='Alice', age=24, state='NY', point=64)
# Alice 24 NY 64
# Alice 24 NY 64
# ======
# <class 'pandas.core.frame.Pandas'>
# Pandas(Index='Bob', age=42, state='CA', point=92)
# Bob 42 CA 92
# Bob 42 CA 92
# ======

source: pandas_for_iteration.py

Setting the index argument to False excludes the index from the namedtuple. You can also specify the name of the namedtuple with the name argument.

for row in df.itertuples(index=False, name='Person'):
    print(type(row))
    print(row)
    print(row[0], row[1], row[2])
    print(row.age, row.state, row.point)
    print('======')
# <class 'pandas.core.frame.Person'>
# Person(age=24, state='NY', point=64)
# 24 NY 64
# 24 NY 64
# ======
# <class 'pandas.core.frame.Person'>
# Person(age=42, state='CA', point=92)
# 42 CA 92
# 42 CA 92
# ======

source: pandas_for_iteration.py

Setting the name argument to None returns a normal tuple.

for row in df.itertuples(name=None):
    print(type(row))
    print(row)
    print(row[0], row[1], row[2], row[3])
    print('======')
# <class 'tuple'>
# ('Alice', 24, 'NY', 64)
# Alice 24 NY 64
# ======
# <class 'tuple'>
# ('Bob', 42, 'CA', 92)
# Bob 42 CA 92
# ======

source: pandas_for_iteration.py
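One detail to keep in mind with the namedtuple form: column names that are not valid Python identifiers (or that are keywords) are replaced with positional names such as _1. The sketch below illustrates this with a hypothetical DataFrame, df_names, whose column names were chosen only to trigger the renaming.

```python
import pandas as pd

# Hypothetical frame: 'first name' contains a space and 'class' is a keyword.
df_names = pd.DataFrame({'first name': ['Alice'], 'class': ['A'], 'age': [24]})

row = next(df_names.itertuples())
print(row._fields)
# ('Index', '_1', '_2', 'age')

# Renamed fields are reachable by their positional names or by position.
print(row._1, row[2], row.age)
# Alice A 24

# With name=None, plain tuples are returned, so names are not involved at all.
plain = next(df_names.itertuples(name=None))
print(plain)
# (0, 'Alice', 'A', 24)
```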

Iterate over a specific column (= `Series`) of a `DataFrame`

The iterrows() and itertuples() methods yield every value in each row. If you only need the values of a specific column, you can iterate over that column directly.

A column in a DataFrame is a Series.

print(df['age'])
# Alice    24
# Bob      42
# Name: age, dtype: int64

print(type(df['age']))
# <class 'pandas.core.series.Series'>

source: pandas_for_iteration.py

Since iterating over a Series yields its values, you can sequentially retrieve the values of the DataFrame column by using a for loop.

for age in df['age']:
    print(age)
# 24
# 42

source: pandas_for_iteration.py

The built-in zip() function can be used to retrieve values from multiple columns together.

zip() in Python: Get elements from multiple lists

for age, point in zip(df['age'], df['point']):
    print(age, point)
# 24 64
# 42 92

source: pandas_for_iteration.py

To retrieve the row names, use the index attribute. As with the above example, you can retrieve them together with other columns using zip().

print(df.index)
# Index(['Alice', 'Bob'], dtype='object')

print(type(df.index))
# <class 'pandas.core.indexes.base.Index'>

for index in df.index:
    print(index)
# Alice
# Bob

for index, state in zip(df.index, df['state']):
    print(index, state)
# Alice NY
# Bob CA

source: pandas_for_iteration.py

Update values within a `for` loop

The Series returned by iterrows() may be a copy, not a view, so modifying it may not update the original data.

pandas: Views and copies in DataFrame

print(df)
#        age state  point
# Alice   24    NY     64
# Bob     42    CA     92

for index, row in df.iterrows():
    row['point'] += row['age']

print(df)
#        age state  point
# Alice   24    NY     64
# Bob     42    CA     92

source: pandas_for_iteration.py

You can update values by selecting an element of the original DataFrame with at[].

pandas: Get/Set values with loc, iloc, at, iat

for index, row in df.iterrows():
    df.at[index, 'point'] += row['age']

print(df)
#        age state  point
# Alice   24    NY     88
# Bob     42    CA    134

source: pandas_for_iteration.py
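The same update can be sketched with itertuples() in place of iterrows(), using row.Index as the row label for at[]. This is only one possible variant, shown on a fresh copy of the example data.

```python
import pandas as pd

df = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

# row.Index holds the row label; at[] writes into the original DataFrame.
for row in df.itertuples():
    df.at[row.Index, 'point'] += row.age

print(df)
#        age state  point
# Alice   24    NY     88
# Bob     42    CA    134
```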

Although at[] lets you update values inside a loop, a for loop is often unnecessary for this kind of update. Operating on whole columns is usually both simpler and faster. The next section shows specific examples.

Process a `DataFrame` without a `for` loop

The operation demonstrated in the previous section with a for loop can also be achieved without a for loop as follows.

df = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])
print(df)
#        age state  point
# Alice   24    NY     64
# Bob     42    CA     92

df['point'] += df['age']
print(df)
#        age state  point
# Alice   24    NY     88
# Bob     42    CA    134

source: pandas_for_iteration.py

It is also possible to process existing columns and add them as new columns.

pandas: Add rows/columns to DataFrame with assign(), insert()

df['new'] = df['point'] + df['age'] * 2 + 1000
print(df)
#        age state  point   new
# Alice   24    NY     88  1136
# Bob     42    CA    134  1218

source: pandas_for_iteration.py
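Loops are often written because an update should apply only to some rows. As a minimal sketch, a boolean mask with loc[] can express such a conditional update without a loop. The threshold of 30 and the bonus of 10 are arbitrary values chosen for illustration, and the example starts from a fresh copy of the original data.

```python
import pandas as pd

df = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

# Boolean Series marking the rows to update (illustrative condition).
is_30_or_over = df['age'] >= 30
print(is_30_or_over.tolist())
# [False, True]

# Add 10 points only to the rows where the mask is True.
df.loc[is_30_or_over, 'point'] += 10
print(df)
#        age state  point
# Alice   24    NY     64
# Bob     42    CA    102
```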

In addition to arithmetic operations using operators like + and *, you can apply NumPy functions to each element of a column.

import numpy as np

df['age_sqrt'] = np.sqrt(df['age'])
print(df)
#        age state  point   new  age_sqrt
# Alice   24    NY     88  1136  4.898979
# Bob     42    CA    134  1218  6.480741

source: pandas_for_iteration.py

For string processing, pandas offers specific methods to handle columns (Series) directly.

df['state_0'] = df['state'].str.lower().str[0]
print(df)
#        age state  point   new  age_sqrt state_0
# Alice   24    NY     88  1136  4.898979       n
# Bob     42    CA    134  1218  6.480741       c

source: pandas_for_iteration.py

Furthermore, you can apply any function to each element or to each row/column using the map() and apply() methods.

pandas: Apply functions to values, rows, columns with map(), apply()

df['point_hex'] = df['point'].map(hex)
print(df)
#        age state  point   new  age_sqrt state_0 point_hex
# Alice   24    NY     88  1136  4.898979       n      0x58
# Bob     42    CA    134  1218  6.480741       c      0x86

source: pandas_for_iteration.py
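For the row-wise case, the following sketch passes axis=1 to apply() so that the function receives each row as a Series. The column names label and label_vectorized are hypothetical. Note that apply() with axis=1 still calls the function once per row, so a column-wise expression, where one exists, is generally preferable.

```python
import pandas as pd

df = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

# axis=1 passes each row (as a Series) to the function.
df['label'] = df.apply(lambda row: f"{row['state']}-{row['age']}", axis=1)

# The same result built from whole columns, without a per-row function call.
df['label_vectorized'] = df['state'] + '-' + df['age'].astype(str)

print(df['label'].tolist())
# ['NY-24', 'CA-42']

print(df['label'].equals(df['label_vectorized']))
# True
```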

Processing speed comparison

This section compares the processing speeds of methods such as iterrows(), itertuples(), and column-specific for loops.

Consider the following DataFrame with 100 rows and 10 columns.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(1000).reshape(100, 10))
print(df.shape)
# (100, 10)

print(df.head(3))
#     0   1   2   3   4   5   6   7   8   9
# 0   0   1   2   3   4   5   6   7   8   9
# 1  10  11  12  13  14  15  16  17  18  19
# 2  20  21  22  23  24  25  26  27  28  29

print(df.tail(3))
#       0    1    2    3    4    5    6    7    8    9
# 97  970  971  972  973  974  975  976  977  978  979
# 98  980  981  982  983  984  985  986  987  988  989
# 99  990  991  992  993  994  995  996  997  998  999

source: pandas_for_iteration_timeit.py

The following code was measured using the %%timeit magic command in Jupyter Notebook. Note that %%timeit is a Jupyter (IPython) magic command, so it does not work when the code is executed as a plain Python script.

Measure execution time with timeit in Python

%%timeit
for i, row in df.iterrows():
    pass
# 735 µs ± 20.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%%timeit
for t in df.itertuples():
    pass
# 202 µs ± 1.74 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%%timeit
for t in df.itertuples(name=None):
    pass
# 148 µs ± 780 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%%timeit
for i in df[0]:
    pass
# 4.27 µs ± 30.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

%%timeit
for i, j, k in zip(df[0], df[4], df[9]):
    pass
# 13.5 µs ± 53.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

%%timeit
for t in zip(df[0], df[1], df[2], df[3], df[4], df[5], df[6], df[7], df[8], df[9]):
    pass
# 41.3 µs ± 281 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

source: pandas_for_iteration_timeit.py

iterrows() tends to be quite slow, as it converts each row into a Series, whereas itertuples() is faster. Specifying columns for iteration, however, is the fastest method. In our example environment, column-specific iteration proved faster than itertuples(), even when extracting all columns.

While the speed difference may not be significant for datasets with around 100 rows, iterrows() slows significantly with larger datasets. In such cases, it is advisable to use itertuples() or column-specific iteration.

As mentioned earlier, the most efficient approach often involves performing operations without for loops.
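To repeat a similar comparison outside Jupyter Notebook, the standard library's timeit module can be used in a plain script. The following is a rough sketch only: the function names, the results dictionary, and the repetition count of 200 are choices made for this example, and the measured times depend on the environment, so no expected output is shown.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1000).reshape(100, 10))


def loop_iterrows():
    for _ in df.iterrows():
        pass


def loop_itertuples():
    for _ in df.itertuples():
        pass


def loop_single_column():
    for _ in df[0]:
        pass


candidates = {
    'iterrows': loop_iterrows,
    'itertuples': loop_itertuples,
    'single column': loop_single_column,
}

number = 200  # repetitions per measurement (arbitrary choice)
results = {}
for label, func in candidates.items():
    total_seconds = timeit.timeit(func, number=number)
    results[label] = total_seconds / number
    print(f'{label}: {results[label] * 1e6:.1f} µs per loop')
```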
