pandas: Get/Set values with loc, iloc, at, iat

Tags: Python, pandas

You can use loc, iloc, at, and iat to access data in a pandas.DataFrame and get or set its values. These are indexers rather than methods, so use square brackets [] as in loc[], not parentheses () as in loc().

The differences are as follows:

  • How to specify the position
    • at, loc: Row/Column name (label)
    • iat, iloc: Row/Column number
  • Data you can get/set
    • at, iat: Single value
    • loc, iloc: Single or multiple values

You can also select rows and columns of pandas.DataFrame and elements of pandas.Series by indexing [].

Note that the previously provided get_value() and ix[] have been removed in version 1.0.

The sample code in this article is based on pandas version 2.0.3. The following pandas.DataFrame is used as an example.

import pandas as pd

print(pd.__version__)
# 2.0.3

df = pd.DataFrame({'col_0': ['00', '10', '20', '30', '40'],
                   'col_1': ['01', '11', '21', '31', '41'],
                   'col_2': ['02', '12', '22', '32', '42'],
                   'col_3': ['03', '13', '23', '33', '43']},
                  index=['row_0', 'row_1', 'row_2', 'row_3', 'row_4'])
print(df)
#       col_0 col_1 col_2 col_3
# row_0    00    01    02    03
# row_1    10    11    12    13
# row_2    20    21    22    23
# row_3    30    31    32    33
# row_4    40    41    42    43

at, iat: Access and get/set a single value

You can specify the row/column name in at. In addition to getting data, you can also set (assign) a new value.

print(df.at['row_1', 'col_2'])
# 12

df.at['row_1', 'col_2'] = '0'
print(df.at['row_1', 'col_2'])
# 0

You can specify the row/column number (0-based indexing) in iat.

print(df.iat[1, 2])
# 0

df.iat[1, 2] = '12'
print(df.iat[1, 2])
# 12
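
at and iat are also available on pandas.Series, where they take a single label or position instead of a row/column pair. The following is a minimal sketch; the Series `s` is made up for illustration and is not part of the example DataFrame above.

```python
import pandas as pd

# Hypothetical Series used only for this illustration.
s = pd.Series(['01', '11', '21'],
              index=['row_0', 'row_1', 'row_2'], name='col_1')

# at uses the label, iat uses the 0-based position.
label_value = s.at['row_1']
position_value = s.iat[1]
print(label_value, position_value)
# 11 11

# Both accessors also support assignment.
s.at['row_1'] = 'X'
s.iat[2] = 'Y'
print(s.tolist())
# ['01', 'X', 'Y']
```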

loc, iloc: Access and get/set single or multiple values

loc and iloc can access both single and multiple values using lists or slices. You can use row/column names for loc and row/column numbers for iloc.

Access a single value

You can access a single value with loc and iloc as well as with at and iat. However, at and iat are faster than loc and iloc.

print(df.loc['row_1', 'col_2'])
# 12

print(df.iloc[1, 2])
# 12
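
If you want to check the speed difference mentioned above on your own machine, a rough sketch with the standard-library timeit module could look like the following. The repetition count `n` is an arbitrary choice, and the absolute numbers depend on the environment and pandas version, so no output is shown.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'col_0': ['00', '10', '20'],
                   'col_1': ['01', '11', '21'],
                   'col_2': ['02', '12', '22']},
                  index=['row_0', 'row_1', 'row_2'])

n = 2000  # number of repetitions; chosen arbitrarily for this sketch

# Time the same single-element access through each accessor.
timings = {
    'at': timeit.timeit(lambda: df.at['row_1', 'col_2'], number=n),
    'loc': timeit.timeit(lambda: df.loc['row_1', 'col_2'], number=n),
    'iat': timeit.timeit(lambda: df.iat[1, 2], number=n),
    'iloc': timeit.timeit(lambda: df.iloc[1, 2], number=n),
}

for name, seconds in timings.items():
    print(f'{name}: {seconds / n * 1e6:.2f} microseconds per access')
```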

In addition to retrieving data, you can also set a new value for the element.

df.loc['row_1', 'col_2'] = '0'
print(df.loc['row_1', 'col_2'])
# 0

df.iloc[1, 2] = '12'
print(df.iloc[1, 2])
# 12

Access multiple values using lists and slices

With loc and iloc, you can access multiple values by specifying a group of data with a list [a, b, c, ...] and slice start:stop:step.

Note that in the slice notation start:stop:step, the step is optional and can be omitted.

When using the slice notation start:stop:step with loc (which uses row/column names), the stop value is inclusive. However, with iloc (which uses row/column numbers), the stop value is exclusive, following the typical behavior of standard Python slices.

When specified by a list, rows and columns follow the order of that list.

print(df.loc['row_1':'row_3', ['col_2', 'col_0']])
#       col_2 col_0
# row_1    12    10
# row_2    22    20
# row_3    32    30

print(df.iloc[1:3, [2, 0]])
#       col_2 col_0
# row_1    12    10
# row_2    22    20

For example, you can extract the even- or odd-numbered rows (counting from 0) by specifying step.

print(df.iloc[::2, [0, 3]])
#       col_0 col_3
# row_0    00    03
# row_2    20    23
# row_4    40    43

print(df.iloc[1::2, [0, 3]])
#       col_0 col_3
# row_1    10    13
# row_3    30    33

You can set multiple values simultaneously. If you assign a scalar value, all selected elements are set to that value. To assign a different value to each element of a range, use a two-dimensional list (list of lists) or a two-dimensional NumPy array (ndarray) whose shape matches the selection.

df.iloc[1:3, [2, 0]] = '0'
print(df)
#       col_0 col_1 col_2 col_3
# row_0    00    01    02    03
# row_1     0    11     0    13
# row_2     0    21     0    23
# row_3    30    31    32    33
# row_4    40    41    42    43

df.iloc[1:3, [2, 0]] = [['12', '10'], ['22', '20']]
print(df)
#       col_0 col_1 col_2 col_3
# row_0    00    01    02    03
# row_1    10    11    12    13
# row_2    20    21    22    23
# row_3    30    31    32    33
# row_4    40    41    42    43
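
The same kind of assignment with a NumPy array might look like the sketch below. It uses a smaller DataFrame than the article's example, and the placeholder values 'a' to 'd' are made up so that the mapping between array positions and cells is easy to follow.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_0': ['00', '10', '20'],
                   'col_1': ['01', '11', '21'],
                   'col_2': ['02', '12', '22']},
                  index=['row_0', 'row_1', 'row_2'])

# The selection below is 2 rows x 2 columns, so the array must be 2 x 2.
# Columns are listed as [2, 0], so the first array column goes to col_2.
new_values = np.array([['a', 'b'],
                       ['c', 'd']], dtype=object)
df.iloc[1:3, [2, 0]] = new_values
print(df)
#       col_0 col_1 col_2
# row_0    00    01    02
# row_1     b    11     a
# row_2     d    21     c
```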

Note that selecting a row or a column by specifying it as a scalar value returns Series, whereas the same row or column, specified as a slice or a list, returns DataFrame.

In particular, be aware of potential implicit type conversions when retrieving rows as a Series. See below for details.

print(df.loc['row_1', ['col_0', 'col_2']])
print(type(df.loc['row_1', ['col_0', 'col_2']]))
# col_0    10
# col_2    12
# Name: row_1, dtype: object
# <class 'pandas.core.series.Series'>

print(df.loc['row_1':'row_1', ['col_0', 'col_2']])
print(type(df.loc['row_1':'row_1', ['col_0', 'col_2']]))
#       col_0 col_2
# row_1    10    12
# <class 'pandas.core.frame.DataFrame'>

print(df.loc[['row_1'], ['col_0', 'col_2']])
print(type(df.loc[['row_1'], ['col_0', 'col_2']]))
#       col_0 col_2
# row_1    10    12
# <class 'pandas.core.frame.DataFrame'>

Access rows and columns

You can select rows and columns with df[]. They can be specified as:

  • Rows: Slice of row name/number
  • Columns: Column name or list of column names

print(df['row_1':'row_3'])
#       col_0 col_1 col_2 col_3
# row_1    10    11    12    13
# row_2    20    21    22    23
# row_3    30    31    32    33

print(df[1:3])
#       col_0 col_1 col_2 col_3
# row_1    10    11    12    13
# row_2    20    21    22    23

print(df['col_1'])
# row_0    01
# row_1    11
# row_2    21
# row_3    31
# row_4    41
# Name: col_1, dtype: object

print(df[['col_1', 'col_3']])
#       col_1 col_3
# row_0    01    03
# row_1    11    13
# row_2    21    23
# row_3    31    33
# row_4    41    43

You can specify rows and columns in various ways with loc and iloc.

If you omit the column specification with loc or iloc, whole rows are selected. You can specify them by a single row name/number or by a list of names/numbers.

print(df.loc['row_2'])
# col_0    20
# col_1    21
# col_2    22
# col_3    23
# Name: row_2, dtype: object

print(df.iloc[[1, 3]])
#       col_0 col_1 col_2 col_3
# row_1    10    11    12    13
# row_3    30    31    32    33

You can select columns with loc and iloc by specifying the rows as : (all rows). The columns can be given as a scalar, a list, or a slice.

print(df.loc[:, 'col_1':])
#       col_1 col_2 col_3
# row_0    01    02    03
# row_1    11    12    13
# row_2    21    22    23
# row_3    31    32    33
# row_4    41    42    43

print(df.iloc[:, 2])
# row_0    02
# row_1    12
# row_2    22
# row_3    32
# row_4    42
# Name: col_2, dtype: object

As mentioned above, specifying a single row or column with a scalar value returns a Series, while using a slice or list returns a DataFrame.

Note that selecting a row as pandas.Series may result in implicit type conversion. See below for details.

print(df.loc['row_2'])
print(type(df.loc['row_2']))
# col_0    20
# col_1    21
# col_2    22
# col_3    23
# Name: row_2, dtype: object
# <class 'pandas.core.series.Series'>

print(df.loc['row_2':'row_2'])
print(type(df.loc['row_2':'row_2']))
#       col_0 col_1 col_2 col_3
# row_2    20    21    22    23
# <class 'pandas.core.frame.DataFrame'>

print(df.loc[['row_2']])
print(type(df.loc[['row_2']]))
#       col_0 col_1 col_2 col_3
# row_2    20    21    22    23
# <class 'pandas.core.frame.DataFrame'>

Mask by boolean array and pandas.Series

With loc and iloc, you can use a boolean array or list to filter data. While the following example demonstrates row filtering, the same approach can be applied to columns.

l_bool = [True, False, False, True, False]

print(df.loc[l_bool, ['col_0', 'col_2']])
#       col_0 col_2
# row_0    00    02
# row_3    30    32

print(df.iloc[l_bool, [0, 2]])
#       col_0 col_2
# row_0    00    02
# row_3    30    32
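
For columns, the boolean list goes in the second position and needs one element per column. A minimal sketch on a smaller DataFrame (the name `col_mask` is chosen for illustration):

```python
import pandas as pd

df = pd.DataFrame({'col_0': ['00', '10', '20'],
                   'col_1': ['01', '11', '21'],
                   'col_2': ['02', '12', '22'],
                   'col_3': ['03', '13', '23']},
                  index=['row_0', 'row_1', 'row_2'])

# One boolean per column: keep col_0 and col_3.
col_mask = [True, False, False, True]

by_label = df.loc[:, col_mask]
by_position = df.iloc[:, col_mask]
print(by_label)
#       col_0 col_3
# row_0    00    03
# row_1    10    13
# row_2    20    23
```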

If the number of elements in the boolean list does not match the number of rows (or columns) being filtered, an error is raised.

l_bool_wrong = [True, False, False]

# print(df.loc[l_bool_wrong, ['col_0', 'col_2']])
# IndexError: Boolean index has wrong length: 3 instead of 5

You can also use a boolean Series with loc for filtering. Note that the filtering is based on matching labels, not on the order of the data.

s_bool = pd.Series([True, False, False, True, False], index=reversed(df.index))
print(s_bool)
# row_4     True
# row_3    False
# row_2    False
# row_1     True
# row_0    False
# dtype: bool

print(df.loc[s_bool, ['col_0', 'col_2']])
#       col_0 col_2
# row_1    10    12
# row_4    40    42
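
In practice, a boolean Series is often produced by a comparison on a column, in which case it already shares the DataFrame's index. The sketch below uses a made-up numeric DataFrame (`df_num`) and also reverses the mask to show that the result still follows the labels.

```python
import pandas as pd

# Hypothetical numeric data used only for this illustration.
df_num = pd.DataFrame({'a': [1, 5, 3], 'b': [10, 20, 30]},
                      index=['x', 'y', 'z'])

# The comparison returns a boolean Series indexed by x, y, z.
mask = df_num['a'] > 2
result = df_num.loc[mask, 'b']
print(result)
# y    20
# z    30
# Name: b, dtype: int64

# Reversing the order of the mask does not change the selection,
# because loc aligns the mask with the row labels.
result_reversed = df_num.loc[mask.iloc[::-1], 'b']
print(result_reversed.equals(result))
# True
```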

You cannot pass a boolean Series to iloc.

# print(df.iloc[s_bool, [0, 2]])
# ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types
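
One possible workaround, if you do need iloc, is to convert the Series to a plain boolean array first. Because an array carries no labels, the sketch below reorders the Series to match the DataFrame's rows with reindex() before calling to_numpy(); the smaller DataFrame and the values in `s_bool` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col_0': ['00', '10', '20'],
                   'col_1': ['01', '11', '21'],
                   'col_2': ['02', '12', '22']},
                  index=['row_0', 'row_1', 'row_2'])

# Boolean Series whose index is in a different order from df.index.
s_bool = pd.Series([True, False, False], index=['row_2', 'row_1', 'row_0'])

# Align to the DataFrame's row order first, then drop the labels.
mask = s_bool.reindex(df.index).to_numpy()
print(mask)
# [False False  True]

print(df.iloc[mask, [0, 2]])
#       col_0 col_2
# row_2    20    22
```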

Even with loc, an error is raised if the labels do not match.

s_bool_wrong = pd.Series([True, False, False], index=['row_0', 'row_1', 'row_2'])

# print(df.loc[s_bool_wrong, ['col_0', 'col_2']])
# IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

s_bool_wrong = pd.Series([True, False, False, True, False],
                         index=['row_0', 'row_1', 'row_2', 'row_3', 'XXX'])

# print(df.loc[s_bool_wrong, ['col_0', 'col_2']])
# IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

Duplicated row/column names

Both row names (index) and column names (columns) can have duplicates.

Consider the following DataFrame with duplicate row and column names as an example.

df_duplicated = df.rename(columns={'col_2': 'col_1'}, index={'row_3': 'row_2'})
print(df_duplicated)
#       col_0 col_1 col_1 col_3
# row_0    00    01    02    03
# row_1    10    11    12    13
# row_2    20    21    22    23
# row_2    30    31    32    33
# row_4    40    41    42    43

With at and loc, specifying a duplicated name selects all matching elements, so even at returns a DataFrame instead of a single value.

print(df_duplicated.at['row_2', 'col_1'])
print(type(df_duplicated.at['row_2', 'col_1']))
#       col_1 col_1
# row_2    21    22
# row_2    31    32
# <class 'pandas.core.frame.DataFrame'>

print(df_duplicated.loc[:'row_2', ['col_1', 'col_3']])
print(type(df_duplicated.loc[:'row_2', ['col_1', 'col_3']]))
#       col_1 col_1 col_3
# row_0    01    02    03
# row_1    11    12    13
# row_2    21    22    23
# row_2    31    32    33
# <class 'pandas.core.frame.DataFrame'>

When using iat and iloc to specify by row/column number, duplicated names are not an issue because they operate based on position.

print(df_duplicated.iat[2, 1])
# 21

print(df_duplicated.iloc[:2, [1, 3]])
#       col_1 col_3
# row_0    01    03
# row_1    11    13

To avoid confusion, it's advisable to use unique values for row and column names unless there's a compelling reason otherwise.

You can check whether row and column names are unique (not duplicated) with index.is_unique and columns.is_unique.

print(df_duplicated.index.is_unique)
# False

print(df_duplicated.columns.is_unique)
# False
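
If duplicated names turn out to be a problem, one option is to keep only the first occurrence of each label. The sketch below does this with Index.duplicated() and a boolean mask in loc; it uses a small made-up DataFrame, and whether keeping the first occurrence is appropriate depends on your data.

```python
import pandas as pd

# Small hypothetical DataFrame with a duplicated row and column name.
df_duplicated = pd.DataFrame([['00', '01', '02'],
                              ['10', '11', '12'],
                              ['20', '21', '22']],
                             index=['row_0', 'row_1', 'row_1'],
                             columns=['col_0', 'col_1', 'col_1'])

# Index.duplicated() marks every repeat after the first occurrence as True,
# so negating it keeps only the first row/column for each label.
row_mask = ~df_duplicated.index.duplicated()
col_mask = ~df_duplicated.columns.duplicated()

df_unique = df_duplicated.loc[row_mask, col_mask]
print(df_unique)
#       col_0 col_1
# row_0    00    01
# row_1    10    11

print(df_unique.index.is_unique, df_unique.columns.is_unique)
# True True
```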

Specify by number and name

If you want to specify by both number and name, use at or loc in combination with the index or columns attributes.

You can retrieve row or column names based on their number using the index and columns attributes.

print(df.index[2])
# row_2

print(df.columns[2])
# col_2

For index and columns, you can use slices and lists to retrieve multiple names.

print(df.index[1:4])
# Index(['row_1', 'row_2', 'row_3'], dtype='object')

print(df.columns[[1, 3]])
# Index(['col_1', 'col_3'], dtype='object')

By passing the names retrieved this way to at or loc, you can specify one axis by number and the other by name.

print(df.at[df.index[2], 'col_2'])
# 22

print(df.loc[['row_0', 'row_3'], df.columns[[1, 3]]])
#       col_1 col_3
# row_0    01    03
# row_3    31    33
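
The conversion can also go the other way: look up the position of a name and pass it to iat or iloc. The sketch below uses Index.get_loc() and Index.get_indexer() for this. It assumes the names are unique; with duplicated names, get_loc() returns a slice or boolean array instead of a single integer.

```python
import pandas as pd

df = pd.DataFrame({'col_0': ['00', '10', '20', '30'],
                   'col_1': ['01', '11', '21', '31'],
                   'col_2': ['02', '12', '22', '32'],
                   'col_3': ['03', '13', '23', '33']},
                  index=['row_0', 'row_1', 'row_2', 'row_3'])

# get_loc() converts one name to its position (names must be unique).
col_pos = df.columns.get_loc('col_2')
print(col_pos)
# 2

print(df.iat[2, col_pos])
# 22

# get_indexer() converts several names at once.
col_positions = df.columns.get_indexer(['col_1', 'col_3'])
print(df.iloc[[0, 3], col_positions])
#       col_1 col_3
# row_0    01    03
# row_3    31    33
```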

Applying indexing operations in succession, such as df[...][...] or df.loc[...].iloc[...], is known as "chained indexing". Assigning a value through chained indexing can trigger a SettingWithCopyWarning.

Chained indexing causes no issues when you are simply retrieving and checking data, but be cautious when assigning new values, because the assignment may not affect the original DataFrame.

print(df['col_2']['row_2'])
# 22

print(df.loc[['row_0', 'row_3']].iloc[:, [1, 3]])
#       col_1 col_3
# row_0    01    03
# row_3    31    33
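
To illustrate the risk with assignment, the following sketch compares a chained assignment with a single loc call on a small made-up DataFrame. Selecting rows with a list creates a new object, so the chained assignment here leaves the original unchanged. The warning is suppressed only to keep the demonstration quiet; other chained patterns may behave differently depending on the pandas version.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'col_0': ['00', '10'], 'col_1': ['01', '11']},
                  index=['row_0', 'row_1'])

# Chained indexing: df.loc[[...]] builds a new DataFrame first, so the
# assignment changes that temporary object rather than df.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # silence the warning for this demo
    df.loc[['row_0']]['col_1'] = 'X'
print(df.at['row_0', 'col_1'])
# 01

# A single loc call assigns to df itself.
df.loc[['row_0'], 'col_1'] = 'X'
print(df.at['row_0', 'col_1'])
# X
```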

Implicit type conversion when selecting a row as pandas.Series

If the columns of the original DataFrame have different data types, then when selecting a row as a Series with loc or iloc, the data type of the elements in the selected Series might differ from the data types in the original DataFrame.

Consider a DataFrame with columns of integers (int) and floating point numbers (float).

df_mix = pd.DataFrame({'col_int': [0, 1, 2], 'col_float': [0.1, 0.2, 0.3]}, index=['A', 'B', 'C'])
print(df_mix)
#    col_int  col_float
# A        0        0.1
# B        1        0.2
# C        2        0.3

print(df_mix.dtypes)
# col_int        int64
# col_float    float64
# dtype: object

If you retrieve a row as a Series using loc or iloc, its data type becomes float. Elements in int columns are converted to float.

print(df_mix.loc['B'])
# col_int      1.0
# col_float    0.2
# Name: B, dtype: float64

print(type(df_mix.loc['B']))
# <class 'pandas.core.series.Series'>

If you execute the following code, the element is returned as float.

print(df_mix.loc['B']['col_int'])
# 1.0

print(type(df_mix.loc['B']['col_int']))
# <class 'numpy.float64'>

You can get elements of the original type with at or iat.

print(df_mix.at['B', 'col_int'])
# 1

print(type(df_mix.at['B', 'col_int']))
# <class 'numpy.int64'>

When a row is selected using a list or slice with loc or iloc, a DataFrame is returned instead of a Series.

print(df_mix.loc[['B']])
#    col_int  col_float
# B        1        0.2

print(type(df_mix.loc[['B']]))
# <class 'pandas.core.frame.DataFrame'>

print(df_mix.loc[['B']].dtypes)
# col_int        int64
# col_float    float64
# dtype: object
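
The dtype of a row selected as a Series depends on which column types are combined. As a small sketch with made-up DataFrames: int and float columns yield float64, as described above, while int and string columns yield object.

```python
import pandas as pd

# Hypothetical DataFrames used only for this illustration.
df_num = pd.DataFrame({'col_int': [0, 1], 'col_float': [0.1, 0.2]},
                      index=['A', 'B'])
df_obj = pd.DataFrame({'col_int': [0, 1], 'col_str': ['a', 'b']},
                      index=['A', 'B'])

# int + float columns: the row is converted to a common float type.
row_num = df_num.loc['B']
print(row_num.dtype)
# float64

# int + string columns: no common numeric type exists, so the row is object.
row_obj = df_obj.loc['B']
print(row_obj.dtype)
# object

# The original columns keep their own dtype either way.
print(df_num['col_int'].dtype, df_obj['col_int'].dtype)
# int64 int64
```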
