pandas: Get/Set values with loc, iloc, at, iat
You can use loc, iloc, at, and iat to access data in pandas.DataFrame and get/set values. Use square brackets [] as in loc[], not parentheses () as in loc().
- pandas.DataFrame.loc — pandas 2.0.3 documentation
- pandas.DataFrame.iloc — pandas 2.0.3 documentation
- pandas.DataFrame.at — pandas 2.0.3 documentation
- pandas.DataFrame.iat — pandas 2.0.3 documentation
The differences are as follows:
- How to specify the position
  - at, loc: Row/column name (label)
  - iat, iloc: Row/column number
- Data you can get/set
  - at, iat: Single value
  - loc, iloc: Single or multiple values
You can also select rows and columns of pandas.DataFrame and elements of pandas.Series by indexing [].
Note that the previously provided get_value() and ix[] have been removed in version 1.0.
The sample code in this article is based on pandas version 2.0.3. The following pandas.DataFrame is used as an example.
import pandas as pd
print(pd.__version__)
# 2.0.3
df = pd.DataFrame({'col_0': ['00', '10', '20', '30', '40'],
                   'col_1': ['01', '11', '21', '31', '41'],
                   'col_2': ['02', '12', '22', '32', '42'],
                   'col_3': ['03', '13', '23', '33', '43']},
                  index=['row_0', 'row_1', 'row_2', 'row_3', 'row_4'])
print(df)
# col_0 col_1 col_2 col_3
# row_0 00 01 02 03
# row_1 10 11 12 13
# row_2 20 21 22 23
# row_3 30 31 32 33
# row_4 40 41 42 43
at, iat: Access and get/set a single value
You can specify the row/column name in at. In addition to getting data, you can also set (assign) a new value.
print(df.at['row_1', 'col_2'])
# 12
df.at['row_1', 'col_2'] = '0'
print(df.at['row_1', 'col_2'])
# 0
You can specify the row/column number (0-based indexing) in iat.
print(df.iat[1, 2])
# 0
df.iat[1, 2] = '12'
print(df.iat[1, 2])
# 12
loc, iloc: Access and get/set single or multiple values
loc and iloc can access both single and multiple values using lists or slices. You can use row/column names for loc and row/column numbers for iloc.
Access a single value
You can access a single value with loc and iloc as well as with at and iat. However, at and iat are faster than loc and iloc.
print(df.loc['row_1', 'col_2'])
# 12
print(df.iloc[1, 2])
# 12
In addition to retrieving data, you can also set a new value for the element.
df.loc['row_1', 'col_2'] = '0'
print(df.loc['row_1', 'col_2'])
# 0
df.iloc[1, 2] = '12'
print(df.iloc[1, 2])
# 12
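The speed difference mentioned above can be checked with a rough timing sketch like the one below. The repetition count (10,000) and the small three-row DataFrame are arbitrary choices for illustration, and the absolute numbers depend on the machine and the pandas version, so no expected output is shown.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'col_0': ['00', '10', '20'],
                   'col_1': ['01', '11', '21'],
                   'col_2': ['02', '12', '22']},
                  index=['row_0', 'row_1', 'row_2'])

n = 10_000  # arbitrary repetition count

# Time the same single-value lookup with each accessor.
t_at = timeit.timeit(lambda: df.at['row_1', 'col_2'], number=n)
t_loc = timeit.timeit(lambda: df.loc['row_1', 'col_2'], number=n)
t_iat = timeit.timeit(lambda: df.iat[1, 2], number=n)
t_iloc = timeit.timeit(lambda: df.iloc[1, 2], number=n)

print(f'at:   {t_at:.4f} s')
print(f'loc:  {t_loc:.4f} s')
print(f'iat:  {t_iat:.4f} s')
print(f'iloc: {t_iloc:.4f} s')
```

The gap only matters when single values are read or written many times, for example inside a loop.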
Access multiple values using lists and slices
With loc and iloc, you can access multiple values by specifying a group of data with a list [a, b, c, ...] and slice start:stop:step.
Note that in the slice notation start:stop:step, the step is optional and can be omitted.
When using the slice notation start:stop:step with loc (which uses row/column names), the stop value is inclusive. However, with iloc (which uses row/column numbers), the stop value is exclusive, following the typical behavior of standard Python slices.
When specified by a list, rows and columns follow the order of that list.
print(df.loc['row_1':'row_3', ['col_2', 'col_0']])
# col_2 col_0
# row_1 12 10
# row_2 22 20
# row_3 32 30
print(df.iloc[1:3, [2, 0]])
# col_2 col_0
# row_1 12 10
# row_2 22 20
For example, you can extract odd/even rows by specifying step.
print(df.iloc[::2, [0, 3]])
# col_0 col_3
# row_0 00 03
# row_2 20 23
# row_4 40 43
print(df.iloc[1::2, [0, 3]])
# col_0 col_3
# row_1 10 13
# row_3 30 33
You can set multiple values simultaneously. If you assign a scalar value, all selected elements will be set to that value. To assign a different value to each element of the selected range, use a two-dimensional list (list of lists) or a two-dimensional NumPy array (ndarray) whose shape matches the selection.
df.iloc[1:3, [2, 0]] = '0'
print(df)
# col_0 col_1 col_2 col_3
# row_0 00 01 02 03
# row_1 0 11 0 13
# row_2 0 21 0 23
# row_3 30 31 32 33
# row_4 40 41 42 43
df.iloc[1:3, [2, 0]] = [['12', '10'], ['22', '20']]
print(df)
# col_0 col_1 col_2 col_3
# row_0 00 01 02 03
# row_1 10 11 12 13
# row_2 20 21 22 23
# row_3 30 31 32 33
# row_4 40 41 42 43
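The examples above use a scalar and a list of lists. As a minimal sketch of the NumPy variant, the following assigns a two-dimensional ndarray to the same selection. The placeholder strings ('X2', 'X0', and so on) are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_0': ['00', '10', '20', '30', '40'],
                   'col_1': ['01', '11', '21', '31', '41'],
                   'col_2': ['02', '12', '22', '32', '42'],
                   'col_3': ['03', '13', '23', '33', '43']},
                  index=['row_0', 'row_1', 'row_2', 'row_3', 'row_4'])

# The array shape must match the selection: 2 rows x 2 columns.
# Columns are selected as [2, 0], so the first array column goes to col_2
# and the second array column goes to col_0.
a = np.array([['X2', 'X0'],
              ['Y2', 'Y0']], dtype=object)
df.iloc[1:3, [2, 0]] = a

print(df.loc['row_1':'row_2', ['col_0', 'col_2']])
```

As with a list of lists, the values are placed in the order of the row and column selectors, not in the original column order.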
Note that selecting a row or a column by specifying it as a scalar value returns Series, whereas the same row or column, specified as a slice or a list, returns DataFrame.
In particular, be aware of potential implicit type conversions when retrieving rows as a Series. See below for details.
print(df.loc['row_1', ['col_0', 'col_2']])
print(type(df.loc['row_1', ['col_0', 'col_2']]))
# col_0 10
# col_2 12
# Name: row_1, dtype: object
# <class 'pandas.core.series.Series'>
print(df.loc['row_1':'row_1', ['col_0', 'col_2']])
print(type(df.loc['row_1':'row_1', ['col_0', 'col_2']]))
# col_0 col_2
# row_1 10 12
# <class 'pandas.core.frame.DataFrame'>
print(df.loc[['row_1'], ['col_0', 'col_2']])
print(type(df.loc[['row_1'], ['col_0', 'col_2']]))
# col_0 col_2
# row_1 10 12
# <class 'pandas.core.frame.DataFrame'>
Access rows and columns
You can select rows and columns with df[]. They can be specified as:
- Rows: Slice of row name/number
- Columns: Column name or list of column names
print(df['row_1':'row_3'])
# col_0 col_1 col_2 col_3
# row_1 10 11 12 13
# row_2 20 21 22 23
# row_3 30 31 32 33
print(df[1:3])
# col_0 col_1 col_2 col_3
# row_1 10 11 12 13
# row_2 20 21 22 23
print(df['col_1'])
# row_0 01
# row_1 11
# row_2 21
# row_3 31
# row_4 41
# Name: col_1, dtype: object
print(df[['col_1', 'col_3']])
# col_1 col_3
# row_0 01 03
# row_1 11 13
# row_2 21 23
# row_3 31 33
# row_4 41 43
You can specify rows and columns in various ways with loc and iloc.
If you omit specifying columns with loc or iloc, rows are selected. You can specify them by row name/number or list of such names/numbers.
print(df.loc['row_2'])
# col_0 20
# col_1 21
# col_2 22
# col_3 23
# Name: row_2, dtype: object
print(df.iloc[[1, 3]])
# col_0 col_1 col_2 col_3
# row_1 10 11 12 13
# row_3 30 31 32 33
You can select columns with loc and iloc by specifying rows as :. Columns can also be specified by a slice.
print(df.loc[:, 'col_1':])
# col_1 col_2 col_3
# row_0 01 02 03
# row_1 11 12 13
# row_2 21 22 23
# row_3 31 32 33
# row_4 41 42 43
print(df.iloc[:, 2])
# row_0 02
# row_1 12
# row_2 22
# row_3 32
# row_4 42
# Name: col_2, dtype: object
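Besides a single name/number or an open-ended slice, the column selector can also be a list or a bounded slice. The following is a small sketch using the same sample data.

```python
import pandas as pd

df = pd.DataFrame({'col_0': ['00', '10', '20', '30', '40'],
                   'col_1': ['01', '11', '21', '31', '41'],
                   'col_2': ['02', '12', '22', '32', '42'],
                   'col_3': ['03', '13', '23', '33', '43']},
                  index=['row_0', 'row_1', 'row_2', 'row_3', 'row_4'])

# A list of names: columns come back in the order of the list.
by_list = df.loc[:, ['col_3', 'col_0']]
print(by_list.columns.tolist())
# ['col_3', 'col_0']

# A bounded positional slice: stop (3) is exclusive with iloc.
by_slice = df.iloc[:, 1:3]
print(by_slice.columns.tolist())
# ['col_1', 'col_2']
```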
As mentioned above, specifying a single row or column with a scalar value returns a Series, while using a slice or list returns a DataFrame.
Note that selecting a row as pandas.Series may result in implicit type conversion. See below for details.
print(df.loc['row_2'])
print(type(df.loc['row_2']))
# col_0 20
# col_1 21
# col_2 22
# col_3 23
# Name: row_2, dtype: object
# <class 'pandas.core.series.Series'>
print(df.loc['row_2':'row_2'])
print(type(df.loc['row_2':'row_2']))
# col_0 col_1 col_2 col_3
# row_2 20 21 22 23
# <class 'pandas.core.frame.DataFrame'>
print(df.loc[['row_2']])
print(type(df.loc[['row_2']]))
# col_0 col_1 col_2 col_3
# row_2 20 21 22 23
# <class 'pandas.core.frame.DataFrame'>
Mask by boolean array and pandas.Series
With loc and iloc, you can use a boolean array or list to filter data. While the following example demonstrates row filtering, the same approach can be applied to columns.
l_bool = [True, False, False, True, False]
print(df.loc[l_bool, ['col_0', 'col_2']])
# col_0 col_2
# row_0 00 02
# row_3 30 32
print(df.iloc[l_bool, [0, 2]])
# col_0 col_2
# row_0 00 02
# row_3 30 32
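As noted above, the same approach works for columns. The following sketch applies a boolean list to the column axis. The name col_mask is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col_0': ['00', '10', '20', '30', '40'],
                   'col_1': ['01', '11', '21', '31', '41'],
                   'col_2': ['02', '12', '22', '32', '42'],
                   'col_3': ['03', '13', '23', '33', '43']},
                  index=['row_0', 'row_1', 'row_2', 'row_3', 'row_4'])

# One boolean per column: keep col_0 and col_2.
col_mask = [True, False, True, False]

by_loc = df.loc[:, col_mask]
by_iloc = df.iloc[:, col_mask]
print(by_loc.columns.tolist())
# ['col_0', 'col_2']

# Row and column masks can be combined in a single loc call.
row_mask = [True, False, False, True, False]
both = df.loc[row_mask, col_mask]
print(both.index.tolist())
# ['row_0', 'row_3']
```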
If the number of elements does not match, an error is raised.
l_bool_wrong = [True, False, False]
# print(df.loc[l_bool_wrong, ['col_0', 'col_2']])
# IndexError: Boolean index has wrong length: 3 instead of 5
You can also use a boolean Series with loc for filtering. Note that the filtering is based on matching labels, not on the order of the data.
s_bool = pd.Series([True, False, False, True, False], index=reversed(df.index))
print(s_bool)
# row_4 True
# row_3 False
# row_2 False
# row_1 True
# row_0 False
# dtype: bool
print(df.loc[s_bool, ['col_0', 'col_2']])
# col_0 col_2
# row_1 10 12
# row_4 40 42
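In practice, a boolean Series is often produced by a condition on the DataFrame itself, in which case its labels already match. The following sketch uses Series.isin() as one such condition. The choice of values ('10' and '40') and the replacement string 'XX' are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'col_0': ['00', '10', '20', '30', '40'],
                   'col_1': ['01', '11', '21', '31', '41'],
                   'col_2': ['02', '12', '22', '32', '42'],
                   'col_3': ['03', '13', '23', '33', '43']},
                  index=['row_0', 'row_1', 'row_2', 'row_3', 'row_4'])

# A boolean Series that shares its index with df.
mask = df['col_0'].isin(['10', '40'])

selected = df.loc[mask, ['col_0', 'col_2']]
print(selected.index.tolist())
# ['row_1', 'row_4']

# The same mask can be used on the left-hand side of an assignment.
df.loc[mask, 'col_3'] = 'XX'
print(df['col_3'].tolist())
# ['03', 'XX', '23', '33', 'XX']
```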
You cannot pass a boolean Series to iloc.
# print(df.iloc[s_bool, [0, 2]])
# ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types
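One possible workaround, sketched below, is to convert the Series to a NumPy array with to_numpy() before passing it to iloc. Because the conversion discards the labels, the mask is then applied by position. Calling reindex() first is one way to restore label-based alignment. The variable names are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col_0': ['00', '10', '20', '30', '40'],
                   'col_1': ['01', '11', '21', '31', '41'],
                   'col_2': ['02', '12', '22', '32', '42'],
                   'col_3': ['03', '13', '23', '33', '43']},
                  index=['row_0', 'row_1', 'row_2', 'row_3', 'row_4'])

# Boolean Series whose labels are in reverse order relative to df.
s_bool = pd.Series([True, False, False, True, False], index=df.index[::-1])

# Labels are dropped: True at positions 0 and 3 selects row_0 and row_3.
by_position = df.iloc[s_bool.to_numpy(), [0, 2]]
print(by_position.index.tolist())
# ['row_0', 'row_3']

# Reordering to df.index first gives the same rows as loc would.
aligned = s_bool.reindex(df.index).to_numpy()
by_label = df.iloc[aligned, [0, 2]]
print(by_label.index.tolist())
# ['row_1', 'row_4']
```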
Even with loc, an error is raised if the labels do not match.
s_bool_wrong = pd.Series([True, False, False], index=['row_0', 'row_1', 'row_2'])
# print(df.loc[s_bool_wrong, ['col_0', 'col_2']])
# IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
s_bool_wrong = pd.Series([True, False, False, True, False],
                         index=['row_0', 'row_1', 'row_2', 'row_3', 'XXX'])
# print(df.loc[s_bool_wrong, ['col_0', 'col_2']])
# IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
Duplicated row/column names
Both row names (index) and column names (columns) can have duplicates.
Consider the following DataFrame with duplicate row and column names as an example.
df_duplicated = df.rename(columns={'col_2': 'col_1'}, index={'row_3': 'row_2'})
print(df_duplicated)
# col_0 col_1 col_1 col_3
# row_0 00 01 02 03
# row_1 10 11 12 13
# row_2 20 21 22 23
# row_2 30 31 32 33
# row_4 40 41 42 43
For at and loc, specifying duplicate names selects the corresponding multiple elements.
print(df_duplicated.at['row_2', 'col_1'])
print(type(df_duplicated.at['row_2', 'col_1']))
# col_1 col_1
# row_2 21 22
# row_2 31 32
# <class 'pandas.core.frame.DataFrame'>
print(df_duplicated.loc[:'row_2', ['col_1', 'col_3']])
print(type(df_duplicated.loc[:'row_2', ['col_1', 'col_3']]))
# col_1 col_1 col_3
# row_0 01 02 03
# row_1 11 12 13
# row_2 21 22 23
# row_2 31 32 33
# <class 'pandas.core.frame.DataFrame'>
When using iat and iloc to specify by row/column number, duplicated names are not an issue because they operate based on position.
print(df_duplicated.iat[2, 1])
# 21
print(df_duplicated.iloc[:2, [1, 3]])
# col_1 col_3
# row_0 01 03
# row_1 11 13
To avoid confusion, it's advisable to use unique values for row and column names unless there's a compelling reason otherwise.
You can check whether row and column names are unique (not duplicated) with index.is_unique and columns.is_unique.
print(df_duplicated.index.is_unique)
# False
print(df_duplicated.columns.is_unique)
# False
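If duplicated names turn out to be unintended, one option is to keep only the first occurrence of each label. The sketch below does this with Index.duplicated(), which marks the second and later occurrences as True. Whether dropping data this way is appropriate depends on the situation, and renaming may be the better choice.

```python
import pandas as pd

df = pd.DataFrame({'col_0': ['00', '10', '20', '30', '40'],
                   'col_1': ['01', '11', '21', '31', '41'],
                   'col_2': ['02', '12', '22', '32', '42'],
                   'col_3': ['03', '13', '23', '33', '43']},
                  index=['row_0', 'row_1', 'row_2', 'row_3', 'row_4'])
df_duplicated = df.rename(columns={'col_2': 'col_1'}, index={'row_3': 'row_2'})

# Invert the duplicated() masks to keep the first occurrence of each label.
keep_rows = ~df_duplicated.index.duplicated()
keep_cols = ~df_duplicated.columns.duplicated()
df_unique = df_duplicated.loc[keep_rows, keep_cols]

print(df_unique.index.tolist())
# ['row_0', 'row_1', 'row_2', 'row_4']
print(df_unique.columns.tolist())
# ['col_0', 'col_1', 'col_3']
print(df_unique.index.is_unique, df_unique.columns.is_unique)
# True True
```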
Specify by number and name
If you want to specify by both number and name, use at or loc in combination with the index or columns attributes.
You can retrieve row or column names based on their number using the index and columns attributes.
print(df.index[2])
# row_2
print(df.columns[2])
# col_2
For index and columns, you can use slices and lists to retrieve multiple names.
print(df.index[1:4])
# Index(['row_1', 'row_2', 'row_3'], dtype='object')
print(df.columns[[1, 3]])
# Index(['col_1', 'col_3'], dtype='object')
Using this and at or loc, you can specify by number and name.
print(df.at[df.index[2], 'col_2'])
# 22
print(df.loc[['row_0', 'row_3'], df.columns[[1, 3]]])
# col_1 col_3
# row_0 01 03
# row_3 31 33
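The conversion can also go the other way: names can be translated to numbers and then used with iat or iloc. The sketch below uses Index.get_loc() for a single name and Index.get_indexer() for a list of names, assuming the names are unique as in the sample DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'col_0': ['00', '10', '20', '30', '40'],
                   'col_1': ['01', '11', '21', '31', '41'],
                   'col_2': ['02', '12', '22', '32', '42'],
                   'col_3': ['03', '13', '23', '33', '43']},
                  index=['row_0', 'row_1', 'row_2', 'row_3', 'row_4'])

# Name -> number for a single label (unique index assumed).
col_pos = df.columns.get_loc('col_2')
print(col_pos)
# 2

# Names -> numbers for several labels at once.
row_pos = df.index.get_indexer(['row_0', 'row_3'])
print(row_pos)
# [0 3]

print(df.iat[2, col_pos])
# 22

selected = df.iloc[row_pos, [1, 3]]
print(selected.index.tolist(), selected.columns.tolist())
# ['row_0', 'row_3'] ['col_1', 'col_3']
```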
Using indexing operations in succession, such as df[...][...], df.loc[...].iloc[...], and other similar patterns, is known as "chained indexing". This approach can trigger a SettingWithCopyWarning.
While this approach causes no issues during simple data retrieval and checking, be cautious as assigning new values might yield unexpected results.
print(df['col_2'][2])
# 22
print(df.loc[['row_0', 'row_3']].iloc[:, [1, 3]])
# col_1 col_3
# row_0 01 03
# row_3 31 33
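For assignment, the chained form can be replaced by a single indexing call that mixes number and name as described earlier. The following is a sketch of two such alternatives. The assigned strings 'XX' and 'YY' are placeholders.

```python
import pandas as pd

df = pd.DataFrame({'col_0': ['00', '10', '20', '30', '40'],
                   'col_1': ['01', '11', '21', '31', '41'],
                   'col_2': ['02', '12', '22', '32', '42'],
                   'col_3': ['03', '13', '23', '33', '43']},
                  index=['row_0', 'row_1', 'row_2', 'row_3', 'row_4'])

# Chained form to avoid for assignment: df['col_2'][2] = 'XX'

# Alternative 1: convert the row number to a name and use loc (or at).
df.loc[df.index[2], 'col_2'] = 'XX'
print(df.at['row_2', 'col_2'])
# XX

# Alternative 2: convert the column name to a number and use iloc (or iat).
df.iloc[2, df.columns.get_loc('col_2')] = 'YY'
print(df.at['row_2', 'col_2'])
# YY
```

Both alternatives act on df in one step, so the assignment does not depend on whether an intermediate object is a view or a copy.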
Implicit type conversion when selecting a row as pandas.Series
If the columns of the original DataFrame have different data types, then when selecting a row as a Series with loc or iloc, the data type of the elements in the selected Series might differ from the data types in the original DataFrame.
Consider a DataFrame with columns of integers (int) and floating point numbers (float).
df_mix = pd.DataFrame({'col_int': [0, 1, 2], 'col_float': [0.1, 0.2, 0.3]}, index=['A', 'B', 'C'])
print(df_mix)
# col_int col_float
# A 0 0.1
# B 1 0.2
# C 2 0.3
print(df_mix.dtypes)
# col_int int64
# col_float float64
# dtype: object
If you retrieve a row as a Series using loc or iloc, its data type becomes float. Elements in int columns are converted to float.
print(df_mix.loc['B'])
# col_int 1.0
# col_float 0.2
# Name: B, dtype: float64
print(type(df_mix.loc['B']))
# <class 'pandas.core.series.Series'>
If you then pick an element from that Series, as in the following code, the element is returned as float.
print(df_mix.loc['B']['col_int'])
# 1.0
print(type(df_mix.loc['B']['col_int']))
# <class 'numpy.float64'>
You can get elements of the original type with at or iat.
print(df_mix.at['B', 'col_int'])
# 1
print(type(df_mix.at['B', 'col_int']))
# <class 'numpy.int64'>
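The conversion comes from the intermediate row Series, not from loc or iloc as such. As a small sketch, specifying the row and the column together in one loc or iloc call also returns the element without going through a float Series.

```python
import numpy as np
import pandas as pd

df_mix = pd.DataFrame({'col_int': [0, 1, 2], 'col_float': [0.1, 0.2, 0.3]},
                      index=['A', 'B', 'C'])

# Row and column in a single call: no intermediate row Series is built.
v_loc = df_mix.loc['B', 'col_int']
v_iloc = df_mix.iloc[1, 0]
print(v_loc, v_iloc)
# 1 1
print(isinstance(v_loc, np.integer), isinstance(v_iloc, np.integer))
# True True

# Chained access goes through the float64 row Series.
v_chained = df_mix.loc['B']['col_int']
print(isinstance(v_chained, np.floating))
# True
```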
When a row is selected using a list or slice with loc or iloc, a DataFrame is returned instead of a Series, and each column keeps its original data type.
print(df_mix.loc[['B']])
# col_int col_float
# B 1 0.2
print(type(df_mix.loc[['B']]))
# <class 'pandas.core.frame.DataFrame'>
print(df_mix.loc[['B']].dtypes)
# col_int int64
# col_float float64
# dtype: object