pandas: Get first/last n rows of DataFrame with head() and tail()
In pandas, the head() and tail() methods are used to get the first and last n rows of a DataFrame, as well as the first and last n elements of a Series.
Another method useful for examining data in a large DataFrame or Series is sample(), which randomly samples rows or columns.
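As a minimal sketch of sample(), the snippet below draws three random rows. The df_demo frame and its column name x are hypothetical and only serve as illustration; random_state is passed so that repeated runs draw the same rows.

```python
import pandas as pd

# Hypothetical demo data (not part of the article's example)
df_demo = pd.DataFrame({'x': range(100)})

# Draw 3 rows at random; random_state makes the draw reproducible
sampled = df_demo.sample(n=3, random_state=0)

print(sampled.shape)
# (3, 1)
```

Which rows are drawn depends on the seed, so only the shape is shown here.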
The pandas version used in this article is as follows. Note that functionality may vary between versions. The following DataFrame with 10 rows is used as an example.
import pandas as pd
print(pd.__version__)
# 2.1.4
df = pd.DataFrame({'col_0': list('ABCDEFGHIJ'), 'col_1': range(9, -1, -1)},
                  index=[f'row_{i}' for i in range(10)])
print(df)
# col_0 col_1
# row_0 A 9
# row_1 B 8
# row_2 C 7
# row_3 D 6
# row_4 E 5
# row_5 F 4
# row_6 G 3
# row_7 H 2
# row_8 I 1
# row_9 J 0
The following examples use DataFrame, but Series also supports the head() and tail() methods in the same manner.
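For reference, here is a short sketch of the same methods on a Series. The Series s is an assumption made for illustration; it mirrors the col_1 column of the example DataFrame.

```python
import pandas as pd

# Illustrative Series with the same values and labels as col_1
s = pd.Series(range(9, -1, -1), index=[f'row_{i}' for i in range(10)])

first_three = s.head(3)  # first 3 elements
last_two = s.tail(2)     # last 2 elements

print(first_three.tolist())
# [9, 8, 7]
print(last_two.tolist())
# [1, 0]
```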
Get the first n rows: head()
The head() method returns the first n rows.
By default, the first 5 rows are returned.
print(df.head())
# col_0 col_1
# row_0 A 9
# row_1 B 8
# row_2 C 7
# row_3 D 6
# row_4 E 5
You can specify the number of rows as the first argument, n.
print(df.head(3))
# col_0 col_1
# row_0 A 9
# row_1 B 8
# row_2 C 7
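Two edge cases of n may be worth knowing. A negative n makes head() return all rows except the last |n|, and an n larger than the number of rows simply returns every row. The sketch below rebuilds the example DataFrame so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'col_0': list('ABCDEFGHIJ'), 'col_1': range(9, -1, -1)},
                  index=[f'row_{i}' for i in range(10)])

# Negative n: everything except the last 3 rows
all_but_last_three = df.head(-3)
print(all_but_last_three.index.tolist())
# ['row_0', 'row_1', 'row_2', 'row_3', 'row_4', 'row_5', 'row_6']

# n larger than the row count: all rows are returned, no error is raised
print(len(df.head(100)))
# 10
```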
Get the last n rows: tail()
The tail() method returns the last n rows.
By default, the last 5 rows are returned.
print(df.tail())
# col_0 col_1
# row_5 F 4
# row_6 G 3
# row_7 H 2
# row_8 I 1
# row_9 J 0
You can specify the number of rows as the first argument, n.
print(df.tail(3))
# col_0 col_1
# row_7 H 2
# row_8 I 1
# row_9 J 0
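tail() also accepts a negative n, in which case it returns all rows except the first |n|. A brief sketch, again rebuilding the example DataFrame so that the snippet is self-contained:

```python
import pandas as pd

df = pd.DataFrame({'col_0': list('ABCDEFGHIJ'), 'col_1': range(9, -1, -1)},
                  index=[f'row_{i}' for i in range(10)])

# Negative n: everything except the first 3 rows
skip_first_three = df.tail(-3)
print(skip_first_three.index.tolist())
# ['row_3', 'row_4', 'row_5', 'row_6', 'row_7', 'row_8', 'row_9']

# Same result as a positional slice starting at row 3
print(skip_first_three.equals(df.iloc[3:]))
# True
```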
Get rows by specifying row numbers: slice
You can get rows at any position by specifying row numbers with slices.
print(df[3:6])
# col_0 col_1
# row_3 D 6
# row_4 E 5
# row_5 F 4
It is also possible to perform similar operations to head() and tail() using slices.
print(df[:5])
# col_0 col_1
# row_0 A 9
# row_1 B 8
# row_2 C 7
# row_3 D 6
# row_4 E 5
print(df[-5:])
# col_0 col_1
# row_5 F 4
# row_6 G 3
# row_7 H 2
# row_8 I 1
# row_9 J 0
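If you prefer to make the position-based intent explicit, the same slices can be written with iloc. The following sketch checks that, for this example DataFrame, the iloc slices match the results shown above.

```python
import pandas as pd

df = pd.DataFrame({'col_0': list('ABCDEFGHIJ'), 'col_1': range(9, -1, -1)},
                  index=[f'row_{i}' for i in range(10)])

# iloc always slices by integer position
middle = df.iloc[3:6]
print(middle.index.tolist())
# ['row_3', 'row_4', 'row_5']

# Positional slices reproduce head() and tail() with the default n=5
print(df.iloc[:5].equals(df.head()))
# True
print(df.iloc[-5:].equals(df.tail()))
# True
```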
Get the first/last row and its values
Passing 1 to head() or tail() returns the first or last row, respectively. However, it is important to note that even a single row is returned as a DataFrame.
print(df.head(1))
# col_0 col_1
# row_0 A 9
print(type(df.head(1)))
# <class 'pandas.core.frame.DataFrame'>
Use iloc to get a single row as a Series: iloc[0] for the first row and iloc[-1] for the last row. To retrieve a specific value, use iloc[0]['column_name'] or iloc[-1]['column_name'].
print(df.iloc[0])
# col_0 A
# col_1 9
# Name: row_0, dtype: object
print(type(df.iloc[0]))
# <class 'pandas.core.series.Series'>
print(df.iloc[0]['col_0'])
# A
print(df.iloc[-1])
# col_0 J
# col_1 0
# Name: row_9, dtype: object
print(type(df.iloc[-1]))
# <class 'pandas.core.series.Series'>
print(df.iloc[-1]['col_0'])
# J
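As an alternative sketch for reading a single value, you can select the column first and then index by position, or use iat with integer row and column positions. Both avoid building an intermediate row Series.

```python
import pandas as pd

df = pd.DataFrame({'col_0': list('ABCDEFGHIJ'), 'col_1': range(9, -1, -1)},
                  index=[f'row_{i}' for i in range(10)])

# Select the column, then take the first/last element by position
first_value = df['col_0'].iloc[0]
last_value = df['col_0'].iloc[-1]
print(first_value, last_value)
# A J

# iat takes integer positions: (row 0, column 0)
print(df.iat[0, 0])
# A
```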
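Note that assigning a value through this chained indexing may raise a SettingWithCopyWarning, and the original DataFrame may be left unchanged because the assignment can target a temporary copy of the row.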
df.iloc[0]['col_0'] = 'AAA'
# /var/folders/rf/b7l8_vgj5mdgvghn_326rn_c0000gn/T/ipykernel_48384/183824280.py:1: SettingWithCopyWarning:
# A value is trying to be set on a copy of a slice from a DataFrame
To avoid the SettingWithCopyWarning, get the first/last row name from the index attribute and specify it in at. loc can also be used, but at is faster for retrieving and assigning a single value.
df.at[df.index[0], 'col_0'] = 'AAA'
df.at[df.index[-1], 'col_0'] = 'JJJ'
print(df)
# col_0 col_1
# row_0 AAA 9
# row_1 B 8
# row_2 C 7
# row_3 D 6
# row_4 E 5
# row_5 F 4
# row_6 G 3
# row_7 H 2
# row_8 I 1
# row_9 JJJ 0
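The same assignment can be sketched with loc, as mentioned above, or purely by position with iat. Looking up the column position with columns.get_loc() is one possible approach chosen for this illustration, not the only one.

```python
import pandas as pd

df = pd.DataFrame({'col_0': list('ABCDEFGHIJ'), 'col_1': range(9, -1, -1)},
                  index=[f'row_{i}' for i in range(10)])

# Label-based assignment with loc, using the first row's label
df.loc[df.index[0], 'col_0'] = 'AAA'

# Position-based assignment with iat: last row, position of 'col_0'
col_pos = df.columns.get_loc('col_0')
df.iat[len(df) - 1, col_pos] = 'JJJ'

print(df['col_0'].tolist())
# ['AAA', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'JJJ']
```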
For more details on at, iat, loc, and iloc, refer to the following article.