note.nkmk.me

pandas: Get first / last n rows of DataFrame with head(), tail(), slice

Posted: 2019-07-12 / Tags: Python, pandas

When checking the data in a pandas.DataFrame or pandas.Series with many rows, the head() and tail() methods, which return the first and last n rows, are useful.

This article describes the following contents.

  • Get first n rows of DataFrame: head()
  • Get last n rows of DataFrame: tail()
  • Get rows by specifying row numbers: slice
  • Get values of first / last row

Note that sample(), which returns randomly sampled rows, is another method you can use to check a large pandas.DataFrame or pandas.Series.
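A minimal sketch of sample(), assuming a small hand-made DataFrame (hypothetical data) so that it runs without seaborn. n and random_state are parameters of sample(); which rows are picked depends on the seed, so only the reproducibility check is shown as output.

```python
import pandas as pd

# Hypothetical 10-row DataFrame used instead of the iris data
df = pd.DataFrame({"x": range(10), "y": [v * 10 for v in range(10)]})

# sample() returns n randomly chosen rows (without replacement by default);
# random_state fixes the seed so the same rows are chosen every time
sampled = df.sample(n=3, random_state=0)
print(sampled)

# The same random_state gives the same rows
print(sampled.equals(df.sample(n=3, random_state=0)))
# True
```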

The following examples use the iris dataset, which seaborn provides as sample data through load_dataset().

import pandas as pd
import seaborn as sns

df = sns.load_dataset("iris")
print(df.shape)
# (150, 5)

The following examples use pandas.DataFrame, but pandas.Series also has head() and tail(), and they work the same way for both.
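For pandas.Series, a minimal sketch, assuming a hypothetical Series of the integers 0 to 9 rather than a column of the iris data. The result of head() and tail() is again a pandas.Series.

```python
import pandas as pd

# Hypothetical Series with 10 values
s = pd.Series(range(10))

print(s.head().tolist())   # [0, 1, 2, 3, 4]  (default: first 5)
print(s.head(3).tolist())  # [0, 1, 2]
print(s.tail(3).tolist())  # [7, 8, 9]
print(type(s.head(3)))     # <class 'pandas.core.series.Series'>
```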

Get first n rows of DataFrame: head()

The head() method returns the first n rows.

By default, the first 5 rows are returned.

print(df.head())
#    sepal_length  sepal_width  petal_length  petal_width species
# 0           5.1          3.5           1.4          0.2  setosa
# 1           4.9          3.0           1.4          0.2  setosa
# 2           4.7          3.2           1.3          0.2  setosa
# 3           4.6          3.1           1.5          0.2  setosa
# 4           5.0          3.6           1.4          0.2  setosa

You can specify the number of rows.

print(df.head(3))
#    sepal_length  sepal_width  petal_length  petal_width species
# 0           5.1          3.5           1.4          0.2  setosa
# 1           4.9          3.0           1.4          0.2  setosa
# 2           4.7          3.2           1.3          0.2  setosa
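The sketch below, assuming a hypothetical 8-row DataFrame instead of the iris data, checks the default and a few other values of n. A value larger than the number of rows simply returns every row, and a negative n returns all rows except the last |n| rows.

```python
import pandas as pd

# Hypothetical 8-row DataFrame standing in for the iris data
df = pd.DataFrame({"a": range(8), "b": list("abcdefgh")})

print(len(df.head()))     # 5  (default n=5)
print(len(df.head(3)))    # 3
print(len(df.head(100)))  # 8  (n larger than the row count returns every row)

# Negative n: every row except the last 2
print(df.head(-2)["a"].tolist())  # [0, 1, 2, 3, 4, 5]
```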

Get last n rows of DataFrame: tail()

The tail() method returns the last n rows.

By default, the last 5 rows are returned.

print(df.tail())
#      sepal_length  sepal_width  petal_length  petal_width    species
# 145           6.7          3.0           5.2          2.3  virginica
# 146           6.3          2.5           5.0          1.9  virginica
# 147           6.5          3.0           5.2          2.0  virginica
# 148           6.2          3.4           5.4          2.3  virginica
# 149           5.9          3.0           5.1          1.8  virginica

You can specify the number of rows.

print(df.tail(3))
#      sepal_length  sepal_width  petal_length  petal_width    species
# 147           6.5          3.0           5.2          2.0  virginica
# 148           6.2          3.4           5.4          2.3  virginica
# 149           5.9          3.0           5.1          1.8  virginica
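The same checks for tail(), again on a hypothetical 8-row DataFrame. A negative n returns all rows except the first |n| rows. As in the iris output above, the original index labels are kept.

```python
import pandas as pd

# Hypothetical 8-row DataFrame standing in for the iris data
df = pd.DataFrame({"a": range(8), "b": list("abcdefgh")})

print(df.tail()["a"].tolist())      # [3, 4, 5, 6, 7]  (default n=5)
print(df.tail(3)["a"].tolist())     # [5, 6, 7]
print(df.tail(3).index.tolist())    # [5, 6, 7]  (original labels are kept)

# Negative n: every row except the first 2
print(df.tail(-2)["a"].tolist())    # [2, 3, 4, 5, 6, 7]
```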

Get rows by specifying row numbers: slice

You can also get rows by specifying a range of row numbers with a slice.

print(df[50:55])
#     sepal_length  sepal_width  petal_length  petal_width     species
# 50           7.0          3.2           4.7          1.4  versicolor
# 51           6.4          3.2           4.5          1.5  versicolor
# 52           6.9          3.1           4.9          1.5  versicolor
# 53           5.5          2.3           4.0          1.3  versicolor
# 54           6.5          2.8           4.6          1.5  versicolor
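With the default RangeIndex (0, 1, 2, ...) that the iris data has, an integer slice in [] selects rows by position, with the stop value excluded, and gives the same rows as iloc. A minimal sketch on a hypothetical 100-row DataFrame; a step can also be specified.

```python
import pandas as pd

# Hypothetical DataFrame with the default RangeIndex, like the iris data
df = pd.DataFrame({"a": range(100)})

# Rows 50 to 54; the stop value 55 is excluded
print(df[50:55]["a"].tolist())  # [50, 51, 52, 53, 54]

# .iloc makes the positional selection explicit and returns the same rows
print(df[50:55].equals(df.iloc[50:55]))  # True

# Every 25th row
print(df[::25]["a"].tolist())  # [0, 25, 50, 75]
```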

You can use slices to do the same thing as head() and tail().

print(df[:5])
#    sepal_length  sepal_width  petal_length  petal_width species
# 0           5.1          3.5           1.4          0.2  setosa
# 1           4.9          3.0           1.4          0.2  setosa
# 2           4.7          3.2           1.3          0.2  setosa
# 3           4.6          3.1           1.5          0.2  setosa
# 4           5.0          3.6           1.4          0.2  setosa

print(df[-5:])
#      sepal_length  sepal_width  petal_length  petal_width    species
# 145           6.7          3.0           5.2          2.3  virginica
# 146           6.3          2.5           5.0          1.9  virginica
# 147           6.5          3.0           5.2          2.0  virginica
# 148           6.2          3.4           5.4          2.3  virginica
# 149           5.9          3.0           5.1          1.8  virginica
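A quick check of this equivalence, assuming a hypothetical 20-row DataFrame. It also shows that a negative stop value matches head() with a negative n.

```python
import pandas as pd

# Hypothetical 20-row DataFrame
df = pd.DataFrame({"a": range(20)})

print(df[:5].equals(df.head()))      # True
print(df[-5:].equals(df.tail()))     # True
print(df[:-5].equals(df.head(-5)))   # True  (all rows except the last 5)
```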

Get values of first / last row

If you specify n=1 in head() or tail(), you get the first or last row, but the result is still a pandas.DataFrame even though it contains only one row.

print(df.head(1))
#    sepal_length  sepal_width  petal_length  petal_width species
# 0           5.1          3.5           1.4          0.2  setosa

print(type(df.head(1)))
# <class 'pandas.core.frame.DataFrame'>

If you select a single row with iloc, you get that row as a pandas.Series, from which it is easier to get values. Use iloc[0] for the first row and iloc[-1] for the last row.

To get the value of an element, use iloc[0]['column_name'] or iloc[-1]['column_name'].
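A minimal sketch of getting a single value from the first or last row, assuming a hypothetical three-row DataFrame with iris-like column names. Besides iloc[row]['column_name'], selecting the column first with ['column_name'].iloc[row], or passing both positions to iloc with columns.get_loc(), also returns the value.

```python
import pandas as pd

# Hypothetical three-row DataFrame with iris-like column names
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 5.9],
    "species": ["setosa", "setosa", "virginica"],
})

# Row first (as a Series), then the column label
print(df.iloc[0]["sepal_length"])   # 5.1
print(df.iloc[-1]["species"])       # virginica

# Column first (as a Series), then the row position
print(df["sepal_length"].iloc[-1])  # 5.9

# One iloc call with both positions; get_loc() turns the column label into a position
print(df.iloc[-1, df.columns.get_loc("species")])  # virginica
```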

print(df.iloc[0])
# sepal_length       5.1
# sepal_width        3.5
# petal_length       1.4
# petal_width        0.2
# species         setosa
# Name: 0, dtype: object

print(type(df.iloc[0]))
# <class 'pandas.core.series.Series'>

print(df.iloc[0]['sepal_length'])
# 5.1

print(df.iloc[-1])
# sepal_length          5.9
# sepal_width             3
# petal_length          5.1
# petal_width           1.8
# species         virginica
# Name: 149, dtype: object

print(type(df.iloc[-1]))
# <class 'pandas.core.series.Series'>

print(df.iloc[-1]['sepal_length'])
# 5.9
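The difference in return types can be summarized with a small hypothetical DataFrame. Passing a list such as [0] to iloc, instead of a single integer, keeps the result a one-row pandas.DataFrame, the same as head(1) or tail(1).

```python
import pandas as pd

# Hypothetical three-row DataFrame
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

print(type(df.head(1)).__name__)    # DataFrame
print(type(df.iloc[0]).__name__)    # Series
print(type(df.iloc[[0]]).__name__)  # DataFrame  (a list keeps the 2-D shape)

# A one-element list gives the same one-row DataFrame as head(1) / tail(1)
print(df.iloc[[0]].equals(df.head(1)))   # True
print(df.iloc[[-1]].equals(df.tail(1)))  # True
```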

Note that an error occurs without .iloc, because [] with a single value looks for a column with that label, not a row at that position.

# print(df[0])
# KeyError: 0

# print(df[-1])
# KeyError: -1
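To see that [] with a single value is a column lookup, a minimal sketch with hypothetical data: a DataFrame whose columns happen to be labeled 0 and 1 returns a column for df[0], while an iris-like DataFrame without such a column raises KeyError. The helper column_lookup_raises() is a hypothetical name used only for this illustration.

```python
import pandas as pd

def column_lookup_raises(frame, key):
    """Return True if frame[key] raises KeyError (no column with that label)."""
    try:
        frame[key]
    except KeyError:
        return True
    return False

# Hypothetical iris-like DataFrame: no column is labeled 0 or -1
df = pd.DataFrame({"sepal_length": [5.1, 4.9], "species": ["setosa", "setosa"]})
print(column_lookup_raises(df, 0))    # True
print(column_lookup_raises(df, -1))   # True

# Hypothetical DataFrame whose columns are labeled 0 and 1
df_num = pd.DataFrame([[1, 2], [3, 4]])
print(df_num[0].tolist())       # [1, 3]  (the column labeled 0)
print(df_num.iloc[0].tolist())  # [1, 2]  (the row at position 0)
```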