pandas: Get the number of rows, columns, elements (size) in DataFrame
This article explains how to get the number of rows, columns, and total elements (i.e., size) of a pandas.DataFrame or pandas.Series.
As an example, we will use the Titanic survivor dataset, which can be downloaded from Kaggle.
import pandas as pd
print(pd.__version__)
# 2.0.0
df = pd.read_csv('data/src/titanic_train.csv')
print(df.head())
# PassengerId Survived Pclass
# 0 1 0 3 \
# 1 2 1 1
# 2 3 1 3
# 3 4 1 1
# 4 5 0 3
#
# Name Sex Age SibSp
# 0 Braund, Mr. Owen Harris male 22.0 1 \
# 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1
# 2 Heikkinen, Miss. Laina female 26.0 0
# 3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1
# 4 Allen, Mr. William Henry male 35.0 0
#
# Parch Ticket Fare Cabin Embarked
# 0 0 A/5 21171 7.2500 NaN S
# 1 0 PC 17599 71.2833 C85 C
# 2 0 STON/O2. 3101282 7.9250 NaN S
# 3 0 113803 53.1000 C123 S
# 4 0 373450 8.0500 NaN S
Get the number of rows, columns, and elements in a pandas.DataFrame
Display the number of rows and columns: df.info()
The info() method of a DataFrame displays a summary that includes the number of rows and columns, memory usage, data types of each column, and the number of non-null values.
df.info()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 891 entries, 0 to 890
# Data columns (total 12 columns):
# # Column Non-Null Count Dtype
# --- ------ -------------- -----
# 0 PassengerId 891 non-null int64
# 1 Survived 891 non-null int64
# 2 Pclass 891 non-null int64
# 3 Name 891 non-null object
# 4 Sex 891 non-null object
# 5 Age 714 non-null float64
# 6 SibSp 891 non-null int64
# 7 Parch 891 non-null int64
# 8 Ticket 891 non-null object
# 9 Fare 891 non-null float64
# 10 Cabin 204 non-null object
# 11 Embarked 889 non-null object
# dtypes: float64(2), int64(5), object(5)
# memory usage: 83.7+ KB
info() prints this summary to standard output and returns None, so the counts it reports cannot be assigned to a variable or used in calculations directly.
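If the summary text is needed as a string, one option is the buf parameter of info(), which writes to a file-like object instead of standard output. The following is a minimal sketch on a small made-up DataFrame (df_small and its values are hypothetical, not the Titanic data):

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the Titanic DataFrame.
df_small = pd.DataFrame({'a': [1, 2, 3], 'b': [0.1, None, 0.3]})

# info() returns None, so its return value carries no counts.
result = df_small.info()
print(result)
# None

# Passing a buffer redirects the summary so it can be kept as a string.
buffer = io.StringIO()
df_small.info(buf=buffer)
summary = buffer.getvalue()
print('total 2 columns' in summary)
# True
```

For the counts themselves, the attributes and functions described in the following sections are usually simpler than parsing this text.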
Get the number of rows and columns: df.shape
The shape attribute of a DataFrame returns a tuple in the form (number of rows, number of columns).
print(df.shape)
# (891, 12)
print(df.shape[0])
# 891
print(df.shape[1])
# 12
You can unpack this tuple to assign the row and column counts to individual variables:
row, col = df.shape
print(row)
# 891
print(col)
# 12
Get the number of rows: len(df)
You can get the number of rows in a DataFrame using the built-in len() function:
print(len(df))
# 891
Get the number of columns: len(df.columns)
To get the number of columns, apply len() to the columns attribute:
print(len(df.columns))
# 12
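As a quick check, the values from len() should agree with shape, since len(df) is the length of the row index. A small sketch with hypothetical data (df_small is not part of the original example):

```python
import pandas as pd

# Hypothetical 3-row, 2-column DataFrame.
df_small = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

n_rows = len(df_small)              # same as len(df_small.index)
n_rows_index = len(df_small.index)
n_cols = len(df_small.columns)

print(n_rows == n_rows_index)
# True
print((n_rows, n_cols) == df_small.shape)
# True
```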
Get the total number of elements: df.size
The total number of elements in a DataFrame is available via the size attribute, which equals row_count * column_count.
print(df.size)
# 10692
print(df.shape[0] * df.shape[1])
# 10692
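Note that size counts every cell, including missing values. To count only non-missing elements, one approach is to sum the per-column results of count(). The sketch below uses a small hypothetical DataFrame with two missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 3 rows x 2 columns, with one missing value per column.
df_small = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': ['x', 'y', None]})

# size includes missing values.
print(df_small.size)
# 6

# count() returns the non-missing count per column; sum() totals them.
non_null = int(df_small.count().sum())
print(non_null)
# 4
```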
Notes on setting an index
When you use the set_index() method to set one or more columns as the index, those columns are removed from the main data (i.e., they are no longer part of the values). Consequently, they are excluded from both the column count and the total number of elements.
df_multiindex = df.set_index(['Sex', 'Pclass', 'Embarked', 'PassengerId'])
print(df_multiindex.shape)
# (891, 8)
print(len(df_multiindex))
# 891
print(len(df_multiindex.columns))
# 8
print(df_multiindex.size)
# 7128
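If the index columns should still be counted, they can be moved back with reset_index(), or kept as regular columns in the first place with drop=False. A minimal sketch, assuming a small hypothetical DataFrame with a column named key:

```python
import pandas as pd

# Hypothetical data: 3 rows x 3 columns.
df_small = pd.DataFrame({'key': ['x', 'y', 'z'], 'a': [1, 2, 3], 'b': [4, 5, 6]})

# The index column no longer counts as a column.
df_indexed = df_small.set_index('key')
print(df_indexed.shape)
# (3, 2)

# reset_index() turns the index back into a regular column.
df_restored = df_indexed.reset_index()
print(df_restored.shape)
# (3, 3)

# drop=False keeps the column in the data while also using it as the index.
df_kept = df_small.set_index('key', drop=False)
print(df_kept.shape)
# (3, 3)
```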
Get the number of elements in a pandas.Series
To demonstrate with a Series, we extract a single column from a DataFrame:
s = df['PassengerId']
print(s.head())
# 0 1
# 1 2
# 2 3
# 3 4
# 4 5
# Name: PassengerId, dtype: int64
Get the number of elements: len(s) and s.size
Since a Series is one-dimensional, you can obtain its total number of elements using len(), the size attribute, or the shape attribute. Note that shape returns a one-element tuple.
print(len(s))
# 891
print(s.size)
# 891
print(s.shape)
# (891,)
print(type(s.shape))
# <class 'tuple'>
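Because shape is a one-element tuple, it can be unpacked with a trailing comma. Also, as with a DataFrame, len() and size include missing values, while count() returns only the number of non-missing elements. A short sketch with a hypothetical Series:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value.
s_small = pd.Series([1.0, np.nan, 3.0])

# Unpack the one-element tuple returned by shape.
(n,) = s_small.shape
print(n)
# 3

# size includes the missing value; count() excludes it.
print(s_small.size)
# 3
print(s_small.count())
# 2
```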
A Series also has an info() method, added in pandas 1.4. It displays metadata similar to that of DataFrame.info(), including the number of non-null values and memory usage.
s.info()
# <class 'pandas.core.series.Series'>
# RangeIndex: 891 entries, 0 to 890
# Series name: PassengerId
# Non-Null Count Dtype
# -------------- -----
# 891 non-null int64
# dtypes: int64(1)
# memory usage: 7.1 KB