pandas: Get the number of rows, columns, elements (size) of DataFrame


This article explains how to get the number of rows, columns, and total elements (size) in pandas.DataFrame and pandas.Series.

The examples below use the Titanic survivor data, which can be downloaded from Kaggle.

import pandas as pd

print(pd.__version__)
# 2.0.0

df = pd.read_csv('data/src/titanic_train.csv')
print(df.head())
#    PassengerId  Survived  Pclass   
# 0            1         0       3  \
# 1            2         1       1   
# 2            3         1       3   
# 3            4         1       1   
# 4            5         0       3   
# 
#                                                 Name     Sex   Age  SibSp   
# 0                            Braund, Mr. Owen Harris    male  22.0      1  \
# 1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
# 2                             Heikkinen, Miss. Laina  female  26.0      0   
# 3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
# 4                           Allen, Mr. William Henry    male  35.0      0   
# 
#    Parch            Ticket     Fare Cabin Embarked  
# 0      0         A/5 21171   7.2500   NaN        S  
# 1      0          PC 17599  71.2833   C85        C  
# 2      0  STON/O2. 3101282   7.9250   NaN        S  
# 3      0            113803  53.1000  C123        S  
# 4      0            373450   8.0500   NaN        S  

Get the number of rows, columns, and elements in pandas.DataFrame

Display the number of rows, columns, etc.: df.info()

The info() method of DataFrame prints a summary that includes the number of rows and columns, the total memory usage, the data type of each column, and the count of non-null (non-missing) elements in each column.

df.info()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 891 entries, 0 to 890
# Data columns (total 12 columns):
#  #   Column       Non-Null Count  Dtype  
# ---  ------       --------------  -----  
#  0   PassengerId  891 non-null    int64  
#  1   Survived     891 non-null    int64  
#  2   Pclass       891 non-null    int64  
#  3   Name         891 non-null    object 
#  4   Sex          891 non-null    object 
#  5   Age          714 non-null    float64
#  6   SibSp        891 non-null    int64  
#  7   Parch        891 non-null    int64  
#  8   Ticket       891 non-null    object 
#  9   Fare         891 non-null    float64
#  10  Cabin        204 non-null    object 
#  11  Embarked     889 non-null    object 
# dtypes: float64(2), int64(5), object(5)
# memory usage: 83.7+ KB

info() prints this summary to standard output and returns None, so the result cannot be directly assigned to a variable or used in calculations.
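
If the summary text is needed as a string, one option is the buf parameter of info(), which accepts a writable buffer. The following is a minimal sketch, assuming a small stand-in DataFrame (df_demo, a hypothetical name with made-up values) rather than the Titanic file.

```python
import io

import pandas as pd

# Hypothetical stand-in data, not the Titanic file
df_demo = pd.DataFrame({'a': [1, 2, 3], 'b': [1.5, None, 3.5]})

buffer = io.StringIO()
result = df_demo.info(buf=buffer)  # write the summary to the buffer instead of stdout
info_text = buffer.getvalue()

print(result)
# None

print('RangeIndex: 3 entries, 0 to 2' in info_text)
# True
```

The captured text is still a formatted report, so for numeric values it is usually simpler to use the attributes described in the following sections.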

Get the number of rows and columns: df.shape

The shape attribute of DataFrame stores the number of rows and columns as a tuple (number of rows, number of columns).

print(df.shape)
# (891, 12)

print(df.shape[0])
# 891

print(df.shape[1])
# 12

You can unpack the tuple and assign the values to separate variables.

row, col = df.shape
print(row)
# 891

print(col)
# 12

Get the number of rows: len(df)

The number of rows in DataFrame can be obtained with the Python built-in function len().

print(len(df))
# 891

Get the number of columns: len(df.columns)

The number of columns in DataFrame can be obtained by applying len() to the columns attribute.

print(len(df.columns))
# 12
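
As a quick cross-check, the len()-based counts agree with the values in shape. The sketch below assumes a small stand-in DataFrame (df_demo, a hypothetical name with made-up values) so that it runs without the Titanic file.

```python
import pandas as pd

# Hypothetical stand-in data, not the Titanic file
df_demo = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})

n_rows = len(df_demo)          # number of rows
n_cols = len(df_demo.columns)  # number of columns

print(n_rows == df_demo.shape[0] == len(df_demo.index))
# True

print(n_cols == df_demo.shape[1])
# True
```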

Get the number of elements: df.size

The total number of elements in DataFrame is stored in the size attribute. This is equal to row_count * column_count.

print(df.size)
# 10692

print(df.shape[0] * df.shape[1])
# 10692
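
Note that size counts every cell, including missing values. If only the non-missing elements matter, the count() method is one way to get them. This is a minimal sketch using a hypothetical stand-in DataFrame (df_demo) with one missing value.

```python
import pandas as pd

# Hypothetical stand-in data with one missing value
df_demo = pd.DataFrame({'a': [1, 2, 3], 'b': [1.5, None, 3.5]})

total = df_demo.size                      # all cells, including NaN
non_missing = int(df_demo.count().sum())  # count() returns non-missing counts per column

print(total)
# 6

print(non_missing)
# 5
```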

Notes when setting an index

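When using the set_index() method to set columns of data as an index, these columns are removed from the main data body (the values attribute). As a result, they are no longer included in the total column count. In the following example, four columns are moved into the index, so the column count drops from 12 to 8 while the row count remains 891.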

df_multiindex = df.set_index(['Sex', 'Pclass', 'Embarked', 'PassengerId'])

print(df_multiindex.shape)
# (891, 8)

print(len(df_multiindex))
# 891

print(len(df_multiindex.columns))
# 8

print(df_multiindex.size)
# 7128
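
The same effect can be reproduced on a small scale. The sketch below assumes a hypothetical stand-in DataFrame (df_demo) and shows that reset_index() moves the index back into the columns, which restores the original column count.

```python
import pandas as pd

# Hypothetical stand-in data, not the Titanic file
df_demo = pd.DataFrame({'key': ['x', 'y', 'z'], 'a': [1, 2, 3], 'b': [4, 5, 6]})
df_indexed = df_demo.set_index('key')

print(df_demo.shape)
# (3, 3)

print(df_indexed.shape)  # 'key' is now the index, not a column
# (3, 2)

print(df_indexed.reset_index().shape)  # 'key' is a column again
# (3, 3)
```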

Get the number of elements in pandas.Series

For a Series example, select one column from a DataFrame.

s = df['PassengerId']
print(s.head())
# 0    1
# 1    2
# 2    3
# 3    4
# 4    5
# Name: PassengerId, dtype: int64

Get the number of elements: len(s), s.size

Since Series is one-dimensional, len() and the size attribute both return the total number of elements. The shape attribute also contains this number, but as a tuple with one element.

print(len(s))
# 891

print(s.size)
# 891

print(s.shape)
# (891,)

print(type(s.shape))
# <class 'tuple'>
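
Because shape is a one-element tuple, it can be unpacked with a trailing comma. This is a minimal sketch, assuming a hypothetical stand-in Series (s_demo) instead of the Titanic column.

```python
import pandas as pd

# Hypothetical stand-in data, not the Titanic column
s_demo = pd.Series([10, 20, 30], name='demo')

(n,) = s_demo.shape  # unpack the one-element tuple
print(n)
# 3

print(n == len(s_demo) == s_demo.size)
# True
```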

The info() method was also added to Series in pandas 1.4.

s.info()
# <class 'pandas.core.series.Series'>
# RangeIndex: 891 entries, 0 to 890
# Series name: PassengerId
# Non-Null Count  Dtype
# --------------  -----
# 891 non-null    int64
# dtypes: int64(1)
# memory usage: 7.1 KB
