note.nkmk.me

pandas: Get the number of rows, columns, all elements (size) of DataFrame

Posted: 2019-07-12 / Tags: Python, pandas

This article explains how to get the number of rows, the number of columns, and the total number of elements (size) of a pandas.DataFrame or pandas.Series.

  • pandas.DataFrame
    • Display number of rows, columns, etc.: df.info()
    • Get the number of rows: len(df)
    • Get the number of columns: len(df.columns)
    • Get the number of rows and columns: df.shape
    • Get the number of elements: df.size
    • Notes when specifying index
  • pandas.Series
    • Get the number of elements: len(s), s.size

The examples use the Titanic survivor data, which can be downloaded from Kaggle.

import pandas as pd

df = pd.read_csv('data/src/titanic_train.csv')

print(df.head())
#    PassengerId  Survived  Pclass  \
# 0            1         0       3   
# 1            2         1       1   
# 2            3         1       3   
# 3            4         1       1   
# 4            5         0       3   
# 
#                                                 Name     Sex   Age  SibSp  \
# 0                            Braund, Mr. Owen Harris    male  22.0      1   
# 1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
# 2                             Heikkinen, Miss. Laina  female  26.0      0   
# 3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
# 4                           Allen, Mr. William Henry    male  35.0      0   
# 
#    Parch            Ticket     Fare Cabin Embarked  
# 0      0         A/5 21171   7.2500   NaN        S  
# 1      0          PC 17599  71.2833   C85        C  
# 2      0  STON/O2. 3101282   7.9250   NaN        S  
# 3      0            113803  53.1000  C123        S  
# 4      0            373450   8.0500   NaN        S  

Get the number of rows, columns, elements of pandas.DataFrame

Display number of rows, columns, etc.: df.info()

The info() method of pandas.DataFrame can display information such as the number of rows and columns, the total memory usage, the data type of each column, and the number of non-NaN elements.

df.info()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 891 entries, 0 to 890
# Data columns (total 12 columns):
# PassengerId    891 non-null int64
# Survived       891 non-null int64
# Pclass         891 non-null int64
# Name           891 non-null object
# Sex            891 non-null object
# Age            714 non-null float64
# SibSp          891 non-null int64
# Parch          891 non-null int64
# Ticket         891 non-null object
# Fare           891 non-null float64
# Cabin          204 non-null object
# Embarked       889 non-null object
# dtypes: float64(2), int64(5), object(5)
# memory usage: 83.6+ KB

info() prints its report to standard output and returns None, so the result cannot be assigned to a variable directly.
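
If the report is needed as a string, one option is the buf argument of info(), which accepts a writable buffer. The following is a minimal sketch; df_demo is a small hypothetical frame standing in for the Titanic data, and the variable names are illustrative only.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Titanic data (not the real file)
df_demo = pd.DataFrame({'Age': [22.0, None, 26.0],
                        'Sex': ['male', 'female', 'female']})

buffer = io.StringIO()
df_demo.info(buf=buffer)         # write the report into the buffer instead of stdout
info_text = buffer.getvalue()    # the report as an ordinary string

print(type(info_text))
# <class 'str'>
```

The string can then be logged or written to a file like any other text.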

Get the number of rows: len(df)

The number of rows of pandas.DataFrame can be obtained with the Python built-in function len().

The example prints the result with print(), but since len() returns an integer, the value can also be assigned to a variable or used in calculations.

print(len(df))
# 891
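
As a sketch of using the row count in a calculation, the example below computes the share of missing values in one column. df_demo and its values are hypothetical and only serve to keep the example self-contained.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data (not the real file)
df_demo = pd.DataFrame({'Age': [22.0, None, 26.0, 35.0]})

n_rows = len(df_demo)                     # plain Python int
n_missing = df_demo['Age'].isna().sum()   # number of NaN values in the column
missing_ratio = n_missing / n_rows

print(n_rows)
# 4

print(missing_ratio)
# 0.25
```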

Get the number of columns: len(df.columns)

The number of columns of pandas.DataFrame can be obtained by applying len() to the columns attribute.

print(len(df.columns))
# 12

Get the number of rows and columns: df.shape

The shape attribute of pandas.DataFrame stores the number of rows and columns as a tuple (number of rows, number of columns).

print(df.shape)
# (891, 12)

print(df.shape[0])
# 891

print(df.shape[1])
# 12

It is also possible to unpack and store them in separate variables.

row, col = df.shape
print(row)
# 891

print(col)
# 12
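
Because shape is an ordinary tuple, it is convenient for checking how an operation changes the size of the data. The sketch below compares the row count before and after dropna(); df_demo is a hypothetical frame, and dropna() is used only as an example operation.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data (not the real file)
df_demo = pd.DataFrame({'Age': [22.0, None, 26.0],
                        'Cabin': ['C85', None, None]})

rows_before, cols = df_demo.shape
rows_after, _ = df_demo.dropna().shape   # dropna() drops rows containing any missing value

print(rows_before, cols)
# 3 2

print(rows_after)
# 1
```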

Get the number of elements: df.size

The total number of elements of pandas.DataFrame is stored in the size attribute. This equals the number of rows multiplied by the number of columns.

print(df.size)
# 10692

print(df.shape[0] * df.shape[1])
# 10692
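
Note that size counts every cell, including missing values. To count only non-missing cells, count() can be used instead. A minimal sketch with a hypothetical frame df_demo:

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data (not the real file)
df_demo = pd.DataFrame({'Age': [22.0, None, 26.0],
                        'Cabin': ['C85', None, None]})

total_cells = df_demo.size            # rows * columns, missing values included
non_missing = df_demo.count().sum()   # count() gives non-missing cells per column

print(total_cells)
# 6

print(non_missing)
# 3
```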

Notes when specifying index

When columns are set as the index with the set_index() method, they are removed from the data body (the values attribute), so they are not counted as columns. In the example below, four of the 12 columns become index levels, leaving 8.

df_multiindex = df.set_index(['Sex', 'Pclass', 'Embarked', 'PassengerId'])

print(len(df_multiindex))
# 891

print(len(df_multiindex.columns))
# 8

print(df_multiindex.shape)
# (891, 8)

print(df_multiindex.size)
# 7128
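
The same effect can be seen on a small frame, where reset_index() moves the index levels back into columns. This is a sketch with a hypothetical df_demo, not the Titanic data itself.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data (not the real file)
df_demo = pd.DataFrame({'Sex': ['male', 'female'],
                        'Pclass': [3, 1],
                        'Fare': [7.25, 71.2833]})

df_indexed = df_demo.set_index(['Sex', 'Pclass'])

print(df_indexed.shape)
# (2, 1)

print(df_indexed.index.nlevels)   # number of index levels
# 2

print(df_indexed.reset_index().shape)   # index levels become columns again
# (2, 3)
```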

Get the number of elements of pandas.Series

As an example of pandas.Series, select one column from pandas.DataFrame.

s = df['PassengerId']
print(s.head())
# 0    1
# 1    2
# 2    3
# 3    4
# 4    5
# Name: PassengerId, dtype: int64

Get the number of elements: len(s), s.size

Since pandas.Series is one-dimensional, you can get the total number of elements (size) with either len() or the size attribute.

Note that the shape attribute is a tuple with one element.

print(len(s))
# 891

print(s.size)
# 891

print(s.shape)
# (891,)
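
The one-element tuple can be indexed like any other tuple. For comparison, selecting a column with a list of names returns a pandas.DataFrame, whose shape has two elements. A minimal sketch with hypothetical names df_demo, s_demo, and frame_demo:

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data (not the real file)
df_demo = pd.DataFrame({'PassengerId': [1, 2, 3],
                        'Survived': [0, 1, 1]})

s_demo = df_demo['PassengerId']        # single label -> Series
print(s_demo.shape)
# (3,)

print(s_demo.shape[0])
# 3

frame_demo = df_demo[['PassengerId']]  # list of labels -> DataFrame with one column
print(frame_demo.shape)
# (3, 1)
```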

Older versions of pandas have no info() method for pandas.Series; Series.info() was added in pandas 1.4.0.
