Calculate Mean, Median, Mode, Variance, Standard Deviation in Python

Posted: 2023-08-03 | Tags: Python, Mathematics

Python's statistics module provides functions for basic statistical calculations, such as the mean, median, mode, variance, and standard deviation.

statistics — Mathematical statistics functions — Python 3.11.4 documentation

Contents

Mean (arithmetic mean): statistics.mean()
Median: statistics.median(), statistics.median_low(), statistics.median_high()
Mode: statistics.mode(), statistics.multimode()
Variance
- Population variance: statistics.pvariance()
- Sample variance: statistics.variance()
Standard deviation
- Population standard deviation: statistics.pstdev()
- Sample standard deviation: statistics.stdev()

This article does not cover every function in the module; for example, harmonic and geometric means are omitted. Refer to the official documentation linked above for details.

NumPy, which must be installed separately, can also compute these statistics and supports additional operations, such as computing them along the rows or columns of a two-dimensional array.

NumPy: Sum, mean, max, min for entire array, column/row-wise

The sample code in this article uses the statistics and math modules. Both are included in the standard library and do not require additional installation.

import statistics
import math

source: statistics_example.py

Mean (arithmetic mean): `statistics.mean()`

statistics.mean() calculates the arithmetic mean, which is the sum of the elements divided by their count. It accepts an iterable, such as a list or tuple, as its argument. The same applies to the functions presented in the following sections.

statistics.mean — Mathematical statistics functions — Python 3.11.4 documentation

l = [1, 3, 8, 15]

print(statistics.mean(l))
# 6.75

source: statistics_example.py

You can also calculate the mean with the built-in functions sum() and len().

print(sum(l) / len(l))
# 6.75

source: statistics_example.py
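Because statistics.mean() accepts any iterable, a tuple, a range, or a generator expression can be passed directly. The short sketch below illustrates this with the same values as above, and also shows that an empty input raises statistics.StatisticsError instead of returning a value.

```python
import statistics

# A tuple gives the same result as the list above
print(statistics.mean((1, 3, 8, 15)))
# 6.75

# range objects and generator expressions are iterables too
print(statistics.mean(range(1, 5)))
# 2.5

print(statistics.mean(x * 2 for x in [1, 3, 8, 15]))
# 13.5

# An empty iterable has no mean
try:
    statistics.mean([])
except statistics.StatisticsError as e:
    print("StatisticsError:", e)
```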

Median: `statistics.median()`, `statistics.median_low()`, `statistics.median_high()`

statistics.median(), statistics.median_low(), and statistics.median_high() find the median, which is the middle value of the data when sorted. The data does not need to be sorted beforehand.

If the number of data points is odd, all three functions return the middle value directly.

l = [3, 1, 8]

print(statistics.median(l))
# 3

print(statistics.median_low(l))
# 3

print(statistics.median_high(l))
# 3

source: statistics_example.py

If the number of data points is even, statistics.median() returns the arithmetic mean of the two middle values, statistics.median_low() returns the smaller value, and statistics.median_high() returns the larger value.

l = [3, 1, 8, 15]

print(statistics.median(l))
# 5.5

print(statistics.median_low(l))
# 3

print(statistics.median_high(l))
# 8

source: statistics_example.py

To sort data yourself, use the built-in sorted() function or the sort() method of lists.

Sort a list, string, tuple in Python (sort, sorted)
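To see what the median functions compute, the median can also be found by hand with sorted(). The helper manual_median() below is hypothetical and written only for comparison; it is not part of the statistics module, and this is a sketch rather than a replacement for statistics.median().

```python
import statistics

def manual_median(data):
    """Hypothetical helper for illustration: the median via sorted()."""
    s = sorted(data)  # sorted() returns a new list and leaves the input unchanged
    n = len(s)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]  # odd count: the single middle value
    return (s[mid - 1] + s[mid]) / 2  # even count: mean of the two middle values

for data in ([3, 1, 8], [3, 1, 8, 15]):
    print(manual_median(data), statistics.median(data))
# 3 3
# 5.5 5.5

# For an even count, the two middle values are what median_low() and median_high() return
s = sorted([3, 1, 8, 15])
print(s[len(s) // 2 - 1], s[len(s) // 2])
# 3 8
```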

Mode: `statistics.mode()`, `statistics.multimode()`

statistics.mode() and statistics.multimode() allow you to find the mode, which is the most frequently occurring value.

statistics.multimode() always returns the modes as a list, even if there is only one.

l = [3, 2, 3, 2, 1, 2]

print(statistics.mode(l))
# 2

print(statistics.multimode(l))
# [2]

source: statistics_example.py

If there are multiple modes, statistics.mode() returns the one encountered first in the data. (Before Python 3.8, it raised StatisticsError in this case.)

l = [3, 2, 3, 2, 1, 2, 3]

print(statistics.mode(l))
# 3

print(statistics.multimode(l))
# [3, 2]

source: statistics_example.py
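Unlike mean(), mode() and multimode() only count occurrences, so they also work with non-numeric (nominal) data such as strings. The following short sketch uses made-up values; it also shows that multimode() returns an empty list for empty input, whereas mode() raises StatisticsError.

```python
import statistics

colors = ["red", "blue", "red", "green", "blue", "red"]
print(statistics.mode(colors))
# red

# A string is an iterable of characters
print(statistics.multimode("aabbbbccddddeeffffgg"))
# ['b', 'd', 'f']

# Empty input: multimode() returns [], mode() would raise StatisticsError
print(statistics.multimode([]))
# []
```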

You can use the Counter class from the collections module to count the occurrences of each element and sort the elements by frequency.

Count elements in a list with collections.Counter in Python
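For example, Counter.most_common() lists (element, count) pairs from most to least frequent, which also makes it easy to collect every mode. A minimal sketch using the data from the multimodal example above:

```python
from collections import Counter

l = [3, 2, 3, 2, 1, 2, 3]
c = Counter(l)

# Pairs sorted by count; elements with equal counts keep the order of first appearance
print(c.most_common())
# [(3, 3), (2, 3), (1, 1)]

# Keep every element whose count equals the highest count
max_count = c.most_common(1)[0][1]
modes = [x for x, count in c.items() if count == max_count]
print(modes)
# [3, 2]
```

The resulting list matches what statistics.multimode() returns for the same data.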

Variance

Population variance: `statistics.pvariance()`

statistics.pvariance() computes the population variance, which is the appropriate measure when the data represents the entire population.

statistics.pvariance() — Mathematical statistics functions — Python 3.11.4 documentation

l = [10, 1, 3, 7, 1]

print(statistics.pvariance(l))
# 12.64

source: statistics_example.py

The population variance $\sigma^2$ is calculated as follows for a population consisting of $n$ data points with mean $\mu$.

$$ \sigma^2=\frac{1}{n} \sum_{i=1}^{n} (x_i-\mu)^2 $$
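As a worked check of this formula, take the data l = [10, 1, 3, 7, 1] from the example above, so $n = 5$:

$$ \mu = \frac{10+1+3+7+1}{5} = \frac{22}{5} = 4.4 $$

$$ \sigma^2 = \frac{(5.6)^2 + (-3.4)^2 + (-1.4)^2 + (2.6)^2 + (-3.4)^2}{5} = \frac{31.36 + 11.56 + 1.96 + 6.76 + 11.56}{5} = \frac{63.2}{5} = 12.64 $$

This matches the value returned by statistics.pvariance().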

By default, the mean is calculated automatically. However, you can also specify it directly with the optional second argument, mu. For example, if you have already calculated the mean, passing it as mu avoids calculating it again.

mu = statistics.mean(l)

print(statistics.pvariance(l, mu))
# 12.64

source: statistics_example.py
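statistics.pstdev(), covered below, accepts the same optional mu argument, so a mean computed once can be reused for both the variance and the standard deviation. A short sketch; math.isclose() is used in the check because floating-point rounding may affect the last digits.

```python
import math
import statistics

l = [10, 1, 3, 7, 1]
mu = statistics.mean(l)  # compute the mean once

var = statistics.pvariance(l, mu)
sd = statistics.pstdev(l, mu)
print(var, sd)

# The standard deviation is the square root of the variance (up to rounding)
print(math.isclose(sd, math.sqrt(var)))
# True
```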

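You can also calculate the population variance with the built-in functions sum() and len(). Computing the mean once beforehand avoids recalculating it for every element.

mean = sum(l) / len(l)
print(sum((x - mean) ** 2 for x in l) / len(l))
# 12.64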

source: statistics_example.py

A generator expression is passed to sum().

List comprehensions in Python

Sample variance: `statistics.variance()`

statistics.variance() computes the sample variance, which is the appropriate measure when the data is a sample from a larger population.

statistics.variance() — Mathematical statistics functions — Python 3.11.4 documentation

l = [10, 1, 3, 7, 1]

print(statistics.variance(l))
# 15.8

source: statistics_example.py

This function calculates the unbiased sample variance, whose denominator is $n-1$ rather than $n$. This adjustment, known as Bessel's correction, removes the bias that arises when the population variance is estimated from a sample.

The unbiased sample variance $s^2$ is calculated as follows for a sample of $n$ data points with sample mean $\overline{x}$.

$$ s^2=\frac{1}{n-1} \sum_{i=1}^{n} (x_i-\overline{x})^2 $$

By default, the mean is calculated automatically. However, you can also specify it directly with the optional second argument, xbar. For example, if you have already calculated the sample mean, passing it as xbar avoids calculating it again.

xbar = statistics.mean(l)

print(statistics.variance(l, xbar))
# 15.8

source: statistics_example.py

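You can also calculate the sample variance with the built-in functions sum() and len(). Again, computing the mean once beforehand avoids recalculating it for every element.

mean = sum(l) / len(l)
print(sum((x - mean) ** 2 for x in l) / (len(l) - 1))
# 15.8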

source: statistics_example.py
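Because the population and sample variances differ only in the denominator, one can be converted into the other by the factor $n/(n-1)$. The sketch below checks this on the same data; math.isclose() is used because floating-point rounding may make the two values differ in the last digits.

```python
import math
import statistics

l = [10, 1, 3, 7, 1]
n = len(l)

pvar = statistics.pvariance(l)  # sum of squared deviations divided by n
svar = statistics.variance(l)   # same sum divided by n - 1 (Bessel's correction)

# Multiplying the population variance by n / (n - 1) gives the sample variance
print(math.isclose(pvar * n / (n - 1), svar))
# True
```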

Standard deviation

Population standard deviation: `statistics.pstdev()`

statistics.pstdev() returns the population standard deviation.

statistics.pstdev() — Mathematical statistics functions — Python 3.11.4 documentation

l = [10, 1, 3, 7, 1]

print(statistics.pstdev(l))
# 3.5552777669262356

source: statistics_example.py

The population standard deviation is the square root of the population variance.

print(math.sqrt(statistics.pvariance(l)))
# 3.5552777669262356

source: statistics_example.py

Sample standard deviation: `statistics.stdev()`

statistics.stdev() returns the sample standard deviation.

statistics.stdev — Mathematical statistics functions — Python 3.11.4 documentation

l = [10, 1, 3, 7, 1]

print(statistics.stdev(l))
# 3.9749213828703582

source: statistics_example.py

The sample standard deviation is the square root of the sample variance.

print(math.sqrt(statistics.variance(l)))
# 3.9749213828703582

source: statistics_example.py
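Both standard deviations can also be computed from scratch. The helper std_dev() below is hypothetical, and its sample flag is an illustrative name rather than part of any library. It divides the sum of squared deviations by $n$ or $n-1$ and takes the square root, so it should agree with pstdev() and stdev() up to floating-point rounding.

```python
import math
import statistics

def std_dev(data, sample=False):
    """Hypothetical helper: standard deviation computed from first principles.

    sample=False divides by n (population); sample=True divides by n - 1 (sample).
    """
    data = list(data)  # materialize so that any iterable can be used
    n = len(data)
    if n < (2 if sample else 1):
        raise ValueError("not enough data points")
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)  # sum of squared deviations
    return math.sqrt(ss / (n - 1 if sample else n))

l = [10, 1, 3, 7, 1]
print(math.isclose(std_dev(l), statistics.pstdev(l)))
# True
print(math.isclose(std_dev(l, sample=True), statistics.stdev(l)))
# True
```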
