NumPy: Views and copies of arrays


This article explains views and copies of NumPy arrays (ndarray).

To create a copy of an ndarray, use the copy() method. To determine whether an ndarray is a view, check its base attribute. To determine whether two arrays share memory, use the np.shares_memory() or np.may_share_memory() function.


The NumPy version used in this article is as follows. Note that functionality may vary between versions.

import numpy as np

print(np.__version__)
# 1.26.1

Views and copies of NumPy arrays

An ndarray derived from another ndarray is either a view or a copy.

When generating one ndarray from another, an ndarray that shares memory with the original is called a view, while an ndarray that allocates new memory, separate from the original, is called a copy.

Example of creating a view

For example, slices create views.

a = np.arange(6).reshape(2, 3)
print(a)
# [[0 1 2]
#  [3 4 5]]

a_slice = a[:, :2]
print(a_slice)
# [[0 1]
#  [3 4]]

Since the view shares the same memory with the original array, changing the value in one object affects the value in the other.

a_slice[0, 0] = 100
print(a_slice)
# [[100   1]
#  [  3   4]]

print(a)
# [[100   1   2]
#  [  3   4   5]]

a[0, 0] = 0
print(a)
# [[0 1 2]
#  [3 4 5]]

print(a_slice)
# [[0 1]
#  [3 4]]

In addition to slices, some functions and methods, such as reshape(), also return views whenever possible.
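
As a small sketch of this (variable names are illustrative), the following uses np.shares_memory() to check that reshape(), the T attribute, and ravel() return views of a contiguous array. It also shows that reshape() falls back to a copy when the requested layout cannot be expressed as a view, as happens when flattening a transposed array.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

a_reshaped = a.reshape(3, 2)  # contiguous array: a view
a_transposed = a.T            # transpose: a view
a_raveled = a.ravel()         # contiguous array: a view

print(np.shares_memory(a, a_reshaped))
# True
print(np.shares_memory(a, a_transposed))
# True
print(np.shares_memory(a, a_raveled))
# True

# Flattening the transposed (non-contiguous) array cannot be done
# by reinterpreting the same memory, so reshape() returns a copy here.
a_t_flat = a.T.reshape(-1)
print(a_t_flat)
# [0 3 1 4 2 5]
print(np.shares_memory(a, a_t_flat))
# False
```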

Example of creating a copy

Boolean indexing and fancy indexing create copies.

a = np.arange(6).reshape(2, 3)
print(a)
# [[0 1 2]
#  [3 4 5]]

a_boolean_index = a[:, [True, False, True]]
print(a_boolean_index)
# [[0 2]
#  [3 5]]

Since they do not share memory, changing the value in one object does not affect the value in the other.

a_boolean_index[0, 0] = 100
print(a_boolean_index)
# [[100   2]
#  [  3   5]]

print(a)
# [[0 1 2]
#  [3 4 5]]
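
The same holds for fancy indexing with a list of integer indices. The following is a minimal sketch with illustrative variable names. Note that assigning through the index expression itself (on the left-hand side) is a different operation and does write into the original array.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# Fancy indexing with a list of column indices returns a copy.
a_fancy = a[:, [0, 2]]
print(np.shares_memory(a, a_fancy))
# False

a_fancy[0, 0] = 100
print(a)
# [[0 1 2]
#  [3 4 5]]

# Assignment through the index expression modifies `a` in place.
a[:, [0, 2]] = -1
print(a)
# [[-1  1 -1]
#  [-1  4 -1]]
```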

Create a copy of an ndarray: copy()

To create a copy of an ndarray, use the copy() method. It is also possible to create a copy from a view.

a = np.arange(6).reshape(2, 3)
print(a)
# [[0 1 2]
#  [3 4 5]]

a_slice_copy = a[:, :2].copy()
print(a_slice_copy)
# [[0 1]
#  [3 4]]

For example, to process a sub-array selected by a slice separately from the original array, you can use copy().

a_slice_copy[0, 0] = 100
print(a_slice_copy)
# [[100   1]
#  [  3   4]]

print(a)
# [[0 1 2]
#  [3 4 5]]

Note that there is also a view() method, but it only creates a view of the object it is called on.

Executing view() on an object created with boolean indexing or fancy indexing generates a view of that copy, not of the original object.

a_boolean_index_view = a[:, [True, False, True]].view()
print(a_boolean_index_view)
# [[0 2]
#  [3 5]]

a_boolean_index_view[0, 0] = 100
print(a_boolean_index_view)
# [[100   2]
#  [  3   5]]

print(a)
# [[0 1 2]
#  [3 4 5]]
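
For comparison, here is a sketch of view() called directly on the original array (names are illustrative). The result is a distinct object that shares memory with the original, so writes through it are visible in the original.

```python
import numpy as np

a = np.arange(6)

a_view = a.view()

# The view is a different Python object from the original...
print(a_view is a)
# False

# ...but it refers to the original's memory.
print(a_view.base is a)
# True
print(np.shares_memory(a, a_view))
# True

# Writing through the view changes the original.
a_view[0] = 100
print(a)
# [100   1   2   3   4   5]
```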

Check if an ndarray is a view: base

To determine whether an ndarray is a view, check its base attribute.

If the ndarray is a view, the base attribute refers to the original ndarray that owns the memory. This is also true for a view of a view.

Consider slices and reshape() as examples. reshape() returns a view whenever possible.

a = np.arange(10)
print(a)
# [0 1 2 3 4 5 6 7 8 9]

a_0 = a[:6]
print(a_0)
# [0 1 2 3 4 5]

print(a_0.base)
# [0 1 2 3 4 5 6 7 8 9]

a_1 = a_0.reshape(2, 3)
print(a_1)
# [[0 1 2]
#  [3 4 5]]

print(a_1.base)
# [0 1 2 3 4 5 6 7 8 9]

Newly created arrays or copies have None as their base attribute.

a = np.arange(10)
print(a)
# [0 1 2 3 4 5 6 7 8 9]

print(a.base)
# None

a_copy = a.copy()
print(a_copy)
# [0 1 2 3 4 5 6 7 8 9]

print(a_copy.base)
# None

If the base attribute is not None, the array can be identified as a view. Use the is operator to compare it with None.

print(a_0.base is None)
# False

print(a_copy.base is None)
# True

print(a.base is None)
# True

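By comparing the base attribute with the original ndarray or with the base of another view, you can also verify that the arrays are backed by the same memory buffer. Note that two views with the same base do not necessarily overlap; they may refer to different parts of the buffer.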

print(a_0.base is a)
# True

print(a_0.base is a_1.base)
# True

It is more convenient to determine whether memory is shared by using np.shares_memory(), which is explained next.
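
One caveat is worth sketching here: having the same base shows only that two views derive from the same array, not that their elements overlap. In the following, the names a_left and a_right are illustrative.

```python
import numpy as np

a = np.arange(10)

a_left = a[:5]
a_right = a[5:]

# Both halves are views of `a`...
print(a_left.base is a_right.base)
# True

# ...but they cover disjoint elements, so they share no memory with each other.
print(np.shares_memory(a_left, a_right))
# False

# Each half does share memory with the original array.
print(np.shares_memory(a, a_left))
# True
```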

Check if memory is shared: np.shares_memory()

The np.shares_memory() function determines if two arrays share memory.

Basic usage

np.shares_memory() returns True if two specified arrays share memory.

a = np.arange(6)
print(a)
# [0 1 2 3 4 5]

a_reshape = a.reshape(2, 3)
print(a_reshape)
# [[0 1 2]
#  [3 4 5]]

print(np.shares_memory(a, a_reshape))
# True

It also returns True for views generated from the same ndarray.

a_slice = a[2:5]
print(a_slice)
# [2 3 4]

print(np.shares_memory(a_reshape, a_slice))
# True

In the case of copies, False is returned.

a_reshape_copy = a.reshape(2, 3).copy()
print(a_reshape_copy)
# [[0 1 2]
#  [3 4 5]]

print(np.shares_memory(a, a_reshape_copy))
# False

np.may_share_memory()

There is also the np.may_share_memory() function.

As "may" in the function name suggests, np.may_share_memory() is not as strict as np.shares_memory().

np.may_share_memory() checks only whether the memory address ranges (bounds) of the two arrays overlap, not whether any elements actually occupy the same memory.

For example, in the following case, the two slices are views of the same ndarray and their address ranges overlap, but the slices are interleaved and have no element in common.

a = np.arange(10)
print(a)
# [0 1 2 3 4 5 6 7 8 9]

a_0 = a[::2]
print(a_0)
# [0 2 4 6 8]

a_1 = a[1::2]
print(a_1)
# [1 3 5 7 9]

np.shares_memory() returns False because it checks exactly whether any element is shared, whereas np.may_share_memory() returns True because the address ranges overlap.

print(np.shares_memory(a_0, a_1))
# False

print(np.may_share_memory(a_0, a_1))
# True

In the following example, since the two slices do not overlap in the range of the original ndarray, np.may_share_memory() also returns False.

a_2 = a[:5]
print(a_2)
# [0 1 2 3 4]

a_3 = a[5:]
print(a_3)
# [5 6 7 8 9]

print(np.shares_memory(a_2, a_3))
# False

print(np.may_share_memory(a_2, a_3))
# False

np.shares_memory() requires more processing time because of its exact analysis. The following code uses the Jupyter Notebook magic command %%timeit. Note that %%timeit does not work when the code is run as a plain Python script.

%%timeit
np.shares_memory(a_0, a_1)
# 200 ns ± 1.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

%%timeit
np.may_share_memory(a_0, a_1)
# 123 ns ± 0.284 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
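
Outside Jupyter, a comparable measurement can be sketched with the standard library's timeit module. The repetition count below is an arbitrary choice, and timings depend on the machine and the NumPy version, so no expected output is shown.

```python
import timeit

import numpy as np

a = np.arange(10)
a_0 = a[::2]
a_1 = a[1::2]

n = 100_000  # number of calls per measurement (arbitrary choice)

t_exact = timeit.timeit(lambda: np.shares_memory(a_0, a_1), number=n)
t_bounds = timeit.timeit(lambda: np.may_share_memory(a_0, a_1), number=n)

# Report the average time per call in nanoseconds.
print(f'shares_memory:    {t_exact / n * 1e9:.0f} ns per call')
print(f'may_share_memory: {t_bounds / n * 1e9:.0f} ns per call')
```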

Although the difference is insignificant in the example above, the official documentation warns that np.shares_memory() can be exponentially slow for some inputs.

Warning
This function can be exponentially slow for some inputs, unless max_work is set to a finite number or MAY_SHARE_BOUNDS. If in doubt, use numpy.may_share_memory instead.
Source: numpy.shares_memory, NumPy v1.26 Manual

np.may_share_memory() might return True erroneously when elements do not actually share memory. However, it will never mistakenly return False when the memory is indeed shared. Thus, if you just need to check if memory could potentially be shared, np.may_share_memory() is an appropriate choice.
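
One way to combine the two functions is to call np.may_share_memory() first and run the exact check only when the bounds overlap. The following is a sketch that makes this logic explicit. The helper name arrays_overlap is hypothetical and not part of NumPy.

```python
import numpy as np


def arrays_overlap(x, y):
    """Return True if x and y share at least one element in memory.

    Hypothetical helper for illustration; not part of NumPy.
    """
    # Cheap bounds check: a False result here is always reliable.
    if not np.may_share_memory(x, y):
        return False
    # Bounds overlap, so fall back to the exact (possibly slow) check.
    return np.shares_memory(x, y)


a = np.arange(10)

print(arrays_overlap(a[:5], a[5:]))     # disjoint bounds
# False
print(arrays_overlap(a[::2], a[1::2]))  # interleaved, no common element
# False
print(arrays_overlap(a[:6], a[4:]))     # elements 4 and 5 are common
# True
```

The exact check can still be slow for difficult inputs. When only a conservative answer is needed, returning the result of np.may_share_memory() directly is the simpler choice.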
