NumPy: Views and copies of arrays
This article explains views and copies of NumPy arrays (ndarray).
To create a copy of an ndarray, use the copy() method. To determine whether an ndarray is a view, check its base attribute. To determine whether two arrays share memory, use the np.shares_memory() or np.may_share_memory() function.
The NumPy version used in this article is as follows. Note that functionality may vary between versions.
import numpy as np
print(np.__version__)
# 1.26.1
Views and copies of NumPy arrays
There are two types of ndarray: views and copies.
When generating one ndarray from another, an ndarray that shares memory with the original is called a view, while an ndarray that allocates new memory, separate from the original, is called a copy.
Example of creating a view
For example, slices create views.
a = np.arange(6).reshape(2, 3)
print(a)
# [[0 1 2]
# [3 4 5]]
a_slice = a[:, :2]
print(a_slice)
# [[0 1]
# [3 4]]
Since the view shares the same memory with the original array, changing the value in one object affects the value in the other.
a_slice[0, 0] = 100
print(a_slice)
# [[100 1]
# [ 3 4]]
print(a)
# [[100 1 2]
# [ 3 4 5]]
a[0, 0] = 0
print(a)
# [[0 1 2]
# [3 4 5]]
print(a_slice)
# [[0 1]
# [3 4]]
In addition to slices, some functions and methods, such as reshape(), also return views whenever possible.
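The "whenever possible" part matters: reshape() falls back to a copy when the requested shape cannot be expressed over the existing memory layout. The following is a minimal sketch of that behavior, using a transposed array as one illustrative case; it is not an exhaustive rule for when reshape() copies.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# Reshaping a contiguous array can reuse the existing buffer.
flat_view = a.reshape(6)
print(np.shares_memory(a, flat_view))
# True

# The transpose is itself a view, but it is not C-contiguous,
# so flattening it in C order needs newly allocated memory.
flat_copy = a.T.reshape(6)
print(np.shares_memory(a, flat_copy))
# False
```

If it matters whether the result is a view, checking with np.shares_memory() is safer than assuming.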
Example of creating a copy
Boolean indexing or fancy indexing creates copies.
a = np.arange(6).reshape(2, 3)
print(a)
# [[0 1 2]
# [3 4 5]]
a_boolean_index = a[:, [True, False, True]]
print(a_boolean_index)
# [[0 2]
# [3 5]]
Since they do not share memory, changing the value in one object does not affect the value in the other.
a_boolean_index[0, 0] = 100
print(a_boolean_index)
# [[100 2]
# [ 3 5]]
print(a)
# [[0 1 2]
# [3 4 5]]
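Note that the copy is made only when the result of the indexing expression is read. When the same kind of index appears on the left-hand side of an assignment, the values are written into the original array. A short sketch of the difference (the variable name picked is only for illustration):

```python
import numpy as np

a = np.arange(6)

# Reading with fancy indexing returns a copy,
# so changing the result leaves `a` untouched.
picked = a[[0, 2, 4]]
picked[0] = 100
print(a)
# [0 1 2 3 4 5]

# Assigning through the same kind of index writes into `a` itself.
a[[0, 2, 4]] = -1
print(a)
# [-1  1 -1  3 -1  5]
```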
Create a copy of an ndarray: copy()
To create a copy of an ndarray, use the copy() method. It is also possible to create a copy from a view.
a = np.arange(6).reshape(2, 3)
print(a)
# [[0 1 2]
# [3 4 5]]
a_slice_copy = a[:, :2].copy()
print(a_slice_copy)
# [[0 1]
# [3 4]]
For example, to process a sub-array selected by a slice separately from the original array, you can use copy().
a_slice_copy[0, 0] = 100
print(a_slice_copy)
# [[100 1]
# [ 3 4]]
print(a)
# [[0 1 2]
# [3 4 5]]
Note that there is also the view() method, but it only generates a view of the object on which it is called.
Calling view() on the result of boolean indexing or fancy indexing generates a view of that copy, not of the original object.
a_boolean_index_view = a[:, [True, False, True]].view()
print(a_boolean_index_view)
# [[0 2]
# [3 5]]
a_boolean_index_view[0, 0] = 100
print(a_boolean_index_view)
# [[100 2]
# [ 3 5]]
print(a)
# [[0 1 2]
# [3 4 5]]
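For contrast, here is a small sketch of view() called directly on an array that owns its data. In this case the result does share memory with the array it was called on.

```python
import numpy as np

a = np.arange(6)  # created directly, so it owns its data

a_view = a.view()
print(a_view.base is a)
# True

# Writing through the view changes the original array.
a_view[0] = 100
print(a)
# [100   1   2   3   4   5]
```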
Check if an ndarray is a view: base
To determine whether an ndarray is a view, check its base attribute.
If the ndarray is a view, the base attribute points to the original ndarray.
Consider slices and reshape() as examples. reshape() returns a view whenever possible. In the example below, a_1 is created from the view a_0, yet its base is the original array a.
a = np.arange(10)
print(a)
# [0 1 2 3 4 5 6 7 8 9]
a_0 = a[:6]
print(a_0)
# [0 1 2 3 4 5]
print(a_0.base)
# [0 1 2 3 4 5 6 7 8 9]
a_1 = a_0.reshape(2, 3)
print(a_1)
# [[0 1 2]
# [3 4 5]]
print(a_1.base)
# [0 1 2 3 4 5 6 7 8 9]
Newly created arrays or copies have None as their base attribute.
a = np.arange(10)
print(a)
# [0 1 2 3 4 5 6 7 8 9]
print(a.base)
# None
a_copy = a.copy()
print(a_copy)
# [0 1 2 3 4 5 6 7 8 9]
print(a_copy.base)
# None
If the base attribute is not None, the array can be identified as a view. Use the is operator to compare it with None.
print(a_0.base is None)
# False
print(a_copy.base is None)
# True
print(a.base is None)
# True
By comparing the base attribute with the original ndarray or with the base of another view, you can also verify that both arrays refer to the same underlying memory.
print(a_0.base is a)
# True
print(a_0.base is a_1.base)
# True
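As a complementary check, the flags.owndata attribute reports whether an array owns the memory it uses. This attribute is not covered elsewhere in this article, so treat the following as a brief sketch; for arrays created in other ways (for example, from an external buffer) the result may need a different interpretation.

```python
import numpy as np

a = np.arange(10)
a_slice = a[:6]
a_copy = a.copy()

# An array created directly owns its data.
print(a.flags.owndata)
# True

# A view borrows the memory of another array.
print(a_slice.flags.owndata)
# False

# A copy owns its own freshly allocated memory.
print(a_copy.flags.owndata)
# True
```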
It is more convenient to determine whether memory is shared by using np.shares_memory(), which is explained next.
Check if memory is shared: np.shares_memory()
The np.shares_memory() function determines if two arrays share memory.
Basic usage
np.shares_memory() returns True if two specified arrays share memory.
a = np.arange(6)
print(a)
# [0 1 2 3 4 5]
a_reshape = a.reshape(2, 3)
print(a_reshape)
# [[0 1 2]
# [3 4 5]]
print(np.shares_memory(a, a_reshape))
# True
It also returns True for views generated from the same ndarray, provided that they have at least one element in common.
a_slice = a[2:5]
print(a_slice)
# [2 3 4]
print(np.shares_memory(a_reshape, a_slice))
# True
In the case of copies, False is returned.
a_reshape_copy = a.reshape(2, 3).copy()
print(a_reshape_copy)
# [[0 1 2]
# [3 4 5]]
print(np.shares_memory(a, a_reshape_copy))
# False
np.may_share_memory()
There is also the np.may_share_memory() function.
- numpy.may_share_memory — NumPy v1.26 Manual
- python - What is the difference between numpy.shares_memory and numpy.may_share_memory? - Stack Overflow
As "may" in the function name suggests, np.may_share_memory() is not as strict as np.shares_memory().
np.may_share_memory() checks only whether the memory ranges (bounds) of the two arrays overlap, not whether any element actually occupies the same memory.
For example, in the following case, two slices are views of the same ndarray and their memory ranges overlap, but the slices are interleaved, so no element is shared between them.
a = np.arange(10)
print(a)
# [0 1 2 3 4 5 6 7 8 9]
a_0 = a[::2]
print(a_0)
# [0 2 4 6 8]
a_1 = a[1::2]
print(a_1)
# [1 3 5 7 9]
np.shares_memory() returns False because it checks whether any element is actually shared, whereas np.may_share_memory() returns True because it looks only at the overlapping ranges.
print(np.shares_memory(a_0, a_1))
# False
print(np.may_share_memory(a_0, a_1))
# True
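To see why the ranges overlap, the following sketch computes the span of byte addresses each slice covers. The helper byte_range() is a hypothetical function written for this illustration (it is not part of NumPy) and assumes a non-empty 1-D array with a positive stride.

```python
import numpy as np

def byte_range(arr):
    """Return (start, end) byte addresses spanned by a non-empty 1-D array
    with a positive stride. Hypothetical helper for illustration only."""
    start = arr.__array_interface__["data"][0]
    end = start + (arr.shape[0] - 1) * arr.strides[0] + arr.itemsize
    return start, end

def ranges_overlap(x, y):
    """Return True if the byte ranges of x and y intersect."""
    x_start, x_end = byte_range(x)
    y_start, y_end = byte_range(y)
    return x_start < y_end and y_start < x_end

a = np.arange(10)

# Interleaved slices: no common element, but the spans intersect.
print(ranges_overlap(a[::2], a[1::2]))
# True

# Adjacent halves: the first span ends exactly where the second begins.
print(ranges_overlap(a[:5], a[5:]))
# False
```

This only mimics the idea of a bounds check; it is not how NumPy implements np.may_share_memory() internally.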
In the following example, since the two slices do not overlap in the range of the original ndarray, np.may_share_memory() also returns False.
a_2 = a[:5]
print(a_2)
# [0 1 2 3 4]
a_3 = a[5:]
print(a_3)
# [5 6 7 8 9]
print(np.shares_memory(a_2, a_3))
# False
print(np.may_share_memory(a_2, a_3))
# False
np.shares_memory() requires more processing time due to its strict analysis. The following code uses the Jupyter Notebook magic command %%timeit; note that it does not work when executed as a plain Python script.
%%timeit
np.shares_memory(a_0, a_1)
# 200 ns ± 1.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
%%timeit
np.may_share_memory(a_0, a_1)
# 123 ns ± 0.284 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
Although the difference is negligible in the example above, the official documentation warns that np.shares_memory() can be exponentially slow for some inputs.
Warning
This function can be exponentially slow for some inputs, unless max_work is set to a finite number or MAY_SHARE_BOUNDS. If in doubt, use numpy.may_share_memory instead.
(Source: numpy.shares_memory, NumPy v1.26 Manual)
np.may_share_memory() might return True even when no elements actually share memory. However, it never returns False when memory is indeed shared. Thus, if you just need to check whether memory could potentially be shared, np.may_share_memory() is an appropriate choice.
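One way to combine the two functions, sketched below, is to use np.may_share_memory() as a cheap first check and call np.shares_memory() only when the first check cannot rule out sharing. The function name definitely_shares() is hypothetical and chosen for this example.

```python
import numpy as np

def definitely_shares(x, y):
    """Return True only if x and y have at least one element in common.
    Hypothetical helper: a cheap bounds check first, the exact check second."""
    # A False result here is always correct, so we can stop early.
    if not np.may_share_memory(x, y):
        return False
    # The ranges overlap; fall back to the strict (possibly slower) check.
    return np.shares_memory(x, y)

a = np.arange(10)

print(definitely_shares(a[:5], a[5:]))     # ranges do not overlap
# False
print(definitely_shares(a[::2], a[1::2]))  # ranges overlap, elements do not
# False
print(definitely_shares(a[:6], a[4:]))     # elements 4 and 5 are in both
# True
```

The early return skips the strict analysis for arrays whose ranges are disjoint. It does not remove the worst-case cost of np.shares_memory() for inputs whose ranges do overlap.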