note.nkmk.me

NumPy: Determine if ndarray is view or copy, and if it shares memory

Posted: 2020-09-24 / Tags: Python, NumPy

In NumPy, you can use the base attribute to determine whether an array (numpy.ndarray) is a view or a copy. In addition, np.shares_memory() can be used to determine whether two arrays share memory.

This article describes the following:

  • View and copy of numpy.ndarray
    • Example of creating a view
    • Example of creating a copy
    • copy() and view()
  • Determine if view or copy: base attribute
  • Determine if memory is shared: np.shares_memory()
    • Basic usage
    • np.may_share_memory()

The sample code below was run with NumPy 1.16.4. Note that other versions may behave differently.


View and copy of numpy.ndarray

A numpy.ndarray created from another array is either a view or a copy.

When you create a new array object from an existing one, an object that shares memory with the original (it refers to part or all of the original's memory) is called a view.

On the other hand, an object that allocates its own memory, separate from the original, is called a copy.

Example of creating a view

For example, slices create views.

import numpy as np

a_2d = np.arange(12).reshape(3, 4)
print(a_2d)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

a_slice = a_2d[:2, :2]
print(a_slice)
# [[0 1]
#  [4 5]]

The original object and the view refer to the same memory, so changing the value of an element in one object changes the value in the other.

a_slice[0, 0] = 100
print(a_slice)
# [[100   1]
#  [  4   5]]

print(a_2d)
# [[100   1   2   3]
#  [  4   5   6   7]
#  [  8   9  10  11]]

a_2d[0, 0] = 0
print(a_2d)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(a_slice)
# [[0 1]
#  [4 5]]

In addition to slices, some functions and methods return a view. One of them is reshape(), which is used as an example later in this article.
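As a rough illustration of which operations return views, the sketch below checks a few common ones with np.shares_memory(), which is covered later in this article. The variable names are only for illustration, and the results for reshape() and ravel() assume a contiguous source array, as is the case here.

```python
import numpy as np

a_2d = np.arange(12).reshape(3, 4)

# Transposing and reshaping a contiguous array return views.
a_t = a_2d.T
a_reshape = a_2d.reshape(4, 3)
print(np.shares_memory(a_2d, a_t))
# True

print(np.shares_memory(a_2d, a_reshape))
# True

# ravel() returns a view when it can; flatten() always returns a copy.
a_ravel = a_2d.ravel()
a_flatten = a_2d.flatten()
print(np.shares_memory(a_2d, a_ravel))
# True

print(np.shares_memory(a_2d, a_flatten))
# False
```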

Example of creating a copy

Fancy indexing creates copies.

a_fancy_index = a_2d[[0, 1]]
print(a_fancy_index)
# [[0 1 2 3]
#  [4 5 6 7]]

Since they do not share memory, changing the value of one object does not change the value of the other.

a_fancy_index[0, 0] = 100
print(a_fancy_index)
# [[100   1   2   3]
#  [  4   5   6   7]]

print(a_2d)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
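Indexing with a boolean mask behaves the same way as indexing with a list of integers: the result is a copy. The following is a minimal sketch with illustrative variable names.

```python
import numpy as np

a = np.arange(6)

# Boolean indexing selects elements by a mask and returns a copy.
a_bool = a[a % 2 == 0]
print(a_bool)
# [0 2 4]

a_bool[0] = 100
print(a_bool)
# [100   2   4]

# The original array is unchanged.
print(a)
# [0 1 2 3 4 5]
```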

copy() and view()

You can use copy() to create a copy of an array object.

You can also create a copy of a view.

a_slice_copy = a_2d[:2, :2].copy()
print(a_slice_copy)
# [[0 1]
#  [4 5]]

Changing the value of an element of one object does not change the value of the other.

a_slice_copy[0, 0] = 100
print(a_slice_copy)
# [[100   1]
#  [  4   5]]

print(a_2d)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

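There is also a method called view(), which returns a view of the array it is called on.

Note that view() cannot turn a copy back into a view of the original. For example, calling view() on an object created by fancy indexing returns a view of that copy, not a view of the original object.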
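The behavior of view() can be sketched as follows. The variable names are illustrative; np.shares_memory(), used for the final check, is described later in this article.

```python
import numpy as np

a_2d = np.arange(12).reshape(3, 4)

# view() on the array itself: the result shares memory with a_2d.
a_view = a_2d.view()
a_view[0, 0] = 100
print(a_2d[0, 0])
# 100

# view() on the result of fancy indexing: a view of the copy, not of a_2d.
a_fancy_view = a_2d[[0, 1]].view()
a_fancy_view[0, 0] = -1
print(a_2d[0, 0])
# 100

print(np.shares_memory(a_2d, a_fancy_view))
# False
```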

Determine if view or copy: base attribute

Use the base attribute to determine whether a numpy.ndarray is a view or a copy (strictly speaking, whether it is a view or not).

If the numpy.ndarray is a view, its base attribute returns the original numpy.ndarray.

Take reshape(), which returns a view whenever possible, as an example.

a = np.arange(10)
print(a)
# [0 1 2 3 4 5 6 7 8 9]

a_0 = a[:6]
print(a_0)
# [0 1 2 3 4 5]

a_1 = a_0.reshape(2, 3)
print(a_1)
# [[0 1 2]
#  [3 4 5]]

print(a_0.base)
# [0 1 2 3 4 5 6 7 8 9]

print(a_1.base)
# [0 1 2 3 4 5 6 7 8 9]
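reshape() cannot always return a view. As a minimal sketch, flattening a transposed array requires reordering the elements in memory, so the result no longer shares memory with the original. The check uses np.shares_memory(), described later in this article, and does not rely on the value of base in this case.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# The transpose is a view, but its elements are no longer contiguous.
a_t = a.T
print(np.shares_memory(a, a_t))
# True

# Flattening the transpose cannot be expressed as a view, so reshape() copies.
a_t_flat = a_t.reshape(6)
print(a_t_flat)
# [0 3 1 4 2 5]

print(np.shares_memory(a, a_t_flat))
# False
```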

The base attribute of a copy, or of an original numpy.ndarray (a newly created array that is neither a copy nor a view), is None.

a_copy = a.copy()
print(a_copy)
# [0 1 2 3 4 5 6 7 8 9]

print(a_copy.base)
# None

print(a.base)
# None

You can use the is operator to compare the base attribute with None to determine whether the array is a view.

print(a_0.base is None)
# False

print(a_copy.base is None)
# True

print(a.base is None)
# True
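One thing to keep in mind is that an array built in a single expression can already be a view. In the sketch below, the 2D array created with np.arange(12).reshape(3, 4) is a view of the 1D array returned by np.arange(), so its base is not None. The owndata flag, which reports whether an array owns its memory, is shown for comparison.

```python
import numpy as np

a_2d = np.arange(12).reshape(3, 4)
print(a_2d.base is None)
# False

print(a_2d.base)
# [ 0  1  2  3  4  5  6  7  8  9 10 11]

# The owndata flag tells whether the array owns its memory.
print(a_2d.flags.owndata)
# False

print(np.arange(12).flags.owndata)
# True
```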

You can also see that they share memory by comparing the view's base attribute with the original numpy.ndarray, or by comparing the view's base attributes with each other.

print(a_0.base is a)
# True

print(a_0.base is a_1.base)
# True

np.shares_memory(), described next, is more convenient for determining whether memory is shared.


Determine if memory is shared: np.shares_memory()

You can determine whether two arrays share memory with np.shares_memory().

Basic usage

Pass two numpy.ndarray objects to np.shares_memory(). True is returned if the arrays share memory.

a = np.arange(6)
print(a)
# [0 1 2 3 4 5]

a_reshape = a.reshape(2, 3)
print(a_reshape)
# [[0 1 2]
#  [3 4 5]]

print(np.shares_memory(a, a_reshape))
# True

True is also returned for two views created from the same numpy.ndarray, as long as they have at least one element in common.

a_slice = a[2:5]
print(a_slice)
# [2 3 4]

print(np.shares_memory(a_reshape, a_slice))
# True

For a copy, False is returned.

a_reshape_copy = a.reshape(2, 3).copy()
print(a_reshape_copy)
# [[0 1 2]
#  [3 4 5]]

print(np.shares_memory(a, a_reshape_copy))
# False
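One possible use of np.shares_memory() is to copy an array only when modifying it would affect another array. The helper detach_if_shared() below is a hypothetical name introduced for this sketch, not a NumPy function.

```python
import numpy as np


def detach_if_shared(arr, other):
    """Return arr itself if it is independent of other, otherwise a copy.

    Hypothetical helper for illustration only.
    """
    if np.shares_memory(arr, other):
        return arr.copy()
    return arr


a = np.arange(6)

# a[2:5] is a view of a, so a copy is returned and a stays unchanged.
a_slice = detach_if_shared(a[2:5], a)
a_slice[0] = 100
print(a)
# [0 1 2 3 4 5]

# The result of fancy indexing is already a copy, so it is returned as is.
a_fancy = a[[0, 1]]
print(detach_if_shared(a_fancy, a) is a_fancy)
# True
```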

np.may_share_memory()

There is also a function similar to np.shares_memory() called np.may_share_memory().

As the may in its name suggests, np.may_share_memory() is less strict than np.shares_memory().

np.may_share_memory() only checks whether the memory address ranges of the two arrays overlap; it does not check whether any elements actually refer to the same memory.

For example, in the following case, both slices are views of the same numpy.ndarray and their address ranges overlap, but no element of one slice refers to the same memory as an element of the other.

a = np.arange(10)
print(a)
# [0 1 2 3 4 5 6 7 8 9]

a_0 = a[::2]
print(a_0)
# [0 2 4 6 8]

a_1 = a[1::2]
print(a_1)
# [1 3 5 7 9]

np.shares_memory() performs the exact check and returns False, whereas np.may_share_memory() returns True.

print(np.shares_memory(a_0, a_1))
# False

print(np.may_share_memory(a_0, a_1))
# True

In the example below, np.may_share_memory() returns False because the two slices are the first half and the second half of the original numpy.ndarray and the ranges do not overlap.

a_2 = a[:5]
print(a_2)
# [0 1 2 3 4]

a_3 = a[5:]
print(a_3)
# [5 6 7 8 9]

print(np.shares_memory(a_2, a_3))
# False

print(np.may_share_memory(a_2, a_3))
# False

np.shares_memory() takes longer because it performs the exact check. Note that the code below uses the Jupyter Notebook magic command %%timeit and does not work when run as a Python script.

%%timeit
np.shares_memory(a_0, a_1)
# 839 ns ± 53.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%%timeit
np.may_share_memory(a_0, a_1)
# 275 ns ± 5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
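Outside Jupyter Notebook, a similar comparison can be made with the timeit module of the standard library. The following is a minimal sketch; the number of calls n is arbitrary, the lambda adds a small call overhead to both measurements, and the results depend on the environment, so no output is shown.

```python
import timeit

import numpy as np

a = np.arange(10)
a_0 = a[::2]
a_1 = a[1::2]

n = 100000  # number of calls per measurement (arbitrary)

# Total time in seconds for n calls of each function.
t_strict = timeit.timeit(lambda: np.shares_memory(a_0, a_1), number=n)
t_bounds = timeit.timeit(lambda: np.may_share_memory(a_0, a_1), number=n)

# Average time per call in nanoseconds.
print(f'shares_memory:    {t_strict / n * 1e9:.0f} ns per call')
print(f'may_share_memory: {t_bounds / n * 1e9:.0f} ns per call')
```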

np.may_share_memory() may return True even when no elements share memory (a false positive), but it never returns False when memory is actually shared.
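Because a False from np.may_share_memory() is always reliable, the two functions can be combined: run the cheap check first and the exact check only when the ranges overlap. The function overlaps() below is a hypothetical helper for this sketch. Whether it saves time in practice depends on the arrays involved.

```python
import numpy as np


def overlaps(x, y):
    """Return True if x and y have at least one element in common in memory.

    Hypothetical helper for illustration only.
    """
    # Cheap bounds check first: a False result here is always reliable.
    if not np.may_share_memory(x, y):
        return False
    # The address ranges overlap, so fall back to the exact check.
    return np.shares_memory(x, y)


a = np.arange(10)

print(overlaps(a[:5], a[5:]))
# False

print(overlaps(a[::2], a[1::2]))
# False

print(overlaps(a[:6], a[4:]))
# True
```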
