note.nkmk.me

NumPy: Cast ndarray to a specific dtype with astype()

Posted: 2021-10-11 / Tags: Python, NumPy

A NumPy array (ndarray) has a data type, dtype, which can be specified when creating the array with np.array(). You can also convert it to another type with the astype() method.

Each ndarray object has exactly one dtype, and all of its elements share that data type.
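Here is a quick illustration, written as a minimal sketch that is not from the original article. When np.array() receives a list that mixes integers and floats, NumPy picks one dtype that can hold every element, so the integers are stored as floats.

```python
import numpy as np

# A list mixing int and float values becomes a single float64 array
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
print(mixed)        # [1.  2.5 3. ]

# Every element shares the array's dtype
print(type(mixed[0]))  # <class 'numpy.float64'>
```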

This article describes the following contents.

  • List of basic data types (dtype) in NumPy
  • Range of values (minimum and maximum values) for numeric types
    • np.iinfo()
    • np.finfo()
  • The number of characters in a string
  • object: Stores pointers to Python objects
  • Casting data type (dtype) with astype()
  • Rounding when casting from float to int
  • Implicit type conversions

List of basic data types (dtype) in NumPy

The following table lists the basic data types (dtype) in NumPy. The range of values (minimum and maximum) that each integer and floating-point type can take is described later.

dtype       character code   description
int8        i1               8-bit signed integer
int16       i2               16-bit signed integer
int32       i4               32-bit signed integer
int64       i8               64-bit signed integer
uint8       u1               8-bit unsigned integer
uint16      u2               16-bit unsigned integer
uint32      u4               32-bit unsigned integer
uint64      u8               64-bit unsigned integer
float16     f2               16-bit floating-point number
float32     f4               32-bit floating-point number
float64     f8               64-bit floating-point number
float128    f16              128-bit floating-point number
complex64   c8               64-bit complex floating-point number
complex128  c16              128-bit complex floating-point number
complex256  c32              256-bit complex floating-point number
bool        ?                Boolean (True or False)
unicode     U                Unicode string
object      O                Python objects

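The number in a dtype name is in bits, and the number in a character code is in bytes, so the two differ even for the same type (for example, int64 and i8). Also note that float128 and complex256 are not available on every platform (for example, Windows). On x86 platforms, float128 is typically 80-bit extended precision stored in 128 bits rather than true quadruple precision.

The character code for bool is ?. It does not mean "unknown"; the ? character itself is the assigned code.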
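You can check how dtype names, character codes, and sizes relate by using np.dtype(). This short sketch relies only on the itemsize and char attributes. It avoids the byte-order prefix shown by .str because that depends on the machine.

```python
import numpy as np

dt = np.dtype('i8')
print(dt == np.dtype('int64'))  # True: 'i8' and 'int64' are the same dtype
print(dt.itemsize)              # 8 (bytes), matching the 8 in 'i8'
print(dt.itemsize * 8)          # 64 (bits), matching the 64 in 'int64'

# The character code of bool is literally '?'
print(np.dtype(bool).char)      # ?
```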

When you specify a dtype as an argument to functions and methods, you can use any of the following forms. For int64, for example:

  • np.int64
  • 'int64'
  • 'i8'
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)
print(a.dtype)
# int64

a = np.array([1, 2, 3], dtype='int64')
print(a.dtype)
# int64

a = np.array([1, 2, 3], dtype='i8')
print(a.dtype)
# int64

It can also be specified as a Python built-in type such as int, float, or str.

In this case, NumPy treats it as an equivalent dtype, but which dtype is chosen depends on the environment. For example, with NumPy 1.x on Windows, int corresponds to int32 rather than int64.

The following is an example for Python 3 in a 64-bit environment. There is no Python type called uint, but it is listed here for convenience.

Python type   Example of equivalent dtype
int           int64
float         float64
str           unicode
(uint)        uint64

Both int and the string 'int' are allowed as arguments; only the string 'uint' is allowed for uint, which is not a Python type.

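# In older NumPy, np.int was simply an alias for the built-in int:
# print(int is np.int)
# True
# np.int was deprecated in NumPy 1.20 and removed in 1.24, so pass int instead.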

a = np.array([1, 2, 3], dtype=int)
print(a.dtype)
# int64

a = np.array([1, 2, 3], dtype='int')
print(a.dtype)
# int64
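To see how the built-in types map to dtypes in your own environment, you can pass them to np.dtype(). This sketch checks only the kind attribute: 'i' for signed integer, 'u' for unsigned integer, 'f' for float, and 'U' for Unicode string. That way the result does not depend on the platform's default bit width.

```python
import numpy as np

# Map each built-in type (and the string 'uint') to the dtype NumPy chooses here
mapping = {t: np.dtype(t) for t in (int, float, str, 'uint')}

for t, dt in mapping.items():
    # dt.kind: 'i' signed int, 'u' unsigned int, 'f' float, 'U' Unicode string
    print(t, '->', dt, dt.kind)
```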

Range of values (minimum and maximum values) for numeric types

You can use np.iinfo() and np.finfo() to check the range of possible values for the integer types (int, uint) and the floating-point types (float).

np.iinfo()

Use np.iinfo() for integers int and uint.

Passing a type object as the argument returns a numpy.iinfo object.

You can print() it to see a summary, and use the max and min attributes to get the maximum and minimum values.

ii64 = np.iinfo(np.int64)
print(type(ii64))
# <class 'numpy.iinfo'>

print(ii64)
# Machine parameters for int64
# ---------------------------------------------------------------
# min = -9223372036854775808
# max = 9223372036854775807
# ---------------------------------------------------------------
# 

print(ii64.max)
# 9223372036854775807

print(type(ii64.max))
# <class 'int'>

print(ii64.min)
# -9223372036854775808

print(ii64.bits)
# 64

You can also specify a string representing the dtype as an argument.

print(np.iinfo('int16'))
# Machine parameters for int16
# ---------------------------------------------------------------
# min = -32768
# max = 32767
# ---------------------------------------------------------------
# 

print(np.iinfo('i4'))
# Machine parameters for int32
# ---------------------------------------------------------------
# min = -2147483648
# max = 2147483647
# ---------------------------------------------------------------
# 

print(np.iinfo(int))
# Machine parameters for int64
# ---------------------------------------------------------------
# min = -9223372036854775808
# max = 9223372036854775807
# ---------------------------------------------------------------
# 

print(np.iinfo('uint64'))
# Machine parameters for uint64
# ---------------------------------------------------------------
# min = 0
# max = 18446744073709551615
# ---------------------------------------------------------------
# 

The value itself can also be specified as an argument.

i = 100
print(type(i))
# <class 'int'>

print(np.iinfo(i))
# Machine parameters for int64
# ---------------------------------------------------------------
# min = -9223372036854775808
# max = 9223372036854775807
# ---------------------------------------------------------------
# 

ui = np.uint8(100)
print(type(ui))
# <class 'numpy.uint8'>

print(np.iinfo(ui))
# Machine parameters for uint8
# ---------------------------------------------------------------
# min = 0
# max = 255
# ---------------------------------------------------------------
# 

An ndarray itself cannot be passed as the argument. Pass its dtype attribute or one of its elements instead.

a = np.array([1, 2, 3], dtype=np.int8)
print(type(a))
# <class 'numpy.ndarray'>

# print(np.iinfo(a))
# ValueError: Invalid integer data type 'O'.

print(np.iinfo(a.dtype))
# Machine parameters for int8
# ---------------------------------------------------------------
# min = -128
# max = 127
# ---------------------------------------------------------------
# 

print(np.iinfo(a[0]))
# Machine parameters for int8
# ---------------------------------------------------------------
# min = -128
# max = 127
# ---------------------------------------------------------------
# 
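A common use of np.iinfo() is to check whether data fits into a smaller integer type before casting. Values outside the range are not preserved, because integer-to-integer casts typically wrap around. The helper fits_in below is hypothetical and not a NumPy function. It is a minimal sketch that assumes the input is a non-empty array-like of integers.

```python
import numpy as np

def fits_in(values, dtype):
    """Hypothetical helper (not part of NumPy): return True if every
    value lies within the range of the integer dtype."""
    info = np.iinfo(dtype)
    arr = np.asarray(values)
    return bool(arr.min() >= info.min and arr.max() <= info.max)

print(fits_in([0, 100, 255], np.uint8))  # True
print(fits_in([0, 100, 256], np.uint8))  # False
print(fits_in([-129, 0], 'int8'))        # False

# Without such a check, an out-of-range value is not preserved
print(np.array([256]).astype(np.uint8))  # [0]
```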

np.finfo()

Use np.finfo() for floating-point numbers (float).

Usage is the same as np.iinfo().

The argument can be a type object (np.float64), a string ('float64', 'f8') or a value (0.1).

fi64 = np.finfo(np.float64)
print(type(fi64))
# <class 'numpy.finfo'>

print(fi64)
# Machine parameters for float64
# ---------------------------------------------------------------
# precision =  15   resolution = 1.0000000000000001e-15
# machep =    -52   eps =        2.2204460492503131e-16
# negep =     -53   epsneg =     1.1102230246251565e-16
# minexp =  -1022   tiny =       2.2250738585072014e-308
# maxexp =   1024   max =        1.7976931348623157e+308
# nexp =       11   min =        -max
# ---------------------------------------------------------------
# 

print(fi64.max)
# 1.7976931348623157e+308

print(type(fi64.max))
# <class 'numpy.float64'>

print(fi64.min)
# -1.7976931348623157e+308

print(fi64.eps)
# 2.220446049250313e-16

print(fi64.bits)
# 64

print(fi64.iexp)
# 11

print(fi64.nmant)
# 52

As shown in the example above, eps gives the machine epsilon, and iexp and nmant give the number of bits in the exponent and mantissa, respectively. Other attributes are also available.

See the official documentation for numpy.finfo for details.
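You can check the meaning of eps directly. It is the gap between 1.0 and the next representable value of the type, which equals 2 to the power of -nmant. The following sketch uses np.nextafter() to confirm this for float32 and float64.

```python
import numpy as np

results = {}
for t in (np.float32, np.float64):
    fi = np.finfo(t)
    one = t(1)
    # Distance from 1.0 to the next larger value representable in this type
    gap = np.nextafter(one, t(2)) - one
    results[t.__name__] = (bool(gap == fi.eps), bool(fi.eps == 2.0 ** -fi.nmant))
    print(t.__name__, results[t.__name__])
# float32 (True, True)
# float64 (True, True)
```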

The number of characters in a string

If you specify str (or unicode), the dtype looks like <U1.

a_str = np.array([1, 2, 3], dtype=str)
print(a_str)
print(a_str.dtype)
# ['1' '2' '3']
# <U1

The leading < or > indicates little-endian or big-endian byte order, respectively.

The number at the end is the number of characters. When dtype is specified as str or unicode in np.array(), as in this example, it is set to the length of the longest element.

Since only this number of characters is allocated for each element, strings with more than this number of characters are truncated.

You can specify a type with a sufficient number of characters beforehand.

a_str[0] = 'abcde'
print(a_str)
# ['a' '2' '3']

a_str10 = np.array([1, 2, 3], dtype='U10')
print(a_str10.dtype)
# <U10

a_str10[0] = 'abcde'
print(a_str10)
# ['abcde' '2' '3']
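Each character of a U string takes 4 bytes because NumPy stores Unicode strings as UCS-4, and itemsize shows this. An array that is too narrow can be widened with astype(), which is covered later in this article. Here is a short sketch. The printed dtype assumes a little-endian machine.

```python
import numpy as np

a_str = np.array([1, 2, 3], dtype=str)
print(a_str.dtype, a_str.dtype.itemsize)  # <U1 4  (4 bytes per character)

# Widen an existing array with astype() before assigning longer strings
a_wide = a_str.astype('U10')
a_wide[0] = 'abcde'
print(a_wide)                 # ['abcde' '2' '3']
print(a_wide.dtype.itemsize)  # 40
```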

object: Stores pointers to Python objects

The object type is a special data type that stores pointers to Python objects.

Because the actual data of each element is stored in its own memory area outside the array, a single array can hold (pointers to) data of different types.

a_object = np.array([1, 0.1, 'one'], dtype=object)
print(a_object)
print(a_object.dtype)
# [1 0.1 'one']
# object

print(type(a_object[0]))
print(type(a_object[1]))
print(type(a_object[2]))
# <class 'int'>
# <class 'float'>
# <class 'str'>

Unlike the fixed-width string type, an element can be replaced with a string of any length without truncation.

a_object[2] = 'oneONE'
print(a_object)
# [1 0.1 'oneONE']

Note that a container holding elements of different types can also be built with Python's built-in list type.

However, list and numpy.ndarray behave differently with operators. With an ndarray, an operation is applied to each element.

l = [1, 0.1, 'oneONE']
print(type(l[0]))
print(type(l[1]))
print(type(l[2]))
# <class 'int'>
# <class 'float'>
# <class 'str'>

print(a_object * 2)
# [2 0.2 'oneONEoneONE']

print(l * 2)
# [1, 0.1, 'oneONE', 1, 0.1, 'oneONE']
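The following sketch illustrates the difference. On an object array, an operator applies Python's own operator to each element, so the result matches a list comprehension. This also means the operation fails if any element does not support it.

```python
import numpy as np

l = [1, 0.1, 'oneONE']
a_object = np.array(l, dtype=object)

# On an object array, * applies Python's own * to each element in turn
print(list(a_object * 2) == [x * 2 for x in l])  # True

# It fails if any element does not support the operation ('oneONE' + 1)
add_failed = False
try:
    a_object + 1
except TypeError as e:
    add_failed = True
    print('TypeError:', e)
```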

Casting data type (dtype) with astype()

The astype() method of numpy.ndarray can convert the data type dtype.

A new ndarray with the new dtype is created, and the original ndarray is not changed.

import numpy as np

a = np.array([1, 2, 3])
print(a)
print(a.dtype)
# [1 2 3]
# int64

a_float = a.astype(np.float32)
print(a_float)
print(a_float.dtype)
# [1. 2. 3.]
# float32

print(a)
print(a.dtype)
# [1 2 3]
# int64

As mentioned above, dtype can be specified in various ways.

a_float = a.astype(float)
print(a_float)
print(a_float.dtype)
# [1. 2. 3.]
# float64

a_str = a.astype('str')
print(a_str)
print(a_str.dtype)
# ['1' '2' '3']
# <U21

a_int = a.astype('int32')
print(a_int)
print(a_int.dtype)
# [1 2 3]
# int32
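astype() also has a copy parameter. By default, it always returns a new array, even when the dtype does not change. With copy=False, the original array itself is returned when no conversion is needed. Here is a short sketch.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)

# Even with the same dtype, astype() returns a new array by default (copy=True)
b = a.astype(np.int64)
b[0] = 100
print(a)  # [1 2 3]  (the original is untouched)

# With copy=False, the original array is returned if no conversion is needed
same = a.astype(np.int64, copy=False) is a
converted = a.astype(np.float64, copy=False) is a
print(same, converted)  # True False
```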

Rounding when casting from float to int

When casting from float to int, the fractional part is discarded, so values are rounded toward zero.

a = np.arange(50).reshape((5, 10)) / 10 - 2
print(a)
print(a.dtype)
# [[-2.  -1.9 -1.8 -1.7 -1.6 -1.5 -1.4 -1.3 -1.2 -1.1]
#  [-1.  -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1]
#  [ 0.   0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9]
#  [ 1.   1.1  1.2  1.3  1.4  1.5  1.6  1.7  1.8  1.9]
#  [ 2.   2.1  2.2  2.3  2.4  2.5  2.6  2.7  2.8  2.9]]
# float64

a_int = a.astype('int64')
print(a_int)
print(a_int.dtype)
# [[-2 -1 -1 -1 -1 -1 -1 -1 -1 -1]
#  [-1  0  0  0  0  0  0  0  0  0]
#  [ 0  0  0  0  0  0  0  0  0  0]
#  [ 1  1  1  1  1  1  1  1  1  1]
#  [ 2  2  2  2  2  2  2  2  2  2]]
# int64
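Because astype() truncates toward zero, it matches np.trunc() rather than np.floor() for negative values. To round in a different direction, apply np.floor() or np.ceil() first and then cast. Here is a short comparison.

```python
import numpy as np

x = np.array([-1.7, -1.5, -0.2, 0.2, 1.5, 1.7])

print(x.astype(int))            # [-1 -1  0  0  1  1]  toward zero
print(np.trunc(x).astype(int))  # [-1 -1  0  0  1  1]  same as astype()
print(np.floor(x).astype(int))  # [-2 -2 -1  0  1  1]  toward -inf
print(np.ceil(x).astype(int))   # [-1 -1  0  1  2  2]  toward +inf
```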

np.round() and np.around() round half to even. A value exactly halfway between two integers is rounded to the nearest even integer, so 0.5 becomes 0 instead of 1 (and 2.5 becomes 2).

print(np.round(a).astype(int))
# [[-2 -2 -2 -2 -2 -2 -1 -1 -1 -1]
#  [-1 -1 -1 -1 -1  0  0  0  0  0]
#  [ 0  0  0  0  0  0  1  1  1  1]
#  [ 1  1  1  1  1  2  2  2  2  2]
#  [ 2  2  2  2  2  2  3  3  3  3]]
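The effect is easiest to see on exact halves. Python's built-in round() follows the same round-half-to-even rule, as this short sketch shows.

```python
import numpy as np

halves = np.array([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])
print(np.round(halves).astype(int))         # [-2 -2  0  0  2  2]

# Python's built-in round() also rounds half to even
print([round(v) for v in halves.tolist()])  # [-2, -2, 0, 0, 2, 2]
```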

The following function rounds halves up, so 0.5 becomes 1.

my_round_int = lambda x: np.round((x * 2 + 1) // 2)

print(my_round_int(a).astype(int))
# [[-2 -2 -2 -2 -2 -1 -1 -1 -1 -1]
#  [-1 -1 -1 -1 -1  0  0  0  0  0]
#  [ 0  0  0  0  0  1  1  1  1  1]
#  [ 1  1  1  1  1  2  2  2  2  2]
#  [ 2  2  2  2  2  3  3  3  3  3]]

The function above rounds -0.5 to 0, because it always rounds halves toward positive infinity. To round -0.5 to -1 (round half away from zero), use the following function instead.

def my_round(x, digit=0):
    p = 10 ** digit
    s = np.copysign(1, x)
    return (s * x * p * 2 + 1) // 2 / p * s

print(my_round(a).astype(int))
# [[-2 -2 -2 -2 -2 -2 -1 -1 -1 -1]
#  [-1 -1 -1 -1 -1 -1  0  0  0  0]
#  [ 0  0  0  0  0  1  1  1  1  1]
#  [ 1  1  1  1  1  2  2  2  2  2]
#  [ 2  2  2  2  2  3  3  3  3  3]]
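Another way to express the same round-half-away-from-zero rule is to round the absolute value and then restore the sign with np.sign(). The name round_half_away is hypothetical. This minimal sketch only rounds to integers (it has no digit argument). Like the functions above, it is subject to floating-point representation error for values very close to .5.

```python
import numpy as np

def round_half_away(x):
    """Hypothetical helper: round to the nearest integer, halves away from zero."""
    return np.sign(x) * np.floor(np.abs(x) + 0.5)

x = np.array([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])
print(round_half_away(x).astype(int))  # [-3 -2 -1  1  2  3]
print(np.round(x).astype(int))         # [-2 -2  0  0  2  2]
```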

Implicit type conversions

In addition to explicit type conversion by astype(), implicit type conversion may be performed by some operations.

For example, division by the / operator returns a floating-point number float.

a = np.array([1, 2, 3])
print(a)
print(a.dtype)
# [1 2 3]
# int64

print((a / 1).dtype)
# float64

print((a / 1.0).dtype)
# float64

For +, -, *, //, and **, the result is int if both operands are integers, and float if either operand is a float.

print((a + 1).dtype)
# int64

print((a + 1.0).dtype)
# float64

print((a - 1).dtype)
# int64

print((a - 1.0).dtype)
# float64

print((a * 1).dtype)
# int64

print((a * 1.0).dtype)
# float64

print((a // 1).dtype)
# int64

print((a // 1.0).dtype)
# float64

print((a ** 1).dtype)
# int64

print((a ** 1.0).dtype)
# float64

The same applies to operations between two numpy.ndarray objects.

Even between integer types, the result is converted when the bit widths differ.

ones_int16 = np.ones(3, np.int16)
print(ones_int16)
# [1 1 1]

ones_int32 = np.ones(3, np.int32)
print(ones_int32)
# [1 1 1]

print((ones_int16 + ones_int32).dtype)
# int32

As this example shows, the result generally takes the type that can hold the larger range of values.

However, in some cases the result type differs from both of the original numpy.ndarray types. In the following example, float16 cannot represent every int16 value exactly, so the result is float32. If the number of bits matters, convert the result explicitly with astype().

ones_float16 = np.ones(3, np.float16)
print(ones_float16)
# [1. 1. 1.]

print((ones_int16 + ones_float16).dtype)
# float32
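To predict such results in advance, you can use np.promote_types() and np.result_type(). The calls below are a short sketch. The int64/uint64 pair shows that when no integer type is wide enough for both, NumPy falls back to float64.

```python
import numpy as np

# Predict the result dtype of mixing two dtypes without computing anything
print(np.promote_types(np.int16, np.int32))    # int32
print(np.promote_types(np.int16, np.float16))  # float32
print(np.promote_types(np.uint8, np.int8))     # int16
print(np.promote_types(np.int64, np.uint64))   # float64

# result_type also accepts arrays
ones_int16 = np.ones(3, np.int16)
ones_float16 = np.ones(3, np.float16)
print(np.result_type(ones_int16, ones_float16))  # float32
```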

Note that assigning a value to an element does not change the dtype of the numpy.ndarray.

For example, if you assign a float value to an element of an integer numpy.ndarray, the array stays int and the fractional part of the assigned value is truncated.

ones_int16[0] = 10.9
print(ones_int16)
# [10  1  1]

print(ones_int16.dtype)
# int16
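The same rule applies to in-place operators such as //= and /=. The result is written back into the existing array, so its dtype cannot change. The following is a sketch. The exact exception class varies between NumPy versions, but it is a subclass of TypeError.

```python
import numpy as np

a = np.array([1, 2, 3])

# In-place floor division keeps the integer dtype
a //= 2
print(a, a.dtype.kind)  # [0 1 1] i

# True division would produce floats, which cannot be stored back safely
div_failed = False
try:
    a /= 2
except TypeError as e:
    div_failed = True
    print('TypeError:', e)
```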