# NumPy: Cast ndarray to a specific dtype with astype()

Posted: 2021-10-11 / Tags: Python, NumPy

A NumPy array (`ndarray`) has a data type (`dtype`), which you can specify when creating the array with `np.array()`. You can also convert an array to another type with the `astype()` method.

As a rule, a single `dtype` is set for each `ndarray` object, and all of its elements have the same data type.

• List of basic data types (`dtype`) in NumPy
• Range of values (minimum and maximum values) for numeric types
• `np.iinfo()`
• `np.finfo()`
• The number of characters in a string
• `object`: Stores pointers to Python objects
• Casting data type (`dtype`) with `astype()`
• Rounding when casting from `float` to `int`
• Implicit type conversions

## List of basic data types (`dtype`) in NumPy

The following is a list of basic data types `dtype` in NumPy. The range of values (= minimum and maximum values) that can be taken by each type of integer and floating point number is described later.

| `dtype` | Character code | Description |
| --- | --- | --- |
| `int8` | `i1` | 8-bit signed integer |
| `int16` | `i2` | 16-bit signed integer |
| `int32` | `i4` | 32-bit signed integer |
| `int64` | `i8` | 64-bit signed integer |
| `uint8` | `u1` | 8-bit unsigned integer |
| `uint16` | `u2` | 16-bit unsigned integer |
| `uint32` | `u4` | 32-bit unsigned integer |
| `uint64` | `u8` | 64-bit unsigned integer |
| `float16` | `f2` | 16-bit floating-point number |
| `float32` | `f4` | 32-bit floating-point number |
| `float64` | `f8` | 64-bit floating-point number |
| `float128` | `f16` | 128-bit floating-point number |
| `complex64` | `c8` | 64-bit complex floating-point number |
| `complex128` | `c16` | 128-bit complex floating-point number |
| `complex256` | `c32` | 256-bit complex floating-point number |
| `bool` | `?` | Boolean (`True` or `False`) |
| `unicode` | `U` | Unicode string |
| `object` | `O` | Python objects |

The number in a `dtype` name is the size in bits, whereas the number in a character code is the size in bytes. Note that the two numbers differ even for the same type.

The character code for the `bool` type, `?`, does not mean "unknown"; the literal character `?` is assigned to it.
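As a quick check of the bit/byte naming described above, the following sketch resolves a few character codes with `np.dtype()` and prints the `name` (bits) and `itemsize` (bytes) of each. The variable names are only illustrative.

```python
import numpy as np

# Resolve character codes to dtype objects and compare names and sizes.
for code in ['i8', 'f4', 'c16']:
    dt = np.dtype(code)
    # name counts bits, itemsize counts bytes
    print(code, dt.name, dt.itemsize)
# i8 int64 8
# f4 float32 4
# c16 complex128 16

# The character code of bool is literally '?'.
bool_code = np.dtype(bool).char
print(bool_code)
# ?
```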

When you specify a `dtype` as an argument to a method or function, you can use any of the following forms, shown here for `int64`:

• `np.int64`
• `'int64'`
• `'i8'`
``````
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)
print(a.dtype)
# int64

a = np.array([1, 2, 3], dtype='int64')
print(a.dtype)
# int64

a = np.array([1, 2, 3], dtype='i8')
print(a.dtype)
# int64
``````

It can also be specified as a Python built-in type such as `int`, `float`, or `str`.

In this case, the Python type is automatically converted to an equivalent `dtype`, but which `dtype` is chosen depends on the environment.

The following is an example in a 64-bit Python 3 environment. There is no Python type called `uint`, but it is listed here for convenience.

| Python type | Example of equivalent `dtype` |
| --- | --- |
| `int` | `int64` |
| `float` | `float64` |
| `str` | `unicode` |
| (`uint`) | `uint64` |

Both `int` and the string `'int'` are allowed as arguments; only the string `'uint'` is allowed for `uint`, which is not a Python type.

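``````
# Note: the alias np.int (identical to the built-in int) was deprecated
# in NumPy 1.20 and removed in 1.24, so use int or an explicit np.int64.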

a = np.array([1, 2, 3], dtype=int)
print(a.dtype)
# int64

a = np.array([1, 2, 3], dtype='int')
print(a.dtype)
# int64
``````
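Because the mapping depends on the environment, it can be worth checking it directly on your own machine. The sketch below passes each specifier to `np.dtype()` and prints what it resolves to; the printed names vary by platform and NumPy version, so no expected output is shown.

```python
import numpy as np

# Ask NumPy which dtype each Python type or generic name maps to here.
for spec in [int, float, str, 'uint']:
    dt = np.dtype(spec)
    print(repr(spec), '->', dt.name, '(kind: ' + dt.kind + ')')

# float maps to float64 on all common platforms.
resolved_float = np.dtype(float)
```

The `kind` attribute (`'i'`, `'u'`, `'f'`, `'U'`) stays the same across environments even when the bit width does not.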

## Range of values (minimum and maximum values) for numeric types

You can use `np.iinfo()` and `np.finfo()` to check the range of possible values for each integer type (`int`, `uint`) and floating-point type (`float`).

### `np.iinfo()`

Use `np.iinfo()` for integers `int` and `uint`.

Passing a type object as the argument returns a `numpy.iinfo` object.

You can use `print()` to print out a summary, and `max` and `min` attributes to get the maximum and minimum values.

``````
ii64 = np.iinfo(np.int64)
print(type(ii64))
# <class 'numpy.iinfo'>

print(ii64)
# Machine parameters for int64
# ---------------------------------------------------------------
# min = -9223372036854775808
# max = 9223372036854775807
# ---------------------------------------------------------------
#

print(ii64.max)
# 9223372036854775807

print(type(ii64.max))
# <class 'int'>

print(ii64.min)
# -9223372036854775808

print(ii64.bits)
# 64
``````

You can also specify a string representing the `dtype` as an argument.

``````
print(np.iinfo('int16'))
# Machine parameters for int16
# ---------------------------------------------------------------
# min = -32768
# max = 32767
# ---------------------------------------------------------------
#

print(np.iinfo('i4'))
# Machine parameters for int32
# ---------------------------------------------------------------
# min = -2147483648
# max = 2147483647
# ---------------------------------------------------------------
#

print(np.iinfo(int))
# Machine parameters for int64
# ---------------------------------------------------------------
# min = -9223372036854775808
# max = 9223372036854775807
# ---------------------------------------------------------------
#

print(np.iinfo('uint64'))
# Machine parameters for uint64
# ---------------------------------------------------------------
# min = 0
# max = 18446744073709551615
# ---------------------------------------------------------------
#
``````

The value itself can also be specified as an argument.

``````
i = 100
print(type(i))
# <class 'int'>

print(np.iinfo(i))
# Machine parameters for int64
# ---------------------------------------------------------------
# min = -9223372036854775808
# max = 9223372036854775807
# ---------------------------------------------------------------
#

ui = np.uint8(100)
print(type(ui))
# <class 'numpy.uint8'>

print(np.iinfo(ui))
# Machine parameters for uint8
# ---------------------------------------------------------------
# min = 0
# max = 255
# ---------------------------------------------------------------
#
``````

You cannot pass a NumPy array `ndarray` itself. Pass its data type, obtained with the `dtype` attribute, or pass one of its elements.

``````
a = np.array([1, 2, 3], dtype=np.int8)
print(type(a))
# <class 'numpy.ndarray'>

# print(np.iinfo(a))
# ValueError: Invalid integer data type 'O'.

print(np.iinfo(a.dtype))
# Machine parameters for int8
# ---------------------------------------------------------------
# min = -128
# max = 127
# ---------------------------------------------------------------
#

print(np.iinfo(a[0]))
# Machine parameters for int8
# ---------------------------------------------------------------
# min = -128
# max = 127
# ---------------------------------------------------------------
#
``````
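One practical use of `np.iinfo()` is checking whether values fit into a smaller integer type before casting. The following is a minimal sketch; `fits_in()` is a hypothetical helper written for this illustration, not a NumPy function.

```python
import numpy as np

def fits_in(values, dtype):
    """Return True if every value lies within the range of an integer dtype."""
    info = np.iinfo(dtype)
    values = np.asarray(values)
    # Convert to Python int so the comparison itself cannot overflow.
    return info.min <= int(values.min()) and int(values.max()) <= info.max

a = np.array([0, 100, 300])

print(fits_in(a, np.uint8))
# False

print(fits_in(a, np.int16))
# True
```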

### `np.finfo()`

Use `np.finfo()` for floating-point numbers `float`.

Usage is the same as `np.iinfo()`.

The argument can be a type object (`np.float64`), a string (`'float64'`, `'f8'`) or a value (`0.1`).

``````
fi64 = np.finfo(np.float64)
print(type(fi64))
# <class 'numpy.finfo'>

print(fi64)
# Machine parameters for float64
# ---------------------------------------------------------------
# precision =  15   resolution = 1.0000000000000001e-15
# machep =    -52   eps =        2.2204460492503131e-16
# negep =     -53   epsneg =     1.1102230246251565e-16
# minexp =  -1022   tiny =       2.2250738585072014e-308
# maxexp =   1024   max =        1.7976931348623157e+308
# nexp =       11   min =        -max
# ---------------------------------------------------------------
#

print(fi64.max)
# 1.7976931348623157e+308

print(type(fi64.max))
# <class 'numpy.float64'>

print(fi64.min)
# -1.7976931348623157e+308

print(fi64.eps)
# 2.220446049250313e-16

print(fi64.bits)
# 64

print(fi64.iexp)
# 11

print(fi64.nmant)
# 52
``````

As shown in the example above, you can get the machine epsilon with `eps`, and the number of bits in the exponent and mantissa with `iexp` and `nmant`, respectively.

See the official documentation for details.
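To make the meaning of `eps` concrete, the sketch below shows that it is the gap between `1.0` and the next larger `float64`, and compares the bit layout of `float32`. The variable names are illustrative.

```python
import numpy as np

eps = np.finfo(np.float64).eps

# Adding eps to 1.0 gives a distinguishable value ...
print(1.0 + eps > 1.0)
# True

# ... but adding half of eps is rounded back to 1.0.
print(1.0 + eps / 2 == 1.0)
# True

# float32 has fewer mantissa bits than float64 (23 vs. 52).
fi32 = np.finfo(np.float32)
print(fi32.bits, fi32.nmant)
# 32 23
```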

## The number of characters in a string

If you specify `str` or `unicode`, the resulting `dtype` looks like `<U1`.

``````
a_str = np.array([1, 2, 3], dtype=str)
print(a_str)
print(a_str.dtype)
# ['1' '2' '3']
# <U1
``````

`<` and `>` indicate little-endian and big-endian, respectively.

The number at the end indicates the number of characters. It is the maximum number of characters among all elements if `dtype` is specified as `str` or `unicode` in `np.array()`, as in this example.

Since only this number of characters is allocated for each element, strings with more than this number of characters are truncated.

You can specify a type with a sufficient number of characters beforehand.

``````
a_str[0] = 'abcde'
print(a_str)
# ['a' '2' '3']

a_str10 = np.array([1, 2, 3], dtype='U10')
print(a_str10.dtype)
# <U10

a_str10[0] = 'abcde'
print(a_str10)
# ['abcde' '2' '3']
``````
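If the array already exists, one way to avoid truncation is to widen it with `astype()` before assigning. This is a small sketch with illustrative names; it relies on NumPy storing each character of a `U` type in 4 bytes to recover the character count from `itemsize`.

```python
import numpy as np

words = np.array(['a', 'bcd'])

# The element width is the length of the longest string (3 characters).
print(words.dtype.kind, words.dtype.itemsize // 4)
# U 3

# Widen to 10 characters first, then assign a longer string.
wide = words.astype('U10')
wide[0] = 'abcde'
print(wide)
# ['abcde' 'bcd']
```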

## `object`: Stores pointers to Python objects

The `object` type is a special data type that stores pointers to Python objects.

Because the data of each element lives in its own separately allocated memory area and the array holds only the pointers, a single array can contain (pointers to) data of multiple types.

``````
a_object = np.array([1, 0.1, 'one'], dtype=object)
print(a_object)
print(a_object.dtype)
# [1 0.1 'one']
# object

print(type(a_object[0]))
print(type(a_object[1]))
print(type(a_object[2]))
# <class 'int'>
# <class 'float'>
# <class 'str'>
``````

Because each element is only a pointer to a Python object, a string element can also be replaced with a longer one without being truncated.

``````
a_object[2] = 'oneONE'
print(a_object)
# [1 0.1 'oneONE']
``````

Note that such arrays with multiple types can also be realized with Python built-in `list` type.

`list` and `numpy.ndarray` have different behaviors for operators. In the case of `ndarray`, it is easy to perform operations on each element.

``````
l = [1, 0.1, 'oneONE']
print(type(l[0]))
print(type(l[1]))
print(type(l[2]))
# <class 'int'>
# <class 'float'>
# <class 'str'>

print(a_object * 2)
# [2 0.2 'oneONEoneONE']

print(l * 2)
# [1, 0.1, 'oneONE', 1, 0.1, 'oneONE']
``````

## Casting data type (`dtype`) with `astype()`

The `astype()` method of `numpy.ndarray` can convert the data type `dtype`.

A new `ndarray` is created with the new `dtype`, and the original `ndarray` is not changed.

``````
import numpy as np

a = np.array([1, 2, 3])
print(a)
print(a.dtype)
# [1 2 3]
# int64

a_float = a.astype(np.float32)
print(a_float)
print(a_float.dtype)
# [1. 2. 3.]
# float32

print(a)
print(a.dtype)
# [1 2 3]
# int64
``````
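The fact that `astype()` returns a new array can be confirmed with `np.shares_memory()`. The sketch below also shows the `copy=False` option of `astype()`, which returns the original array itself when no conversion is actually needed; the variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])

# A cast to a different dtype always allocates new memory.
b = a.astype(np.float32)
print(np.shares_memory(a, b))
# False

# Even with the same dtype, astype() copies by default ...
default = a.astype(a.dtype)
print(default is a)
# False

# ... unless copy=False is given and the dtype already matches.
same = a.astype(a.dtype, copy=False)
print(same is a)
# True
```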

As mentioned above, `dtype` can be specified in various ways.

``````
a_float = a.astype(float)
print(a_float)
print(a_float.dtype)
# [1. 2. 3.]
# float64

a_str = a.astype('str')
print(a_str)
print(a_str.dtype)
# ['1' '2' '3']
# <U21

a_int = a.astype('int32')
print(a_int)
print(a_int.dtype)
# [1 2 3]
# int32
``````
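Casting also works in the other direction, from strings to numbers, and from numbers to `bool`. The following sketch uses made-up sample values and prints the results with `tolist()` to keep the output format simple.

```python
import numpy as np

# Strings that represent numbers can be cast to float.
s = np.array(['1.5', '2', '-3'])
s_float = s.astype(float)
print(s_float.tolist())
# [1.5, 2.0, -3.0]

# Casting to bool maps 0 to False and any other number to True.
n = np.array([0, 1, 2])
n_bool = n.astype(bool)
print(n_bool.tolist())
# [False, True, True]
```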

## Rounding when casting from `float` to `int`

When casting from `float` to `int`, the fractional part is truncated, i.e., the value is rounded toward `0`.

``````
a = np.arange(50).reshape((5, 10)) / 10 - 2
print(a)
print(a.dtype)
# [[-2.  -1.9 -1.8 -1.7 -1.6 -1.5 -1.4 -1.3 -1.2 -1.1]
#  [-1.  -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1]
#  [ 0.   0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9]
#  [ 1.   1.1  1.2  1.3  1.4  1.5  1.6  1.7  1.8  1.9]
#  [ 2.   2.1  2.2  2.3  2.4  2.5  2.6  2.7  2.8  2.9]]
# float64

a_int = a.astype('int64')
print(a_int)
print(a_int.dtype)
# [[-2 -1 -1 -1 -1 -1 -1 -1 -1 -1]
#  [-1  0  0  0  0  0  0  0  0  0]
#  [ 0  0  0  0  0  0  0  0  0  0]
#  [ 1  1  1  1  1  1  1  1  1  1]
#  [ 2  2  2  2  2  2  2  2  2  2]]
# int64
``````
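If truncation toward `0` is not what you want, you can apply `np.floor()` or `np.ceil()` before the cast. The sketch below compares them on a few sample values; `np.trunc()` gives the same result as a plain `astype(int)`.

```python
import numpy as np

x = np.array([-1.5, -0.5, 0.5, 1.5])

# Plain cast: rounds toward 0 (same as np.trunc).
print(x.astype(int).tolist())
# [-1, 0, 0, 1]
print(np.trunc(x).astype(int).tolist())
# [-1, 0, 0, 1]

# Round toward negative infinity.
print(np.floor(x).astype(int).tolist())
# [-2, -1, 0, 1]

# Round toward positive infinity.
print(np.ceil(x).astype(int).tolist())
# [-1, 0, 1, 2]
```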

`np.round()` and `np.around()` round halves to the nearest even value, so `0.5` is rounded to `0` instead of `1`.

``````
print(np.round(a).astype(int))
# [[-2 -2 -2 -2 -2 -2 -1 -1 -1 -1]
#  [-1 -1 -1 -1 -1  0  0  0  0  0]
#  [ 0  0  0  0  0  0  1  1  1  1]
#  [ 1  1  1  1  1  2  2  2  2  2]
#  [ 2  2  2  2  2  2  3  3  3  3]]
``````

If you define the following function, `0.5` is rounded to `1`.

``````
def my_round_int(x):
    # Equivalent to floor(x + 0.5): halves are rounded upward (0.5 -> 1, -0.5 -> 0)
    return np.round((x * 2 + 1) // 2)

print(my_round_int(a).astype(int))
# [[-2 -2 -2 -2 -2 -1 -1 -1 -1 -1]
#  [-1 -1 -1 -1 -1  0  0  0  0  0]
#  [ 0  0  0  0  0  1  1  1  1  1]
#  [ 1  1  1  1  1  2  2  2  2  2]
#  [ 2  2  2  2  2  3  3  3  3  3]]
``````

The function above rounds `-0.5` to `0`. If you want to round `-0.5` to `-1`, the function should be as follows.

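``````
def my_round(x, digit=0):
    p = 10 ** digit          # scale factor for the target decimal place
    s = np.copysign(1, x)    # sign of each element (+1 or -1)
    # Round the absolute value half up, then restore the sign
    return (s * x * p * 2 + 1) // 2 / p * s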

print(my_round(a).astype(int))
# [[-2 -2 -2 -2 -2 -2 -1 -1 -1 -1]
#  [-1 -1 -1 -1 -1 -1  0  0  0  0]
#  [ 0  0  0  0  0  1  1  1  1  1]
#  [ 1  1  1  1  1  2  2  2  2  2]
#  [ 2  2  2  2  2  3  3  3  3  3]]
``````
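Rounding halves away from zero can also be written with `np.sign()`, `np.abs()`, and `np.floor()`. This is an alternative sketch, not the function defined above; `round_half_away()` is a hypothetical name, and like any `+ 0.5` approach it can be off for values that sit just below a half because of floating-point error.

```python
import numpy as np

def round_half_away(x):
    """Round to the nearest integer, with halves rounded away from zero."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.floor(np.abs(x) + 0.5)

x = np.array([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])

print(round_half_away(x).astype(int).tolist())
# [-3, -2, -1, 1, 2, 3]

# For comparison, np.round() rounds halves to the nearest even value.
print(np.round(x).astype(int).tolist())
# [-2, -2, 0, 0, 2, 2]
```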

## Implicit type conversions

In addition to explicit type conversion by `astype()`, implicit type conversion may be performed by some operations.

For example, division by the `/` operator returns a floating-point number `float`.

``````
a = np.array([1, 2, 3])
print(a)
print(a.dtype)
# [1 2 3]
# int64

print((a / 1).dtype)
# float64

print((a / 1.0).dtype)
# float64
``````

For `+`, `-`, `*`, `//`, and `**`, the result is `int` if both operands are `int`, and `float` if either operand is `float`.

``````
print((a + 1).dtype)
# int64

print((a + 1.0).dtype)
# float64

print((a - 1).dtype)
# int64

print((a - 1.0).dtype)
# float64

print((a * 1).dtype)
# int64

print((a * 1.0).dtype)
# float64

print((a // 1).dtype)
# int64

print((a // 1.0).dtype)
# float64

print((a ** 1).dtype)
# int64

print((a ** 1.0).dtype)
# float64
``````

The same is true for operations between `numpy.ndarray` objects.

Even between integer types, the type is converted if the numbers of bits differ.

``````
ones_int16 = np.ones(3, np.int16)
print(ones_int16)
# [1 1 1]

ones_int32 = np.ones(3, np.int32)
print(ones_int32)
# [1 1 1]

print((ones_int16 + ones_int32).dtype)
# int32
``````

As in this example, you can generally assume that the result takes the type with the larger number of bits.

However, in some cases, the resulting type differs from that of either original `numpy.ndarray`. If the number of bits is important, it is better to convert to the desired type explicitly with `astype()`.

``````
ones_float16 = np.ones(3, np.float16)
print(ones_float16)
# [1. 1. 1.]

print((ones_int16 + ones_float16).dtype)
# float32
``````
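If you want to know the resulting type without performing the operation, `np.result_type()` reports the type that NumPy's promotion rules produce for a combination of dtypes. A short sketch:

```python
import numpy as np

# Same kind, different sizes: the larger one wins.
print(np.result_type(np.int16, np.int32))
# int32

# int16 does not fit in float16, so the result is wider than both.
print(np.result_type(np.int16, np.float16))
# float32

# No integer type holds both uint64 and int64, so the result is float64.
print(np.result_type(np.uint64, np.int64))
# float64
```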

Note that the type of `numpy.ndarray` is not converted when assigning a value to an element.

For example, if you assign a `float` value to an integer `numpy.ndarray`, the data type of the `numpy.ndarray` is still `int`. The assigned value is truncated after the decimal point.

``````
ones_int16[0] = 10.9
print(ones_int16)
# [10  1  1]

print(ones_int16.dtype)
# int16
``````
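A related case is an in-place operator such as `+=`. As far as the array's type is concerned it behaves like assignment: the `dtype` is not changed. Unlike element assignment, however, adding a `float` to an integer array in place raises an error instead of truncating. The sketch below shows this and the explicit conversion that avoids it; the variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])

try:
    a += 0.5  # the float result cannot be stored back into an int array
    raised = False
except TypeError:
    raised = True
    print('in-place add failed; a is unchanged:', a.tolist())

# Converting explicitly first avoids the error.
b = a.astype(float)
b += 0.5
print(b.tolist())
# [1.5, 2.5, 3.5]
```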