NumPy: astype() to change dtype of an array

Modified: 2024-02-04 | Tags: Python, NumPy

Every NumPy array (ndarray) has a data type (dtype). You can specify it when creating an array, for example with np.array(), or change it later with astype().

Contents

Main data types (dtype) in NumPy
Range of numeric types (minimum and maximum values)
- np.iinfo()
- np.finfo()
The number of characters in strings
object stores pointers to Python objects
Change dtype with astype()
- Basic usage of astype()
Conversion from float to int truncates the decimal part
Implicit type conversions

In principle, each ndarray has a single dtype, so all of its elements share the same data type.

While NumPy provides a mechanism for handling multiple data types within a single ndarray, known as "Structured Arrays", this article does not cover this topic. For processing datasets that contain multiple types (for example, columns of both numbers and strings), using pandas is often more convenient.

Refer to the following article for dtype and astype() in pandas.

pandas: How to use astype() to cast dtype of DataFrame

The NumPy version used in this article is as follows. Note that functionality may vary between versions.

import numpy as np

print(np.__version__)
# 1.26.1

source: numpy_dtype.py

Main data types (`dtype`) in NumPy

The main data types (dtype) in NumPy are listed below. The range of values that each integer and floating-point type can represent is discussed later.

`dtype`	Type code	Description
`int8`	`i1`	8-bit signed integer
`int16`	`i2`	16-bit signed integer
`int32`	`i4`	32-bit signed integer
`int64`	`i8`	64-bit signed integer
`uint8`	`u1`	8-bit unsigned integer
`uint16`	`u2`	16-bit unsigned integer
`uint32`	`u4`	32-bit unsigned integer
`uint64`	`u8`	64-bit unsigned integer
`float16`	`f2`	Half precision floating-point (1 bit for sign, 5 bits for exponent, 10 bits for mantissa)
`float32`	`f4`	Single precision floating-point (1 bit for sign, 8 bits for exponent, 23 bits for mantissa)
`float64`	`f8`	Double precision floating-point (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)
`float128`	`f16`	Extended precision floating-point (platform-dependent: on x86-64 it is usually 80-bit x87 extended precision padded to 128 bits, true quadruple precision only on some platforms, and not available on Windows)
`complex64`	`c8`	Complex number (real and imaginary parts are `float32`)
`complex128`	`c16`	Complex number (real and imaginary parts are `float64`)
`complex256`	`c32`	Complex number (real and imaginary parts are `float128`; platform-dependent like `float128`)
`bool`	`?`	Boolean (`True` or `False`)
`unicode`	`U`	Unicode string
`object`	`O`	Python object

The number in a dtype name is the size in bits, whereas the number in a type code is the size in bytes, so the two numbers differ for the same type (for example, int64 and i8).

The type code for bool really is ?; it does not indicate an unknown type.
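As a quick check of the bits-versus-bytes naming, the sketch below compares the dtype name and the type code for the same type, and reads the itemsize (bytes per element) and char (one-character type code) attributes of np.dtype objects.

```python
import numpy as np

# 'int64' (64 bits) and 'i8' (8 bytes) name the same dtype
dt_name = np.dtype('int64')
dt_code = np.dtype('i8')
print(dt_name == dt_code)            # True

# itemsize is the size of one element in bytes, matching the type code number
print(dt_name.itemsize)              # 8
print(np.dtype('float16').itemsize)  # 2

# The type code for bool really is '?'
print(np.dtype(bool).char)           # ?
```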

When a function or method takes a dtype argument, int64, for example, can be specified in any of the following three ways:

- Type object: np.int64
- Type name as a string: 'int64'
- Type code as a string: 'i8'

a = np.array([1, 2, 3], dtype=np.int64)
print(a.dtype)
# int64

a = np.array([1, 2, 3], dtype='int64')
print(a.dtype)
# int64

a = np.array([1, 2, 3], dtype='i8')
print(a.dtype)
# int64

source: numpy_dtype.py

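Note that when bool, unicode, or object is specified as a type object, it must be suffixed with an underscore _, as in np.bool_, np.unicode_, or np.object_. np.str_ is an equivalent name for np.unicode_ and is the safer choice, because np.unicode_ was removed in NumPy 2.0.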

Python types such as int, float, and str can also be specified; they are treated as the equivalent dtype. The examples below assume Python 3 in a 64-bit Linux or macOS environment (with NumPy 1.x on Windows, int corresponds to int32 instead). uint, which is not a Python type, is included for convenience.

Python type	Equivalent `dtype` example
`int`	`int64`
`float`	`float64`
`str`	`unicode`
(`uint`)	`uint64`

As arguments, the strings 'int' and 'float' can be used in place of int and float. Because uint is not a Python type, it must be specified as the string 'uint'.

a = np.array([1, 2, 3], dtype=int)
print(a.dtype)
# int64

a = np.array([1, 2, 3], dtype='int')
print(a.dtype)
# int64

source: numpy_dtype.py
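The same mechanism covers the other entries in the table. In this sketch the bit width produced by 'uint' is platform-dependent (typically uint64 on 64-bit Linux or macOS), so the code only inspects its dtype kind.

```python
import numpy as np

# 'uint' must be given as a string; the bit width depends on the platform
a_uint = np.array([1, 2, 3], dtype='uint')
print(a_uint.dtype.kind)  # u (unsigned integer)

# float and 'float' both map to float64
a_float = np.array([1, 2, 3], dtype=float)
print(a_float.dtype)      # float64

# str produces a fixed-length Unicode string dtype
a_str = np.array([1, 2, 3], dtype=str)
print(a_str.dtype.kind)   # U
print(a_str)              # ['1' '2' '3']
```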

Range of numeric types (minimum and maximum values)

The ranges of integer (int, uint) and floating-point (float) types can be checked with np.iinfo() and np.finfo(), respectively.

`np.iinfo()`

Use np.iinfo() for integers (int, uint).

numpy.iinfo — NumPy v1.26 Manual

Passing a data type as the argument returns a numpy.iinfo object. Printing it with print() shows an overview, and its max and min attributes give the maximum and minimum values as numbers.

The following example uses np.int64, but strings such as 'int64' or 'i8' can also be used.

ii = np.iinfo(np.int64)
print(type(ii))
# <class 'numpy.iinfo'>

print(ii)
# Machine parameters for int64
# ---------------------------------------------------------------
# min = -9223372036854775808
# max = 9223372036854775807
# ---------------------------------------------------------------
# 

print(ii.max)
# 9223372036854775807

print(type(ii.max))
# <class 'int'>

print(ii.min)
# -9223372036854775808

print(ii.bits)
# 64

source: numpy_iinfo_finfo.py

You can also pass a value instead of a type; the range of the value's type is returned.

i = 100
print(type(i))
# <class 'int'>

print(np.iinfo(i))
# Machine parameters for int64
# ---------------------------------------------------------------
# min = -9223372036854775808
# max = 9223372036854775807
# ---------------------------------------------------------------
# 

ui = np.uint8(100)
print(type(ui))
# <class 'numpy.uint8'>

print(np.iinfo(ui))
# Machine parameters for uint8
# ---------------------------------------------------------------
# min = 0
# max = 255
# ---------------------------------------------------------------
#

source: numpy_iinfo_finfo.py

You cannot pass a NumPy array (ndarray) itself. Pass the array's dtype or one of its elements instead.

a = np.array([1, 2, 3], dtype=np.int8)
print(type(a))
# <class 'numpy.ndarray'>

# print(np.iinfo(a))
# ValueError: Invalid integer data type 'O'.

print(np.iinfo(a.dtype))
# Machine parameters for int8
# ---------------------------------------------------------------
# min = -128
# max = 127
# ---------------------------------------------------------------
# 

print(np.iinfo(a[0]))
# Machine parameters for int8
# ---------------------------------------------------------------
# min = -128
# max = 127
# ---------------------------------------------------------------
#

source: numpy_iinfo_finfo.py
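One practical use of np.iinfo() is checking whether integer values fit before narrowing them with astype(), because integer-to-integer casts wrap around silently instead of raising an error. The helper fits_in() below is a hypothetical name used only for this sketch, and it assumes a non-empty integer array.

```python
import numpy as np

def fits_in(a, dtype):
    """Hypothetical helper: True if every value in the non-empty integer
    array `a` lies within the range of the integer type `dtype`."""
    info = np.iinfo(dtype)
    return bool(a.min() >= info.min and a.max() <= info.max)

a = np.array([100, 200, 300])
print(fits_in(a, np.int8))   # False (int8 range is -128 to 127)
print(fits_in(a, np.int16))  # True

# Without a check, out-of-range values wrap around modulo 2**8
print(a.astype(np.int8))     # [100 -56  44]
```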

`np.finfo()`

Use np.finfo() for floating-point numbers (float).

numpy.finfo — NumPy v1.26 Manual

The usage is the same as np.iinfo(). The arguments can be a type object (e.g., np.float64), a string ('float64', 'f8'), or a value (0.1).

You can output an overview with print() or obtain the values of various attributes as numbers.

fi = np.finfo(np.float64)
print(type(fi))
# <class 'numpy.finfo'>

print(fi)
# Machine parameters for float64
# ---------------------------------------------------------------
# precision =  15   resolution = 1.0000000000000001e-15
# machep =    -52   eps =        2.2204460492503131e-16
# negep =     -53   epsneg =     1.1102230246251565e-16
# minexp =  -1022   tiny =       2.2250738585072014e-308
# maxexp =   1024   max =        1.7976931348623157e+308
# nexp =       11   min =        -max
# smallest_normal = 2.2250738585072014e-308   smallest_subnormal = 4.9406564584124654e-324
# ---------------------------------------------------------------
# 

print(fi.max)
# 1.7976931348623157e+308

print(type(fi.max))
# <class 'numpy.float64'>

print(fi.min)
# -1.7976931348623157e+308

print(fi.eps)
# 2.220446049250313e-16

print(fi.bits)
# 64

print(fi.iexp)
# 11

print(fi.nmant)
# 52

source: numpy_iinfo_finfo.py

np.finfo() provides more attributes than np.iinfo(), such as eps (machine epsilon, the gap between 1.0 and the next larger representable number) and iexp and nmant (the number of bits in the exponent and mantissa). For details, refer to the official documentation above.
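To see what eps means in practice, the following sketch adds eps to 1.0: the sum is the next representable float64, while adding half of eps rounds back to 1.0. It also checks that, for these binary types, eps equals 2 to the power of minus nmant.

```python
import numpy as np

fi64 = np.finfo(np.float64)
one = np.float64(1.0)

print(one + fi64.eps > one)       # True: 1.0 + eps is distinguishable from 1.0
print(one + fi64.eps / 2 == one)  # True: half of eps is rounded away

# eps is 2**-nmant, where nmant is the number of mantissa bits
for t in (np.float16, np.float32, np.float64):
    fi = np.finfo(t)
    print(t.__name__, fi.nmant, fi.eps == 2.0 ** -fi.nmant)
# float16 10 True
# float32 23 True
# float64 52 True
```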

The number of characters in strings

When elements are stored as strings, the dtype is shown in a form such as <U3.

a_str = np.array([1, 22, 333], dtype=str)
print(a_str)
# ['1' '22' '333']

print(a_str.dtype)
# <U3

source: numpy_dtype.py

The leading < and > represent little-endian and big-endian, respectively.

The trailing number is the maximum number of characters each element can hold. When dtype=str is passed to np.array(), as in this example, this number is set to the length of the longest element.

Memory is allocated for only this many characters per element, so a longer string assigned later is silently truncated. To store longer strings, specify a data type with a sufficient length in advance.

a_str[0] = 'abcde'
print(a_str)
# ['abc' '22' '333']

a_str10 = np.array([1, 22, 333], dtype='U10')
print(a_str10.dtype)
# <U10

a_str10[0] = 'abcde'
print(a_str10)
# ['abcde' '22' '333']

source: numpy_dtype.py
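Because the length limit is part of the dtype, an existing string array can also be widened with astype() before longer strings are assigned. The sketch also reads itemsize, relying on NumPy's fixed-width Unicode storage of 4 bytes per character.

```python
import numpy as np

a_str = np.array([1, 22, 333], dtype=str)
print(a_str.dtype.itemsize)  # 12 (3 characters x 4 bytes each)

# astype() returns a widened copy; the original keeps its <U3 dtype
a_wide = a_str.astype('U10')
a_wide[0] = 'abcde'
print(a_wide)  # ['abcde' '22' '333']
print(a_str)   # ['1' '22' '333']
```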

`object` stores pointers to Python objects

The object type is a special data type that stores pointers to Python objects.

The actual data of each element is stored in a separate Python object elsewhere in memory, so a single ndarray can hold references to objects of different types.

a_object = np.array([1, 0.1, 'abc'], dtype=object)
print(a_object)
# [1 0.1 'abc']

print(a_object.dtype)
# object

print(type(a_object[0]))
print(type(a_object[1]))
print(type(a_object[2]))
# <class 'int'>
# <class 'float'>
# <class 'str'>

source: numpy_dtype.py

With dtype object, you can assign a string of any length, because each element simply refers to a Python str object.

a_object[2] = 'abcXYZ'
print(a_object)
# [1 0.1 'abcXYZ']

source: numpy_dtype.py
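Since an object array holds references rather than copies, mutating an object after storing it is visible through the array, and Python ints keep their arbitrary precision. A minimal sketch; the dictionary and values are illustrative only.

```python
import numpy as np

a_obj = np.empty(2, dtype=object)  # object arrays from np.empty are filled with None
d = {'count': 1}
a_obj[0] = d        # stores a reference to the same dict
a_obj[1] = 2**70    # a Python int too large for int64

d['count'] = 2      # mutate the original dict
print(a_obj[0] is d)  # True
print(a_obj[0])       # {'count': 2}

# Arithmetic is delegated to the Python objects, so no overflow occurs
print(a_obj[1] * 2 == 2**71)  # True
```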

Arrays containing elements of different types can also be represented using Python's built-in list type.

However, list and ndarray behave differently with operators: ndarray applies operators element-wise, whereas list operators act on the list as a whole (for example, * repeats the list). Since list already handles mixed types flexibly, creating and processing such mixed-type data as an ndarray is relatively uncommon.

l = [1, 0.1, 'abcXYZ']

print(type(l))
# <class 'list'>

print(type(l[0]))
print(type(l[1]))
print(type(l[2]))
# <class 'int'>
# <class 'float'>
# <class 'str'>

print(a_object * 2)
# [2 0.2 'abcXYZabcXYZ']

print(l * 2)
# [1, 0.1, 'abcXYZ', 1, 0.1, 'abcXYZ']

source: numpy_dtype.py

Change `dtype` with `astype()`

Basic usage of `astype()`

The astype() method of ndarray allows for changing (casting) dtype.

numpy.ndarray.astype — NumPy v1.26 Manual

A new ndarray with a changed dtype is generated, and the original ndarray remains unchanged.

a = np.array([1, 2, 3])
print(a)
print(a.dtype)
# [1 2 3]
# int64

a_float = a.astype(np.float32)
print(a_float)
print(a_float.dtype)
# [1. 2. 3.]
# float32

print(a)
print(a.dtype)
# [1 2 3]
# int64

source: numpy_astype.py
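astype() returns a new array by default, even when the requested dtype equals the current one. Its copy parameter (described in the numpy.ndarray.astype reference) can be set to False to skip the copy when no conversion is needed, as this sketch shows.

```python
import numpy as np

a = np.array([1, 2, 3])

b = a.astype(a.dtype)
print(b is a)  # False: a copy is made by default

c = a.astype(a.dtype, copy=False)
print(c is a)  # True: no conversion needed, so a itself is returned

d = a.astype(np.float64, copy=False)
print(d is a)  # False: a real conversion always creates a new array
```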

As mentioned above, dtype can also be specified as a type name string, type code string, or Python type.

a_int = a.astype('int32')
print(a_int)
print(a_int.dtype)
# [1 2 3]
# int32

a_uint = a.astype('u8')
print(a_uint)
print(a_uint.dtype)
# [1 2 3]
# uint64

a_float = a.astype(float)
print(a_float)
print(a_float.dtype)
# [1. 2. 3.]
# float64

source: numpy_astype.py
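astype() also accepts a casting argument, whose default is 'unsafe'. With casting='safe', a conversion that could lose information raises TypeError instead of converting silently; np.can_cast() applies the same rule without converting anything. A brief sketch:

```python
import numpy as np

a = np.array([1, 2, 3])

# np.can_cast() reports whether a cast is allowed under the default 'safe' rule
print(np.can_cast(np.int8, np.int16))     # True
print(np.can_cast(np.float64, np.int64))  # False

# Narrowing an integer array is rejected when casting='safe'
try:
    a.astype(np.int8, casting='safe')
    raised = False
except TypeError as e:
    raised = True
    print('TypeError:', e)
```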

Conversion from `float` to `int` truncates the decimal part

Converting from floating-point numbers (float) to integers (int) truncates the decimal part (rounding towards 0).

a = np.array([-2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2])

print(a.astype(int))
# [-2 -1 -1  0  0  1  1  2]

source: numpy_astype.py

To round differently, apply a rounding function such as np.round(), np.floor(), or np.ceil() before converting with astype().
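Rounding explicitly before astype() gives different integers for the same input, as the sketch below shows for np.round(), np.floor(), and np.ceil(). Note that np.round() rounds halves to the nearest even value.

```python
import numpy as np

a = np.array([-2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2])

print(a.astype(int))            # [-2 -1 -1  0  0  1  1  2] (toward 0)
print(np.round(a).astype(int))  # [-2 -2 -1  0  0  1  2  2] (half to even)
print(np.floor(a).astype(int))  # [-2 -2 -1 -1  0  1  1  2] (toward -inf)
print(np.ceil(a).astype(int))   # [-2 -1 -1  0  1  1  2  2] (toward +inf)
```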

Implicit type conversions

In addition to explicit type conversion with astype(), implicit type conversion can occur during operations.

For example, division with the / operator returns float even when both operands are integers.

a_int = np.array([1, 2, 3])
a_float = np.array([1.0, 2.0, 3.0])

print((a_int / a_int).dtype)
# float64

print((a_int / a_float).dtype)
# float64

source: numpy_implicit_type_conversion.py

For +, -, *, //, and ** operations, if both operands are integers, the result is int; if at least one operand is a floating-point number, the result is float.

print((a_int + a_int).dtype)
# int64

print((a_int + a_float).dtype)
# float64

print((a_int - a_int).dtype)
# int64

print((a_int - a_float).dtype)
# float64

print((a_int * a_int).dtype)
# int64

print((a_int * a_float).dtype)
# float64

print((a_int // a_int).dtype)
# int64

print((a_int // a_float).dtype)
# float64

print((a_int**a_int).dtype)
# int64

print((a_int**a_float).dtype)
# float64

source: numpy_implicit_type_conversion.py

When both operands are integers, or both are floating-point numbers, but their bit sizes differ, the result has the type with the larger bit size.

a_int16 = np.array([1, 2, 3], np.int16)
a_int32 = np.array([1, 2, 3], np.int32)

print((a_int16 + a_int32).dtype)
# int32

a_float16 = np.array([1, 2, 3], np.float16)
a_float32 = np.array([1, 2, 3], np.float32)

print((a_float16 + a_float32).dtype)
# float32

source: numpy_implicit_type_conversion.py

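Mixing integer and floating-point arrays, however, can produce a type that neither operand has, because the result must be a floating-point type large enough to represent every value of the integer type. For processes where the bit size matters, it is safer to convert explicitly to the desired type with astype() beforehand.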

print((a_int16 + a_float16).dtype)
# float32

print((a_int32 + a_float32).dtype)
# float64

source: numpy_implicit_type_conversion.py
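To predict the result type before computing, np.promote_types() and np.result_type() apply NumPy's promotion rules to dtypes or arrays. The sketch reproduces the cases above, adds uint64 with int64 (no integer type covers both ranges, so the result is float64), and shows how an explicit astype() keeps the result in float16.

```python
import numpy as np

a_int16 = np.array([1, 2, 3], np.int16)
a_float16 = np.array([1, 2, 3], np.float16)

print(np.promote_types(np.int16, np.float16))  # float32
print(np.promote_types(np.int32, np.float32))  # float64
print(np.promote_types(np.uint64, np.int64))   # float64
print(np.result_type(a_int16, a_float16))      # float32

# Converting explicitly first keeps the smaller type
print((a_int16.astype(np.float16) + a_float16).dtype)  # float16
```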

Note that when assigning values to elements, the dtype does not change.

For example, if a floating-point number is assigned to an element of an int array, the array stays int, and the assigned value is truncated toward 0 (the decimal part is discarded).

a_int[0] = 10.9
a_int[1] = -20.9
print(a_int)
# [ 10 -20   3]

print(a_int.dtype)
# int64

source: numpy_implicit_type_conversion.py
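In-place operators follow the same principle: they keep the array's dtype, and an in-place operation whose result would need a float is rejected with an error (a NumPy subclass of TypeError) rather than truncated. Converting with astype() first avoids this, as in the following sketch.

```python
import numpy as np

a_int = np.array([1, 2, 3])

try:
    a_int += 0.5  # the float64 result cannot be cast back to the integer dtype
    raised = False
except TypeError as e:
    raised = True
    print('TypeError:', e)
print(a_int)  # [1 2 3] (unchanged)

# Convert to float first, then update and assign freely
a_float = a_int.astype(float)
a_float += 0.5
a_float[0] = 10.9
print(a_float.tolist())  # [10.9, 2.5, 3.5]
```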
