NumPy: astype() to change dtype of an array

Tags: Python, NumPy

NumPy arrays (ndarray) hold a data type (dtype). You can set this through various operations, such as when creating an ndarray with np.array(), or change it later with astype().

Essentially, each ndarray is assigned a single dtype, ensuring all elements share the same data type.
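As a quick illustration of the single-dtype rule, the following minimal sketch (the variable name is only illustrative) passes a mix of integers and a float to np.array(). NumPy selects one dtype that can hold every element.

```python
import numpy as np

# Integers and a float are stored under one common dtype.
mixed = np.array([1, 2, 3.5])

print(mixed.dtype)
# float64

# Every element, including the original integers, is now a float64.
print(type(mixed[0]))
# <class 'numpy.float64'>
```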

While NumPy provides a mechanism for handling multiple data types within a single ndarray, known as "Structured Arrays", this article does not cover this topic. For processing datasets that contain multiple types (for example, columns of both numbers and strings), using pandas is often more convenient.

Refer to the following article for dtype and astype() in pandas.

The NumPy version used in this article is as follows. Note that functionality may vary between versions.

import numpy as np

print(np.__version__)
# 1.26.1

Main data types (dtype) in NumPy

The main data types (dtype) in NumPy are as follows. The range of values that each integer and floating-point type can hold is covered later.

dtype       Type code  Description
int8        i1         8-bit signed integer
int16       i2         16-bit signed integer
int32       i4         32-bit signed integer
int64       i8         64-bit signed integer
uint8       u1         8-bit unsigned integer
uint16      u2         16-bit unsigned integer
uint32      u4         32-bit unsigned integer
uint64      u8         64-bit unsigned integer
float16     f2         Half precision floating-point (1 bit for sign, 5 bits for exponent, 10 bits for mantissa)
float32     f4         Single precision floating-point (1 bit for sign, 8 bits for exponent, 23 bits for mantissa)
float64     f8         Double precision floating-point (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)
float128    f16        Extended precision floating-point (availability and precision are platform-dependent; on x86-64 Linux it is 80-bit extended precision padded to 128 bits, not IEEE quadruple precision)
complex64   c8         Complex number (real and imaginary parts are float32)
complex128  c16        Complex number (real and imaginary parts are float64)
complex256  c32        Complex number (real and imaginary parts are float128, where available)
bool        ?          Boolean (True or False)
unicode     U          Unicode string
object      O          Python object

The numeric suffix in a dtype name indicates the number of bits, whereas the numeric suffix in a type code indicates the number of bytes. As a result, the two numbers differ for the same type: int64, for example, has the type code i8.

The type code for bool is the literal character ?; it does not mean "unknown".
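The relationship between type codes and dtype names can be checked with np.dtype(). The following sketch builds a dtype from several type codes and prints its name (size in bits) next to its itemsize (size in bytes).

```python
import numpy as np

for code in ['i1', 'i4', 'f8', 'c16', '?']:
    dt = np.dtype(code)
    # name carries the size in bits; itemsize is the size in bytes
    print(code, dt.name, dt.itemsize)
# i1 int8 1
# i4 int32 4
# f8 float64 8
# c16 complex128 16
# ? bool 1
```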

When specifying dtype in functions or methods, for example, int64 can be specified in the following three ways:

  • Type object: np.int64
  • Type name as a string: 'int64'
  • Type code as a string: 'i8'

a = np.array([1, 2, 3], dtype=np.int64)
print(a.dtype)
# int64

a = np.array([1, 2, 3], dtype='int64')
print(a.dtype)
# int64

a = np.array([1, 2, 3], dtype='i8')
print(a.dtype)
# int64

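Note that when bool, unicode, or object is specified as a type object, it must be suffixed with an underscore _, as in np.bool_, np.unicode_, or np.object_. np.unicode_ is an alias of np.str_ and was removed in NumPy 2.0, so np.str_ is the portable spelling.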
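The following sketch shows the underscore-suffixed type objects in use. It uses np.str_ rather than np.unicode_ on the assumption that the code should also run on NumPy 2.x; the variable names are only illustrative.

```python
import numpy as np

# Non-zero values become True when cast to bool.
a_bool = np.array([1, 0, 2], dtype=np.bool_)
print(a_bool)
# [ True False  True]

# np.str_ produces a Unicode string dtype (kind 'U').
a_text = np.array([1, 22], dtype=np.str_)
print(a_text.dtype.kind)
# U

a_obj = np.array([1, 'a'], dtype=np.object_)
print(a_obj.dtype)
# object
```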

Python types such as int, float, and str can also be specified; they are treated as the equivalent dtype. The following table shows the equivalents in a 64-bit Python 3 environment (with NumPy 1.x on Windows, int maps to int32 instead). uint is not a standard Python type but is included for convenience.

Python type  Equivalent dtype example
int          int64
float        float64
str          unicode
(uint)       uint64

As an argument, int and float can also be given as the strings 'int' and 'float'. Because uint is not a Python built-in, specify it as the string 'uint' or as np.uint.

a = np.array([1, 2, 3], dtype=int)
print(a.dtype)
# int64

a = np.array([1, 2, 3], dtype='int')
print(a.dtype)
# int64
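Because the dtype chosen for a Python type can depend on the platform and NumPy version, it may be worth checking it in your own environment. The following sketch prints the dtype that np.dtype() returns for several built-in types. No expected output is shown because the result for int is environment-dependent.

```python
import numpy as np

# Map each Python built-in type to the dtype NumPy selects for it.
equivalents = {t.__name__: np.dtype(t) for t in (int, float, complex, bool, str)}

for name, dt in equivalents.items():
    # kind is a one-letter category: 'i' integer, 'f' float, 'c' complex,
    # 'b' boolean, 'U' Unicode string
    print(name, '->', dt.name, dt.kind)
```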

Range of numeric types (minimum and maximum values)

The range of values for integer (int, uint) and floating-point number (float) can be checked with np.iinfo() and np.finfo().

np.iinfo()

Use np.iinfo() for integers (int, uint).

Passing a data type as an argument returns a numpy.iinfo object. Printing it with print() shows an overview, and its max and min attributes return the maximum and minimum values as numbers.

The following example uses np.int64, but strings such as 'int64' or 'i8' can also be used.

ii = np.iinfo(np.int64)
print(type(ii))
# <class 'numpy.iinfo'>

print(ii)
# Machine parameters for int64
# ---------------------------------------------------------------
# min = -9223372036854775808
# max = 9223372036854775807
# ---------------------------------------------------------------
# 

print(ii.max)
# 9223372036854775807

print(type(ii.max))
# <class 'int'>

print(ii.min)
# -9223372036854775808

print(ii.bits)
# 64

You can also specify the value directly as an argument.

i = 100
print(type(i))
# <class 'int'>

print(np.iinfo(i))
# Machine parameters for int64
# ---------------------------------------------------------------
# min = -9223372036854775808
# max = 9223372036854775807
# ---------------------------------------------------------------
# 

ui = np.uint8(100)
print(type(ui))
# <class 'numpy.uint8'>

print(np.iinfo(ui))
# Machine parameters for uint8
# ---------------------------------------------------------------
# min = 0
# max = 255
# ---------------------------------------------------------------
# 

An ndarray itself cannot be passed to np.iinfo(). Pass the array's dtype or one of its elements instead.

a = np.array([1, 2, 3], dtype=np.int8)
print(type(a))
# <class 'numpy.ndarray'>

# print(np.iinfo(a))
# ValueError: Invalid integer data type 'O'.

print(np.iinfo(a.dtype))
# Machine parameters for int8
# ---------------------------------------------------------------
# min = -128
# max = 127
# ---------------------------------------------------------------
# 

print(np.iinfo(a[0]))
# Machine parameters for int8
# ---------------------------------------------------------------
# min = -128
# max = 127
# ---------------------------------------------------------------
# 
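One practical use of np.iinfo() is checking whether values fit in a smaller integer type before converting. The following is a minimal sketch; fits_in() is a hypothetical helper written for this illustration, not a NumPy function.

```python
import numpy as np

def fits_in(values, dtype):
    """Return True if every integer in values lies within the range of dtype."""
    info = np.iinfo(dtype)
    values = np.asarray(values)
    return bool(values.min() >= info.min and values.max() <= info.max)

a = np.array([0, 100, 300])

print(fits_in(a, np.uint8))   # 300 exceeds the uint8 maximum of 255
# False

print(fits_in(a, np.int16))
# True
```

A check like this is useful because casting out-of-range integers does not raise an error by default.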

np.finfo()

Use np.finfo() for floating-point numbers (float).

The usage is the same as np.iinfo(). The arguments can be a type object (e.g., np.float64), a string ('float64', 'f8'), or a value (0.1).

You can output an overview with print() or obtain the values of various attributes as numbers.

fi = np.finfo(np.float64)
print(type(fi))
# <class 'numpy.finfo'>

print(fi)
# Machine parameters for float64
# ---------------------------------------------------------------
# precision =  15   resolution = 1.0000000000000001e-15
# machep =    -52   eps =        2.2204460492503131e-16
# negep =     -53   epsneg =     1.1102230246251565e-16
# minexp =  -1022   tiny =       2.2250738585072014e-308
# maxexp =   1024   max =        1.7976931348623157e+308
# nexp =       11   min =        -max
# smallest_normal = 2.2250738585072014e-308   smallest_subnormal = 4.9406564584124654e-324
# ---------------------------------------------------------------
# 

print(fi.max)
# 1.7976931348623157e+308

print(type(fi.max))
# <class 'numpy.float64'>

print(fi.min)
# -1.7976931348623157e+308

print(fi.eps)
# 2.220446049250313e-16

print(fi.bits)
# 64

print(fi.iexp)
# 11

print(fi.nmant)
# 52

np.finfo() provides more information than np.iinfo(), such as eps for the machine epsilon, and iexp and nmant for the number of bits in the exponent and mantissa. For details, refer to the official NumPy documentation for np.finfo().
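To make the meaning of eps concrete, the following sketch shows that eps is the gap between 1.0 and the next larger representable value: adding eps to 1.0 changes the result, while adding half of it does not. The variable names are only illustrative.

```python
import numpy as np

eps = np.finfo(np.float64).eps
one = np.float64(1.0)

# 1.0 + eps is the next representable float64 after 1.0.
print(one + eps == one)
# False

# Half of eps is too small to change 1.0; the sum rounds back to 1.0.
print(one + eps / 2 == one)
# True

# Lower-precision types have a larger eps.
print(np.finfo(np.float32).eps > eps)
# True
```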

The number of characters in strings

When the elements are strings, the dtype is displayed in a form such as <U3.

a_str = np.array([1, 22, 333], dtype=str)
print(a_str)
# ['1' '22' '333']

print(a_str.dtype)
# <U3

The leading < and > represent little-endian and big-endian, respectively.

The trailing number is the maximum number of characters each element can store. When dtype is specified as str in np.array(), as in this example, the number is set to the length of the longest element.

Only enough memory for this number of characters is allocated for each element, meaning that strings longer than this number of characters cannot be accommodated and will be truncated. Therefore, it is necessary to specify a data type with a sufficient character length in advance.

a_str[0] = 'abcde'
print(a_str)
# ['abc' '22' '333']

a_str10 = np.array([1, 22, 333], dtype='U10')
print(a_str10.dtype)
# <U10

a_str10[0] = 'abcde'
print(a_str10)
# ['abcde' '22' '333']
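If an array already exists with a string dtype that is too short, one option is to widen it with astype() before assigning longer strings. The following sketch also prints itemsize to show that memory is reserved per element based on the character count (NumPy stores each character of a U dtype in 4 bytes).

```python
import numpy as np

a_str = np.array(['1', '22', '333'])
print(a_str.dtype.itemsize)   # 3 characters * 4 bytes
# 12

# Widen to 10 characters per element, then assign a longer string.
a_wide = a_str.astype('U10')
a_wide[0] = 'abcde'
print(a_wide)
# ['abcde' '22' '333']

print(a_wide.dtype.itemsize)  # 10 characters * 4 bytes
# 40
```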

object stores pointers to Python objects

The object type is a special data type that stores pointers to Python objects.

This means that the actual data for each element is stored in a separate memory space, enabling an ndarray to hold pointers to data of different types.

a_object = np.array([1, 0.1, 'abc'], dtype=object)
print(a_object)
# [1 0.1 'abc']

print(a_object.dtype)
# object

print(type(a_object[0]))
print(type(a_object[1]))
print(type(a_object[2]))
# <class 'int'>
# <class 'float'>
# <class 'str'>

When the dtype is set to object, you can freely increase the number of characters in a string.

a_object[2] = 'abcXYZ'
print(a_object)
# [1 0.1 'abcXYZ']

Arrays containing elements of different types can also be represented using Python's built-in list type.

list and ndarray behave differently with operators: ndarray applies an operation to each element, whereas list applies it to the list as a whole. Since list already handles mixed types well, creating and processing such data as an object ndarray is probably uncommon.

l = [1, 0.1, 'abcXYZ']

print(type(l))
# <class 'list'>

print(type(l[0]))
print(type(l[1]))
print(type(l[2]))
# <class 'int'>
# <class 'float'>
# <class 'str'>

print(a_object * 2)
# [2 0.2 'abcXYZabcXYZ']

print(l * 2)
# [1, 0.1, 'abcXYZ', 1, 0.1, 'abcXYZ']
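When an object array turns out to contain only numbers, it can be converted to a numeric dtype with astype(). The following is a small sketch under that assumption; the conversion raises an error if an element cannot be interpreted as a number.

```python
import numpy as np

# An object array that happens to hold only numeric Python objects.
a_obj = np.array([1, 0.5, 2], dtype=object)
print(type(a_obj[0]))
# <class 'int'>

# Convert to a regular numeric array.
a_num = a_obj.astype(float)
print(a_num.dtype)
# float64

print(a_num * 2)
# [2. 1. 4.]
```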

Change dtype with astype()

Basic usage of astype()

The astype() method of ndarray allows for changing (casting) dtype.

A new ndarray with a changed dtype is generated, and the original ndarray remains unchanged.

a = np.array([1, 2, 3])
print(a)
print(a.dtype)
# [1 2 3]
# int64

a_float = a.astype(np.float32)
print(a_float)
print(a_float.dtype)
# [1. 2. 3.]
# float32

print(a)
print(a.dtype)
# [1 2 3]
# int64
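The copy behavior can be confirmed with np.shares_memory(). The following sketch also uses the copy argument of astype(): with copy=False, the input array itself is returned when no conversion is needed, while a new array is still created when the dtype actually changes.

```python
import numpy as np

a = np.array([1, 2, 3])

# By default, a new array is allocated even if the dtype is unchanged.
b = a.astype(a.dtype)
print(np.shares_memory(a, b))
# False

# With copy=False and an unchanged dtype, the original array is returned.
c = a.astype(a.dtype, copy=False)
print(c is a)
# True

# A real conversion always produces a new array.
d = a.astype(np.float32, copy=False)
print(np.shares_memory(a, d))
# False
```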

As mentioned above, dtype can also be specified as a type name string, type code string, or Python type.

a_int = a.astype('int32')
print(a_int)
print(a_int.dtype)
# [1 2 3]
# int32

a_uint = a.astype('u8')
print(a_uint)
print(a_uint.dtype)
# [1 2 3]
# uint64

a_float = a.astype(float)
print(a_float)
print(a_float.dtype)
# [1. 2. 3.]
# float64

Conversion from float to int truncates the decimal part

Converting from floating-point numbers (float) to integers (int) truncates the decimal part (rounding towards 0).

a = np.array([-2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2])

print(a.astype(int))
# [-2 -1 -1  0  0  1  1  2]
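If truncation is not the behavior you want, one approach is to round explicitly before converting. The following sketch compares plain astype(int) with np.rint(), np.floor(), and np.ceil() applied first. Note that np.rint() rounds halves to the nearest even integer.

```python
import numpy as np

a = np.array([-1.5, -0.5, 0.5, 1.5, 2.5])

print(a.astype(int))            # truncate toward 0
# [-1  0  0  1  2]

print(np.rint(a).astype(int))   # round half to even
# [-2  0  0  2  2]

print(np.floor(a).astype(int))  # round toward negative infinity
# [-2 -1  0  1  2]

print(np.ceil(a).astype(int))   # round toward positive infinity
# [-1  0  1  2  3]
```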

Refer to the following articles for rounding, truncating, and ceiling in NumPy.

Implicit type conversions

In addition to explicit type conversion with astype(), implicit type conversion can occur during operations.

For example, division with the / operator returns float even between integers.

a_int = np.array([1, 2, 3])
a_float = np.array([1.0, 2.0, 3.0])

print((a_int / a_int).dtype)
# float64

print((a_int / a_float).dtype)
# float64

For +, -, *, //, and ** operations, if both operands are integers, the result is int; if at least one operand is a floating-point number, the result is float.

print((a_int + a_int).dtype)
# int64

print((a_int + a_float).dtype)
# float64

print((a_int - a_int).dtype)
# int64

print((a_int - a_float).dtype)
# float64

print((a_int * a_int).dtype)
# int64

print((a_int * a_float).dtype)
# float64

print((a_int // a_int).dtype)
# int64

print((a_int // a_float).dtype)
# float64

print((a_int**a_int).dtype)
# int64

print((a_int**a_float).dtype)
# float64

In operations between two integer arrays or between two floating-point arrays, if their bit sizes differ, the result is converted to the type with the larger bit size.

a_int16 = np.array([1, 2, 3], np.int16)
a_int32 = np.array([1, 2, 3], np.int32)

print((a_int16 + a_int32).dtype)
# int32

a_float16 = np.array([1, 2, 3], np.float16)
a_float32 = np.array([1, 2, 3], np.float32)

print((a_float16 + a_float32).dtype)
# float32

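However, in some cases, the result has a type that differs from both original arrays. For example, adding int16 and float16 yields float32, because float16 cannot represent every int16 value. For processes where bit size is crucial, it is safer to explicitly convert to the desired type with astype() beforehand.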

print((a_int16 + a_float16).dtype)
# float32

print((a_int32 + a_float32).dtype)
# float64
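The resulting type can also be checked in advance, without creating arrays, using np.result_type(). The following sketch covers the combinations shown above and adds one case where the smaller float type is sufficient.

```python
import numpy as np

print(np.result_type(np.int16, np.int32))
# int32

print(np.result_type(np.int16, np.float16))
# float32

print(np.result_type(np.int32, np.float32))
# float64

# float16 can represent every int8 value, so no wider type is needed.
print(np.result_type(np.int8, np.float16))
# float16
```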

Note that when assigning values to elements, the dtype does not change.

For example, if a floating-point number is assigned to an array of int, the ndarray data type remains int. The assigned value is truncated, removing the decimal part, effectively rounding it towards 0.

a_int[0] = 10.9
a_int[1] = -20.9
print(a_int)
# [ 10 -20   3]

print(a_int.dtype)
# int64
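To keep the fractional part, convert the array to a floating-point dtype with astype() before assigning. A minimal sketch:

```python
import numpy as np

a_int = np.array([1, 2, 3])

# Convert first; the new float array can store the fractional part.
a_float = a_int.astype(float)
a_float[0] = 10.9

print(a_float[0])
# 10.9

print(a_float.dtype)
# float64

# The original integer array is unaffected.
print(a_int)
# [1 2 3]
```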
