NumPy: astype() to change dtype of an array
NumPy arrays (ndarray) hold a data type (dtype). You can set this through various operations, such as when creating an ndarray with np.array(), or change it later with astype().
Essentially, each ndarray is assigned a single dtype, ensuring all elements share the same data type.
While NumPy provides a mechanism for handling multiple data types within a single ndarray, known as "Structured Arrays", this article does not cover it. For processing datasets that contain multiple types (for example, columns of both numbers and strings), using pandas is often more convenient.
pandas has its own dtype and astype(), which are covered in a separate article.
The NumPy version used in this article is as follows. Note that functionality may vary between versions.
import numpy as np
print(np.__version__)
# 1.26.1
Main data types (dtype) in NumPy
The main data types (dtype) in NumPy are as follows. The range of values that each integer and floating-point type can take is discussed later.
dtype | Type code | Description |
---|---|---|
int8 | i1 | 8-bit signed integer |
int16 | i2 | 16-bit signed integer |
int32 | i4 | 32-bit signed integer |
int64 | i8 | 64-bit signed integer |
uint8 | u1 | 8-bit unsigned integer |
uint16 | u2 | 16-bit unsigned integer |
uint32 | u4 | 32-bit unsigned integer |
uint64 | u8 | 64-bit unsigned integer |
float16 | f2 | Half precision floating-point (1 bit for sign, 5 bits for exponent, 10 bits for mantissa) |
float32 | f4 | Single precision floating-point (1 bit for sign, 8 bits for exponent, 23 bits for mantissa) |
float64 | f8 | Double precision floating-point (1 bit for sign, 11 bits for exponent, 52 bits for mantissa) |
float128 | f16 | Extended precision floating-point (alias of np.longdouble; platform-dependent, typically 80-bit extended precision padded to 128 bits on x86-64 Linux rather than true quadruple precision, and not available on every platform) |
complex64 | c8 | Complex number (real and imaginary parts are float32) |
complex128 | c16 | Complex number (real and imaginary parts are float64) |
complex256 | c32 | Complex number (real and imaginary parts are float128) |
bool | ? | Boolean (True or False) |
unicode | U | Unicode string |
object | O | Python object |
The numeric suffix in a dtype name indicates the number of bits, whereas the numeric suffix in a type code indicates the number of bytes. Note that the numbers are different even for the same type.
The type code ? for bool does not mean "unknown"; the character ? is literally the code assigned to bool.
When specifying dtype in functions or methods, for example, int64 can be specified in the following three ways:
- Type object: np.int64
- Type name as a string: 'int64'
- Type code as a string: 'i8'
a = np.array([1, 2, 3], dtype=np.int64)
print(a.dtype)
# int64
a = np.array([1, 2, 3], dtype='int64')
print(a.dtype)
# int64
a = np.array([1, 2, 3], dtype='i8')
print(a.dtype)
# int64
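As a quick check of the bits-versus-bytes naming described above, the following sketch converts a few type codes with np.dtype() and prints each name and item size. The list of codes is chosen only for illustration.

```python
import numpy as np

# Type codes count bytes; dtype names count bits.
codes = ['i1', 'i4', 'u2', 'f4', 'f8', 'c16']
for code in codes:
    dt = np.dtype(code)
    print(code, dt.name, dt.itemsize, dt.itemsize * 8)
# i1 int8 1 8
# i4 int32 4 32
# u2 uint16 2 16
# f4 float32 4 32
# f8 float64 8 64
# c16 complex128 16 128

# All three spellings describe the same dtype.
same = np.dtype(np.int64) == np.dtype('int64') == np.dtype('i8')
print(same)
# True
```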
Note that when bool, unicode, or object is specified as a type object, it must be suffixed with an underscore _, as in np.bool_, np.unicode_, or np.object_. In NumPy 2.0 and later, np.unicode_ has been removed; use np.str_ instead.
Python types such as int, float, and str can also be specified. In this case, they are treated as the equivalent dtype. Examples in a Python 3, 64-bit environment are as follows (on Windows with NumPy 1.x, int corresponds to int32 instead). uint, which is not a standard Python type, is included for convenience.
Python type | Equivalent dtype example |
---|---|
int | int64 |
float | float64 |
str | unicode |
(uint) | uint64 |
When specifying as an argument, the strings 'int' or 'float' can be used in place of int or float. The non-Python type uint must be specified using the string 'uint'.
a = np.array([1, 2, 3], dtype=int)
print(a.dtype)
# int64
a = np.array([1, 2, 3], dtype='int')
print(a.dtype)
# int64
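To see which dtype a Python type maps to in a given environment, np.dtype() can be called on the type directly. The sketch below shows expected output only for the mappings that do not depend on the platform; the result for int is printed without a comment because it can differ between platforms and NumPy versions.

```python
import numpy as np

# float, complex and bool map to fixed-size dtypes.
print(np.dtype(float))    # float64
print(np.dtype(complex))  # complex128
print(np.dtype(bool))     # bool

# str maps to a Unicode dtype (kind 'U'); the length is decided by the data.
print(np.dtype(str).kind)  # U

# int depends on the platform and NumPy version, so no output is shown here.
print(np.dtype(int))
```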
Range of numeric types (minimum and maximum values)
The range of values for integers (int, uint) and floating-point numbers (float) can be checked with np.iinfo() and np.finfo().
np.iinfo()
Use np.iinfo() for integers (int, uint).
Specifying a data type as an argument returns a numpy.iinfo object, which can be inspected using print() to see an overview, or its max and min attributes can be accessed to obtain the maximum and minimum values as numbers.
The following example uses np.int64, but strings such as 'int64' or 'i8' can also be used.
ii = np.iinfo(np.int64)
print(type(ii))
# <class 'numpy.iinfo'>
print(ii)
# Machine parameters for int64
# ---------------------------------------------------------------
# min = -9223372036854775808
# max = 9223372036854775807
# ---------------------------------------------------------------
#
print(ii.max)
# 9223372036854775807
print(type(ii.max))
# <class 'int'>
print(ii.min)
# -9223372036854775808
print(ii.bits)
# 64
You can also specify the value directly as an argument.
i = 100
print(type(i))
# <class 'int'>
print(np.iinfo(i))
# Machine parameters for int64
# ---------------------------------------------------------------
# min = -9223372036854775808
# max = 9223372036854775807
# ---------------------------------------------------------------
#
ui = np.uint8(100)
print(type(ui))
# <class 'numpy.uint8'>
print(np.iinfo(ui))
# Machine parameters for uint8
# ---------------------------------------------------------------
# min = 0
# max = 255
# ---------------------------------------------------------------
#
NumPy arrays (ndarray) cannot be specified. You need to specify either the data type of the array or provide a specific value instead.
a = np.array([1, 2, 3], dtype=np.int8)
print(type(a))
# <class 'numpy.ndarray'>
# print(np.iinfo(a))
# ValueError: Invalid integer data type 'O'.
print(np.iinfo(a.dtype))
# Machine parameters for int8
# ---------------------------------------------------------------
# min = -128
# max = 127
# ---------------------------------------------------------------
#
print(np.iinfo(a[0]))
# Machine parameters for int8
# ---------------------------------------------------------------
# min = -128
# max = 127
# ---------------------------------------------------------------
#
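One practical use of np.iinfo() is checking whether values fit into a smaller integer type before converting. The following is a minimal sketch; fits_in() is a hypothetical helper written for this example (not a NumPy function) and assumes a non-empty array of integers.

```python
import numpy as np

def fits_in(values, dtype):
    """Return True if every value lies within the range of an integer dtype."""
    info = np.iinfo(dtype)
    arr = np.asarray(values)
    # Convert to Python int so the comparison itself cannot overflow.
    return info.min <= int(arr.min()) and int(arr.max()) <= info.max

a = np.array([0, 100, 255], dtype=np.int64)
b = np.array([0, 100, 300], dtype=np.int64)

print(fits_in(a, np.uint8))  # True
print(fits_in(b, np.uint8))  # False
print(fits_in(b, np.int16))  # True

# Out-of-range integers are not rejected by astype(); they wrap around.
print(b.astype(np.uint8))
# [  0 100  44]
```

Checking the range first avoids the silent wrap-around shown in the last line.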
np.finfo()
Use np.finfo() for floating-point numbers (float).
The usage is the same as np.iinfo(). The arguments can be a type object (e.g., np.float64), a string ('float64', 'f8'), or a value (0.1).
You can output an overview with print() or obtain the values of various attributes as numbers.
fi = np.finfo(np.float64)
print(type(fi))
# <class 'numpy.finfo'>
print(fi)
# Machine parameters for float64
# ---------------------------------------------------------------
# precision = 15 resolution = 1.0000000000000001e-15
# machep = -52 eps = 2.2204460492503131e-16
# negep = -53 epsneg = 1.1102230246251565e-16
# minexp = -1022 tiny = 2.2250738585072014e-308
# maxexp = 1024 max = 1.7976931348623157e+308
# nexp = 11 min = -max
# smallest_normal = 2.2250738585072014e-308 smallest_subnormal = 4.9406564584124654e-324
# ---------------------------------------------------------------
#
print(fi.max)
# 1.7976931348623157e+308
print(type(fi.max))
# <class 'numpy.float64'>
print(fi.min)
# -1.7976931348623157e+308
print(fi.eps)
# 2.220446049250313e-16
print(fi.bits)
# 64
print(fi.iexp)
# 11
print(fi.nmant)
# 52
np.finfo() provides more information than np.iinfo(), such as eps for machine epsilon, and iexp and nmant for the number of bits in the exponent and mantissa. For details, refer to the official NumPy documentation for np.finfo().
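To make the meaning of eps more concrete, the sketch below shows that adding eps to 1.0 changes the value while adding half of eps does not, and that eps corresponds to the number of mantissa bits (nmant). It is an illustration only, not a recommended way to compare floats.

```python
import numpy as np

eps = np.finfo(np.float64).eps

# eps is the gap between 1.0 and the next larger float64.
print(1.0 + eps > 1.0)       # True
# Half of eps is too small to change 1.0.
print(1.0 + eps / 2 == 1.0)  # True

# eps equals 2 to the power of -nmant for each type.
for dt in (np.float16, np.float32, np.float64):
    fi = np.finfo(dt)
    print(fi.dtype, fi.nmant, fi.eps == 2.0 ** -fi.nmant)
# float16 10 True
# float32 23 True
# float64 52 True
```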
The number of characters in strings
When holding elements as strings, the dtype will be formatted as <U3, for example.
a_str = np.array([1, 22, 333], dtype=str)
print(a_str)
# ['1' '22' '333']
print(a_str.dtype)
# <U3
The leading < and > represent little-endian and big-endian, respectively.
The trailing number signifies the maximum number of characters that can be stored. When dtype is specified as str in np.array(), as shown in this example, this number is set based on the longest element.
Only enough memory for this number of characters is allocated for each element, meaning that strings longer than this number of characters cannot be accommodated and will be truncated. Therefore, it is necessary to specify a data type with a sufficient character length in advance.
a_str[0] = 'abcde'
print(a_str)
# ['abc' '22' '333']
a_str10 = np.array([1, 22, 333], dtype='U10')
print(a_str10.dtype)
# <U10
a_str10[0] = 'abcde'
print(a_str10)
# ['abcde' '22' '333']
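If the required length is not known when the array is created, the array can be widened with astype() before assigning a longer string. The following is a minimal sketch of that idea; the variable names are chosen for this example, and the current length is derived from the item size of the dtype.

```python
import numpy as np

a_str = np.array(['1', '22', '333'])
new_value = 'abcde'

# Number of characters the current dtype can hold per element.
current = a_str.dtype.itemsize // np.dtype('U1').itemsize
print(current)  # 3

# Widen the dtype first if the new value would be truncated.
if len(new_value) > current:
    a_str = a_str.astype(f'U{len(new_value)}')

a_str[0] = new_value
print(a_str)
# ['abcde' '22' '333']
print(a_str.dtype.itemsize // np.dtype('U1').itemsize)  # 5
```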
object stores pointers to Python objects
The object type is a special data type that stores pointers to Python objects.
This means that the actual data for each element is stored in a separate memory space, enabling an ndarray to hold pointers to data of different types.
a_object = np.array([1, 0.1, 'abc'], dtype=object)
print(a_object)
# [1 0.1 'abc']
print(a_object.dtype)
# object
print(type(a_object[0]))
print(type(a_object[1]))
print(type(a_object[2]))
# <class 'int'>
# <class 'float'>
# <class 'str'>
When the dtype is set to object, you can freely increase the number of characters in a string.
a_object[2] = 'abcXYZ'
print(a_object)
# [1 0.1 'abcXYZ']
Arrays containing elements of different types can also be represented using Python's built-in list type.
list and ndarray behave differently with operators: ndarray applies operators element-wise, whereas * on a list repeats the list. Since list already handles mixed types well, creating and processing such data in NumPy is probably less common.
l = [1, 0.1, 'abcXYZ']
print(type(l))
# <class 'list'>
print(type(l[0]))
print(type(l[1]))
print(type(l[2]))
# <class 'int'>
# <class 'float'>
# <class 'str'>
print(a_object * 2)
# [2 0.2 'abcXYZabcXYZ']
print(l * 2)
# [1, 0.1, 'abcXYZ', 1, 0.1, 'abcXYZ']
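When an object array happens to contain only numbers, it can be converted to a numeric dtype with astype(). The sketch below uses a small array made up for this example.

```python
import numpy as np

a_obj = np.array([1, 0.5, 2], dtype=object)
print(a_obj.dtype)  # object

# Each element is still a plain Python object.
print([type(x).__name__ for x in a_obj])
# ['int', 'float', 'int']

# Converting to a numeric dtype stores the values directly in the array.
a_num = a_obj.astype(np.float64)
print(a_num)        # [1.  0.5 2. ]
print(a_num.dtype)  # float64
```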
Change dtype with astype()
Basic usage of astype()
The astype() method of ndarray allows for changing (casting) dtype.
A new ndarray with a changed dtype is generated, and the original ndarray remains unchanged.
a = np.array([1, 2, 3])
print(a)
print(a.dtype)
# [1 2 3]
# int64
a_float = a.astype(np.float32)
print(a_float)
print(a_float.dtype)
# [1. 2. 3.]
# float32
print(a)
print(a.dtype)
# [1 2 3]
# int64
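The copying behavior can be checked directly. The following sketch uses the copy parameter of astype(): by default a new array is always created, while copy=False returns the original array when no conversion is needed.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)

# By default astype() copies, even if the dtype does not change.
b = a.astype(np.int64)
print(b is a)                  # False
print(np.shares_memory(a, b))  # False

# With copy=False the original array is returned when no conversion is needed.
c = a.astype(np.int64, copy=False)
print(c is a)                  # True

# A real conversion still produces a new array.
d = a.astype(np.float64, copy=False)
print(np.shares_memory(a, d))  # False
```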
As mentioned above, dtype can also be specified as a type name string, type code string, or Python type.
a_int = a.astype('int32')
print(a_int)
print(a_int.dtype)
# [1 2 3]
# int32
a_uint = a.astype('u8')
print(a_uint)
print(a_uint.dtype)
# [1 2 3]
# uint64
a_float = a.astype(float)
print(a_float)
print(a_float.dtype)
# [1. 2. 3.]
# float64
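astype() performs lossy conversions without complaint because its casting parameter defaults to 'unsafe'. As a hedged sketch, the example below uses casting='safe' and np.can_cast() to detect a conversion that could lose information; the refused flag is a name introduced for this example.

```python
import numpy as np

a = np.array([1.5, 2.5, 3.5])

# np.can_cast() reports whether a conversion is safe (no loss of information).
print(np.can_cast(np.int32, np.float64))  # True
print(np.can_cast(np.float64, np.int32))  # False

# With casting='safe', astype() refuses the lossy conversion.
refused = False
try:
    a.astype(np.int32, casting='safe')
except TypeError as e:
    refused = True
    print('refused:', e)

# The default casting='unsafe' allows it and drops the fractional part.
print(a.astype(np.int32))  # [1 2 3]
```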
Conversion from float to int truncates the decimal part
Converting from floating-point numbers (float) to integers (int) truncates the decimal part (rounding towards 0).
a = np.array([-2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2])
print(a.astype(int))
# [-2 -1 -1 0 0 1 1 2]
Refer to the following articles for rounding, truncating, and ceiling in NumPy.
- NumPy: Round array elements (np.round, np.around, np.rint)
- NumPy: Round up/down array elements (np.floor, np.trunc, np.ceil)
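For comparison, here is a minimal sketch that applies a rounding function before astype(), so that the conversion to int follows the chosen rounding rule instead of truncation. The input values are chosen to include halves, which np.rint() rounds to the nearest even integer.

```python
import numpy as np

a = np.array([-1.5, -0.5, 0.5, 1.5, 2.5])

# astype() alone truncates toward 0.
print(a.astype(np.int64))
# [-1  0  0  1  2]

# Round down, round up, or round to nearest (half to even) before converting.
print(np.floor(a).astype(np.int64))
# [-2 -1  0  1  2]
print(np.ceil(a).astype(np.int64))
# [-1  0  1  2  3]
print(np.rint(a).astype(np.int64))
# [-2  0  0  2  2]
```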
Implicit type conversions
In addition to explicit type conversion with astype(), implicit type conversion can occur during operations.
For example, division with the / operator returns float even between integers.
a_int = np.array([1, 2, 3])
a_float = np.array([1.0, 2.0, 3.0])
print((a_int / a_int).dtype)
# float64
print((a_int / a_float).dtype)
# float64
For +, -, *, //, and ** operations, if both operands are integers, the result is int; if at least one operand is a floating-point number, the result is float.
print((a_int + a_int).dtype)
# int64
print((a_int + a_float).dtype)
# float64
print((a_int - a_int).dtype)
# int64
print((a_int - a_float).dtype)
# float64
print((a_int * a_int).dtype)
# int64
print((a_int * a_float).dtype)
# float64
print((a_int // a_int).dtype)
# int64
print((a_int // a_float).dtype)
# float64
print((a_int**a_int).dtype)
# int64
print((a_int**a_float).dtype)
# float64
Even in operations between integers or floating-point numbers, if their bit sizes differ, the result is converted to the type with the larger bit size.
a_int16 = np.array([1, 2, 3], np.int16)
a_int32 = np.array([1, 2, 3], np.int32)
print((a_int16 + a_int32).dtype)
# int32
a_float16 = np.array([1, 2, 3], np.float16)
a_float32 = np.array([1, 2, 3], np.float32)
print((a_float16 + a_float32).dtype)
# float32
However, when an integer array and a floating-point array are combined, the result can be a type wider than either operand, as in the following examples. For processes where bit size is crucial, it is safer to explicitly convert to the desired type with astype() beforehand.
print((a_int16 + a_float16).dtype)
# float32
print((a_int32 + a_float32).dtype)
# float64
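The resulting type of such an operation can be looked up without creating any arrays by using np.promote_types(). The sketch below lists a few pairs chosen for illustration and then shows the explicit conversion with astype() that keeps the result in float32.

```python
import numpy as np

pairs = [
    (np.int16, np.int32),
    (np.float16, np.float32),
    (np.int16, np.float16),
    (np.int32, np.float32),
    (np.uint8, np.int8),
]
for t1, t2 in pairs:
    print(np.dtype(t1).name, '+', np.dtype(t2).name, '->', np.promote_types(t1, t2))
# int16 + int32 -> int32
# float16 + float32 -> float32
# int16 + float16 -> float32
# int32 + float32 -> float64
# uint8 + int8 -> int16

# Converting explicitly beforehand keeps the result in float32.
a_int32 = np.array([1, 2, 3], dtype=np.int32)
a_float32 = np.array([1, 2, 3], dtype=np.float32)
result = a_int32.astype(np.float32) + a_float32
print(result.dtype)
# float32
```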
Note that when assigning values to elements, the dtype does not change.
For example, if a floating-point number is assigned to an array of int, the ndarray data type remains int. The assigned value is truncated, removing the decimal part, effectively rounding it towards 0.
a_int[0] = 10.9
a_int[1] = -20.9
print(a_int)
# [ 10 -20 3]
print(a_int.dtype)
# int64
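If the fractional part must be kept, the array can be converted with astype() before the assignment. The following sketch also shows that in-place arithmetic with a float on an integer array raises an error rather than truncating; the refused flag is a name introduced for this example.

```python
import numpy as np

a_int = np.array([1, 2, 3], dtype=np.int64)

# In-place arithmetic with a float is refused rather than truncated.
refused = False
try:
    a_int += 0.5
except TypeError as e:
    refused = True
    print('refused:', e)

# Converting first keeps the fractional part.
a_float = a_int.astype(np.float64)
a_float[0] = 10.9
a_float += 0.5
print(a_float)
# [11.4  2.5  3.5]
```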