Convert between pandas DataFrame/Series and NumPy array
This article explains how to convert between pandas DataFrame/Series and NumPy arrays (ndarray).
To convert a DataFrame or Series to an ndarray, use the to_numpy() method or the values attribute. To convert an ndarray to a DataFrame or Series, use their constructors.
For conversions between DataFrame/Series and Python built-in lists, as well as between DataFrame and Series, refer to the following articles.
- Convert between pandas DataFrame/Series and Python list
- pandas: Convert between DataFrame and Series
The pandas and NumPy versions used in this article are as follows. Note that functionality may vary between versions.
import pandas as pd
import numpy as np
print(pd.__version__)
# 2.1.4
print(np.__version__)
# 1.26.2
Convert DataFrame and Series to NumPy arrays
To convert a DataFrame or Series to a NumPy array (ndarray), use the to_numpy() method or the values attribute.
to_numpy()
To convert a DataFrame or Series to an ndarray, use the to_numpy() method.
- pandas.DataFrame.to_numpy — pandas 2.1.4 documentation
- pandas.Series.to_numpy — pandas 2.1.4 documentation
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y'])
print(df)
#    A  B
# X  1  3
# Y  2  4
print(df.to_numpy())
# [[1 3]
#  [2 4]]
print(type(df.to_numpy()))
# <class 'numpy.ndarray'>
s = df['A']
print(s)
# X    1
# Y    2
# Name: A, dtype: int64
print(s.to_numpy())
# [1 2]
print(type(s.to_numpy()))
# <class 'numpy.ndarray'>
Row names (index) and column names (columns) are ignored, and only the values are converted to an ndarray. If you want to include the index as data, call reset_index() first to turn it into an ordinary column.
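As a minimal sketch of the reset_index() approach mentioned above (the variable name a_with_index is illustrative and not part of the original examples), the index labels become the first column of the resulting array. Because the labels are strings and the data are integers, the result is an object type ndarray.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y'])

# reset_index() moves the index into an ordinary column,
# so to_numpy() then includes the labels as data.
a_with_index = df.reset_index().to_numpy()

print(a_with_index)
# [['X' 1 3]
#  ['Y' 2 4]]

print(a_with_index.dtype)
# object
```

If the labels are not needed in the array itself, keeping df.index separately and converting only the values avoids the object dtype.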
The following examples focus on DataFrame, but note that the basic usage is similar for Series.
Specify data type: dtype
If the data type (dtype) of each column in the DataFrame is the same, an ndarray of that data type is created.
df_int = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y'])
print(df_int)
#    A  B
# X  1  3
# Y  2  4
print(df_int.dtypes)
# A    int64
# B    int64
# dtype: object
print(df_int.to_numpy())
# [[1 3]
#  [2 4]]
print(df_int.to_numpy().dtype)
# int64
If the dtype of each column in the DataFrame differs, an ndarray of a common convertible type is created. For example, a DataFrame with mixed integer (int) and floating-point number (float) columns is converted into a float type ndarray.
df_int_float = pd.DataFrame({'A': [1, 2], 'B': [0.1, 0.2]}, index=['X', 'Y'])
print(df_int_float)
#    A    B
# X  1  0.1
# Y  2  0.2
print(df_int_float.dtypes)
# A      int64
# B    float64
# dtype: object
print(df_int_float.to_numpy())
# [[1.  0.1]
#  [2.  0.2]]
print(df_int_float.to_numpy().dtype)
# float64
If there is no common convertible type, for example, a DataFrame with mixed numeric and string columns, the ndarray will be of object type. In this object type, each element is treated as a Python object.
df_int_str = pd.DataFrame({'A': [1, 2], 'B': ['abc', 'xyz']}, index=['X', 'Y'])
print(df_int_str)
#    A    B
# X  1  abc
# Y  2  xyz
print(df_int_str.dtypes)
# A     int64
# B    object
# dtype: object
print(df_int_str.to_numpy())
# [[1 'abc']
#  [2 'xyz']]
print(df_int_str.to_numpy().dtype)
# object
print(df_int_str.to_numpy()[0, 0])
# 1
print(type(df_int_str.to_numpy()[0, 0]))
# <class 'int'>
print(df_int_str.to_numpy()[0, 1])
# abc
print(type(df_int_str.to_numpy()[0, 1]))
# <class 'str'>
Use the dtype argument of to_numpy() to specify the data type. Converting floating-point values to int discards the fractional part, and specifying a type that the data cannot be converted to results in an error.
print(df_int_float.to_numpy(dtype='float32'))
# [[1.  0.1]
#  [2.  0.2]]
print(df_int_float.to_numpy(dtype=int))
# [[1 0]
#  [2 0]]
# print(df_int_str.to_numpy(dtype=int))
# ValueError: invalid literal for int() with base 10: 'abc'
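As the dtype=int example above shows, the fractional part is dropped rather than rounded. If rounding is what you want, one possible approach (a sketch, with illustrative values and variable names that are not from the original examples) is to call round() on the DataFrame before converting.

```python
import pandas as pd

# Illustrative data chosen so that truncation and rounding differ
df_frac = pd.DataFrame({'B': [0.6, 1.7]})

# Direct conversion discards the fractional part
truncated = df_frac.to_numpy(dtype=int)
print(truncated)
# [[0]
#  [1]]

# Rounding first, then converting, gives the nearest integers
rounded = df_frac.round().to_numpy(dtype=int)
print(rounded)
# [[1]
#  [2]]
```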
For more details on data types (dtype) in pandas and NumPy, refer to the following articles.
- pandas: How to use astype() to cast dtype of DataFrame
- NumPy: Cast ndarray to a specific dtype with astype()
To convert only the columns of certain data types to an ndarray, select them with select_dtypes() first. For example, specifying 'number' extracts only the numeric columns.
df_int_float_str = pd.DataFrame({'A': [1, 2], 'B': [0.1, 0.2], 'C': ['abc', 'xyz']}, index=['X', 'Y'])
print(df_int_float_str)
#    A    B    C
# X  1  0.1  abc
# Y  2  0.2  xyz
print(df_int_float_str.select_dtypes('number').to_numpy())
# [[1.  0.1]
#  [2.  0.2]]
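select_dtypes() also accepts include and exclude keyword arguments. The following sketch (variable names are illustrative) selects only the float64 column, and then everything except the numeric columns, before converting.

```python
import pandas as pd

df_int_float_str = pd.DataFrame(
    {'A': [1, 2], 'B': [0.1, 0.2], 'C': ['abc', 'xyz']}, index=['X', 'Y']
)

# Keep only columns whose dtype is float64
only_float = df_int_float_str.select_dtypes(include='float64').to_numpy()
print(only_float)
# [[0.1]
#  [0.2]]

# Drop all numeric columns, keeping the string column
non_numeric = df_int_float_str.select_dtypes(exclude='number').to_numpy()
print(non_numeric)
# [['abc']
#  ['xyz']]
```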
Specify whether to create a copy: copy
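By default, to_numpy() creates a view if possible. If the created ndarray is a view of the original DataFrame, the two objects share memory, and changing one changes the other. Note that this describes pandas 2.1.4 with default settings; when Copy-on-Write is enabled, an array that shares memory with the DataFrame is returned as read-only.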
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y'])
a = df.to_numpy()
print(np.shares_memory(df, a))
# True
a[0, 0] = 100
print(a)
# [[100   3]
#  [  2   4]]
print(df)
#      A  B
# X  100  3
# Y    2  4
Setting the copy argument of to_numpy() to True creates a copy that does not share memory.
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y'])
a_copy = df.to_numpy(copy=True)
print(np.shares_memory(df, a_copy))
# False
a_copy[0, 0] = 100
print(a_copy)
# [[100   3]
#  [  2   4]]
print(df)
#    A  B
# X  1  3
# Y  2  4
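Note that copy=True always results in a copy. In contrast, copy=False (default) does not always create a view. A copy is created whenever a view is not possible, for example, when the columns have different dtypes and must be converted to a common type, as in the following DataFrame, or when the dtype argument requires a type conversion.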
df_int_float = pd.DataFrame({'A': [1, 2], 'B': [0.1, 0.2]}, index=['X', 'Y'])
a_float = df_int_float.to_numpy()
print(np.shares_memory(df_int_float, a_float))
# False
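The same applies when the dtype argument forces a conversion. As a small sketch (the variable names are illustrative), converting an all-integer DataFrame to float yields an array that does not share memory with the original.

```python
import numpy as np
import pandas as pd

df_int = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y'])

# int64 -> float64 requires new storage, so a view is not possible
a_conv = df_int.to_numpy(dtype=float)

print(a_conv.dtype)
# float64

print(np.shares_memory(df_int, a_conv))
# False
```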
For more details on views and copies in pandas and NumPy, refer to the following articles.
- pandas: Views and copies in DataFrame
- NumPy: Determine if ndarray is view or copy and if it shares memory
values
The values attribute of DataFrame and Series can also be used to convert to an ndarray.
- pandas.DataFrame.values — pandas 2.1.4 documentation
- pandas.Series.values — pandas 2.1.4 documentation
Although the official documentation recommends using the to_numpy() method, no warning is issued when using the values attribute as of pandas version 2.1.4.
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y'])
print(df)
#    A  B
# X  1  3
# Y  2  4
print(df.values)
# [[1 3]
#  [2 4]]
print(type(df.values))
# <class 'numpy.ndarray'>
s = df['A']
print(s)
# X    1
# Y    2
# Name: A, dtype: int64
print(s.values)
# [1 2]
print(type(s.values))
# <class 'numpy.ndarray'>
The behavior of the values attribute is the same as the default behavior of to_numpy().
If the dtype of each column in the DataFrame differs, an ndarray of a common convertible type is created. If no such common type exists, an object type ndarray is created.
df_int_float = pd.DataFrame({'A': [1, 2], 'B': [0.1, 0.2]}, index=['X', 'Y'])
print(df_int_float.values)
# [[1.  0.1]
#  [2.  0.2]]
print(df_int_float.values.dtype)
# float64
df_int_str = pd.DataFrame({'A': [1, 2], 'B': ['abc', 'xyz']}, index=['X', 'Y'])
print(df_int_str.values)
# [[1 'abc']
#  [2 'xyz']]
print(df_int_str.values.dtype)
# object
A view is created if possible; otherwise, a copy is created.
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y'])
print(np.shares_memory(df, df.values))
# True
df_int_float = pd.DataFrame({'A': [1, 2], 'B': [0.1, 0.2]}, index=['X', 'Y'])
print(np.shares_memory(df_int_float, df_int_float.values))
# False
The values attribute has no copy option, so call the copy() method of the resulting ndarray to explicitly create a copy.
print(np.shares_memory(df, df.values.copy()))
# False
Convert NumPy arrays to DataFrame and Series
To convert a NumPy array (ndarray) to a DataFrame or Series, use their constructors. The ndarray can be specified in the first argument of the constructor.
pd.DataFrame()
The pd.DataFrame() constructor can create a DataFrame by specifying an ndarray in the first argument.
a_2d = np.array([[1, 2], [3, 4]])
print(a_2d)
# [[1 2]
#  [3 4]]
print(pd.DataFrame(a_2d))
#    0  1
# 0  1  2
# 1  3  4
A one-dimensional array creates a single-column DataFrame, but a three-dimensional or higher array results in an error.
a_1d = np.array([1, 2])
print(a_1d)
# [1 2]
print(pd.DataFrame(a_1d))
#    0
# 0  1
# 1  2
a_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(a_3d)
# [[[1 2]
#   [3 4]]
#
#  [[5 6]
#   [7 8]]]
# print(pd.DataFrame(a_3d))
# ValueError: Must pass 2-d input. shape=(2, 2, 2)
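If the data in a three-dimensional array should still end up in a DataFrame, one option is to reshape it to two dimensions first. The sketch below assumes that stacking the inner 2D blocks vertically is the desired layout; other layouts need a different reshape(). The variable name df_2d is illustrative.

```python
import numpy as np
import pandas as pd

a_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# Collapse all leading axes into rows, keeping the last axis as columns:
# shape (2, 2, 2) -> (4, 2)
df_2d = pd.DataFrame(a_3d.reshape(-1, a_3d.shape[-1]))

print(df_2d)
#    0  1
# 0  1  2
# 1  3  4
# 2  5  6
# 3  7  8
```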
Row names (index), column names (columns), and data type (dtype) can be specified in the arguments.
print(pd.DataFrame(data=a_2d, index=['X', 'Y'], columns=['A', 'B'], dtype=float))
#      A    B
# X  1.0  2.0
# Y  3.0  4.0
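Since to_numpy() drops the labels, a common pattern is to reuse the original index and columns when converting the processed array back. The following is a sketch of such a round trip; the multiplication by 10 simply stands in for any NumPy operation, and df_restored is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y'])

# DataFrame -> ndarray, then some NumPy processing (creates a new array)
a_processed = df.to_numpy() * 10

# ndarray -> DataFrame, restoring the original labels
df_restored = pd.DataFrame(a_processed, index=df.index, columns=df.columns)

print(df_restored)
#     A   B
# X  10  30
# Y  20  40
```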
Views and copies
When specifying an ndarray as the first argument, by default, pd.DataFrame() creates a view if possible. If the created DataFrame is a view of the original ndarray, the two objects share memory, and changing one will change the other.
a = np.array([[1, 2], [3, 4]])
df = pd.DataFrame(a)
print(np.shares_memory(a, df))
# True
a[0, 0] = 100
print(a)
# [[100   2]
#  [  3   4]]
print(df)
#      0  1
# 0  100  2
# 1    3  4
Setting the copy argument of pd.DataFrame() to True creates a copy.
a = np.array([[1, 2], [3, 4]])
df_copy = pd.DataFrame(a, copy=True)
print(np.shares_memory(a, df_copy))
# False
a[0, 0] = 100
print(a)
# [[100   2]
#  [  3   4]]
print(df_copy)
#    0  1
# 0  1  2
# 1  3  4
Note that setting copy=True guarantees a copy, whereas copy=False (default for ndarray) does not always result in a view. For example, if the dtype argument requires a type conversion, a copy is created because a view is not possible.
a = np.array([[1, 2], [3, 4]])
df_float = pd.DataFrame(a, dtype=float)
print(np.shares_memory(a, df_float))
# False
pd.Series()
The pd.Series() constructor has a basic usage similar to pd.DataFrame().
A Series can be created by specifying a one-dimensional ndarray in the first argument.
a_1d = np.array([1, 2])
print(a_1d)
# [1 2]
print(pd.Series(a_1d))
# 0    1
# 1    2
# dtype: int64
An error occurs for two-dimensional or higher arrays.
a_2d = np.array([[1, 2], [3, 4]])
print(a_2d)
# [[1 2]
#  [3 4]]
# print(pd.Series(a_2d))
# ValueError: Data must be 1-dimensional, got ndarray of shape (2, 2) instead
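To build a Series from a two-dimensional array, first reduce it to one dimension. The sketch below shows two possible choices, flattening the whole array or selecting a single column; which one is appropriate depends on your data. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

a_2d = np.array([[1, 2], [3, 4]])

# Flatten all elements in row-major order
s_flat = pd.Series(a_2d.ravel())
print(s_flat.tolist())
# [1, 2, 3, 4]

# Or take a single column (here, the first one)
s_col = pd.Series(a_2d[:, 0])
print(s_col.tolist())
# [1, 3]
```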
Labels (index), name (name), and data type (dtype) can also be specified in the arguments.
print(pd.Series(a_1d, index=['A', 'B'], name='my_series', dtype=float))
# A    1.0
# B    2.0
# Name: my_series, dtype: float64
In terms of views and copies, pd.Series() also behaves similarly to pd.DataFrame().
By default, pd.Series() attempts to create a view. Setting the copy argument to True forces it to create a copy instead. However, copy=False (default) does not guarantee a view. For example, if you specify a dtype that requires a type conversion, pd.Series() will create a copy, since a view is not feasible in this case.
a = np.array([1, 2])
print(np.shares_memory(a, pd.Series(a)))
# True
print(np.shares_memory(a, pd.Series(a, copy=True)))
# False
print(np.shares_memory(a, pd.Series(a, dtype=float)))
# False