pandas: Write DataFrame to CSV with to_csv()

Posted: 2023-08-04 | Tags: Python, pandas, CSV

You can write data from pandas.DataFrame and pandas.Series to CSV files using the to_csv() method. This method also allows appending to an existing CSV file. By altering the delimiter, the data can be saved as a TSV (Tab-separated values) file.

Not all arguments are covered in this article. For a comprehensive understanding of all arguments, please refer to the official documentation linked above.

Contents

Write to CSV file with to_csv()
Write only specific columns: columns
Write column/row names: header, index
Encoding: encoding
Delimiter: sep
Write mode (write/overwrite or append): mode
Handle missing values NaN: na_rep
Format for floating-point numbers (float): float_format
Write with custom formats

The pandas.read_csv() function allows you to read CSV files and load them into DataFrame objects.

pandas: Read CSV into DataFrame with read_csv()

The sample code in this article uses pandas version 2.0.3. Consider the following DataFrame as an example.

sample_pandas_normal.csv

import pandas as pd

print(pd.__version__)
# 2.0.3

df = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0).head(3)
print(df)
#          age state  point
# name                     
# Alice     24    NY     64
# Bob       42    CA     92
# Charlie   18    CA     70

source: pandas_to_csv.py

The following examples use DataFrame but are equally applicable to Series.

Write to CSV file with `to_csv()`

to_csv() is provided as a method for both pandas.DataFrame and pandas.Series.

By specifying the file path as the first argument, the data can be written to that path in CSV format.

df.to_csv('data/dst/to_csv_out.csv')

source: pandas_to_csv.py

name,age,state,point
Alice,24,NY,64
Bob,42,CA,92
Charlie,18,CA,70

source: to_csv_out.csv

The path can be either absolute or relative. For how to check and change the current directory, see the following article.

Get and change the current working directory in Python

Write only specific columns: `columns`

To write only specific columns, specify a list of column names to the columns argument.

df.to_csv('data/dst/to_csv_out_columns.csv', columns=['age', 'point'])

source: pandas_to_csv.py

name,age,point
Alice,24,64
Bob,42,92
Charlie,18,70

source: to_csv_out_columns.csv

By default, columns is set to None, and all columns are included in the output.

Write column/row names: `header`, `index`

You can control whether to write column names (columns) and row names (index) by setting the header and index arguments to True or False.

df.to_csv('data/dst/to_csv_out_header_index.csv', header=False, index=False)

source: pandas_to_csv.py

24,NY,64
42,CA,92
18,CA,70

source: to_csv_out_header_index.csv

By default, both are set to True, and both columns (header) and index are included in the output, as demonstrated in the previous examples.

Encoding: `encoding`

The encoding argument allows you to define the encoding of the output CSV file. By default, this is utf-8. You can change the encoding as needed, for example, to encoding='shift_jis' or encoding='cp932'.

Delimiter: `sep`

The sep argument allows you to set the delimiter. The default delimiter is a comma ,, which produces a CSV file.

To create a TSV file, which uses a tab character \t as the delimiter, you can simply set sep='\t'.

df.to_csv('data/dst/to_csv_out.tsv', sep='\t')

source: pandas_to_csv.py

name    age state   point
Alice   24  NY  64
Bob 42  CA  92
Charlie 18  CA  70

source: to_csv_out.tsv

Write mode (write/overwrite or append): `mode`

The mode argument allows you to specify the write mode, similar to the built-in open() function.

Read, write, and create files in Python (with and open())

By default, mode is set to 'w'. If the specified path does not exist, it creates a new file; if the path already exists, it overwrites the existing file.

To prevent overwriting an existing file, use mode='x'. If the specified path does not exist, a new file is created; if it does, an error is returned, and the file is not overwritten.

# df.to_csv('data/dst/to_csv_out.csv', mode='x')
# FileExistsError: [Errno 17] File exists: 'data/dst/to_csv_out.csv'

source: pandas_to_csv.py

For appending data, use mode='a'. This will append the DataFrame to the end of the existing file as new rows. Note that the header (columns) will also be appended unless header=False is specified.

df.to_csv('data/dst/to_csv_out_a.csv')
df.to_csv('data/dst/to_csv_out_a.csv', mode='a', header=False)

source: pandas_to_csv.py

name,age,state,point
Alice,24,NY,64
Bob,42,CA,92
Charlie,18,CA,70
Alice,24,NY,64
Bob,42,CA,92
Charlie,18,CA,70

source: to_csv_out_a.csv

mode='a' is used to append rows. If you want to add columns, you should read the target file, add the columns, and then overwrite the original file. You can use mode='w' for overwriting the file, but this argument can be omitted because 'w' is the default mode.

df.to_csv('data/dst/to_csv_out_a_new_column.csv')

df_new = pd.read_csv('data/dst/to_csv_out_a_new_column.csv', index_col=0)
print(df_new)
#          age state  point
# name                     
# Alice     24    NY     64
# Bob       42    CA     92
# Charlie   18    CA     70

df_new['new_col'] = 'new data'
print(df_new)
#          age state  point   new_col
# name                               
# Alice     24    NY     64  new data
# Bob       42    CA     92  new data
# Charlie   18    CA     70  new data

df_new.to_csv('data/dst/to_csv_out_a_new_column.csv')

source: pandas_to_csv.py

name,age,state,point,new_col
Alice,24,NY,64,new data
Bob,42,CA,92,new data
Charlie,18,CA,70,new data

source: to_csv_out_a_new_column.csv

Handle missing values `NaN`: `na_rep`

Consider the following DataFrame containing missing values NaN.

df_nan = df.copy()
df_nan.iat[0, 1] = float('nan')
df_nan.iat[1, 2] = float('nan')

print(df_nan)
#          age state  point
# name                     
# Alice     24   NaN   64.0
# Bob       42    CA    NaN
# Charlie   18    CA   70.0

source: pandas_to_csv.py

By default, missing values are represented as empty strings in the output.

df_nan.to_csv('data/dst/to_csv_out_nan.csv')

source: pandas_to_csv.py

name,age,state,point
Alice,24,,64.0
Bob,42,CA,
Charlie,18,CA,70.0

source: to_csv_out_nan.csv

You can replace missing values with a specific string using the na_rep argument.

df_nan.to_csv('data/dst/to_csv_out_nan_rep.csv', na_rep='NaN')

source: pandas_to_csv.py

name,age,state,point
Alice,24,NaN,64.0
Bob,42,CA,NaN
Charlie,18,CA,70.0

source: to_csv_out_nan_rep.csv

Refer to the following articles for information on handling missing values in DataFrame.

Format for floating-point numbers (`float`): `float_format`

Consider the following DataFrame.

df = pd.DataFrame({'col1': [0.123456789, 1000000000.0],
                   'col2': [123456789.0, 0.0],
                   'col3': [123456789, 0]})
print(df)
#            col1         col2       col3
# 0  1.234568e-01  123456789.0  123456789
# 1  1.000000e+09          0.0          0

print(df.dtypes)
# col1    float64
# col2    float64
# col3      int64
# dtype: object

source: pandas_to_csv_float.py

When you use print(), the output appears in scientific notation. However, this is due to the display settings and does not indicate that the actual value has been rounded off.

print(df.iat[0, 0])
# 0.123456789

print(df.iat[1, 0])
# 1000000000.0

source: pandas_to_csv_float.py

By default, when using to_csv(), the value is saved as is.

df.to_csv('data/dst/to_csv_out_float_default.csv')

source: pandas_to_csv_float.py

,col1,col2,col3
0,0.123456789,123456789.0,123456789
1,1000000000.0,0.0,0

source: to_csv_out_float_default.csv

With the float_format argument, you can control the formatting of floating-point numbers (float) when saving. You can either provide a format string in style used by printf (with a %), or use a callable object like format().

Here is an example using the printf-style, where the number of digits after the decimal point is set to three.

Built-in Types - printf-style String Formatting — Python 3.11.4 documentation

print('%.3f' % 0.123456789)
# 0.123

print('%.3f' % 123456789)
# 123456789.000

df.to_csv('data/dst/to_csv_out_float_format_3f.csv', float_format='%.3f')

source: pandas_to_csv_float.py

,col1,col2,col3
0,0.123,123456789.000,123456789
1,1000000000.000,0.000,0

source: to_csv_out_float_format_3f.csv

Here is an example using a callable object. This outputs in scientific notation with three digits after the decimal point.

Format strings and numbers with format() in Python

print('{:.3e}'.format(0.123456789))
# 1.235e-01

print('{:.3e}'.format(123456789))
# 1.235e+08

df.to_csv('data/dst/to_csv_out_float_format_3e.csv', float_format='{:.3e}'.format)

source: pandas_to_csv_float.py

,col1,col2,col3
0,1.235e-01,1.235e+08,123456789
1,1.000e+09,0.000e+00,0

source: to_csv_out_float_format_3e.csv

Please note that specifying the number of digits may lead to loss of information beyond the specified number of digits during saving.

Write with custom formats

The float_format argument, as the name suggests, only applies to columns with the float data type. Columns with the integer (int) type remain as they are. Moreover, you can't specify different formats for each column.

If you want to specify a format for int columns, or use a different format for each float column, you should convert the data in the original DataFrame to strings in the desired format before saving it.

df = pd.DataFrame({'col1': [0.123456789, 1000000000.0],
                   'col2': [123456789.0, 0.0],
                   'col3': [123456789, 0]})

df['col1'] = df['col1'].map('{:.3f}'.format)
df['col2'] = df['col2'].map('{:.3e}'.format)
df['col3'] = df['col3'].map('{:#010x}'.format)

print(df)
#              col1       col2        col3
# 0           0.123  1.235e+08  0x075bcd15
# 1  1000000000.000  0.000e+00  0x00000000

df.to_csv('data/dst/to_csv_out_float_format_str.csv')

source: pandas_to_csv_float.py

,col1,col2,col3
0,0.123,1.235e+08,0x075bcd15
1,1000000000.000,0.000e+00,0x00000000

source: to_csv_out_float_format_str.csv

In the example above, note that when saving integers in hexadecimal form, pandas.read_csv() will interpret them as strings. If you want them to be treated as numerical data, you'll need to convert them after loading.

df = pd.read_csv('data/dst/to_csv_out_float_format_str.csv', index_col=0)
print(df)
#            col1         col2        col3
# 0  1.230000e-01  123500000.0  0x075bcd15
# 1  1.000000e+09          0.0  0x00000000

print(df.dtypes)
# col1    float64
# col2    float64
# col3     object
# dtype: object

df['col3'] = df['col3'].map(lambda x: int(x, 16))
print(df)
#            col1         col2       col3
# 0  1.230000e-01  123500000.0  123456789
# 1  1.000000e+09          0.0          0

print(df.dtypes)
# col1    float64
# col2    float64
# col3      int64
# dtype: object

source: pandas_to_csv_float.py

pandas: Write DataFrame to CSV with to_csv()

Write to CSV file with `to_csv()`

Write only specific columns: `columns`

Write column/row names: `header`, `index`

Encoding: `encoding`

Delimiter: `sep`

Write mode (write/overwrite or append): `mode`

Handle missing values `NaN`: `na_rep`

Format for floating-point numbers (`float`): `float_format`

Write with custom formats

Related Categories

Related Articles

pandas: Write DataFrame to CSV with to_csv()

Write to CSV file with to_csv()

Write only specific columns: columns

Write column/row names: header, index

Encoding: encoding

Delimiter: sep

Write mode (write/overwrite or append): mode

Handle missing values NaN: na_rep

Format for floating-point numbers (float): float_format

Write with custom formats

Related Categories

Related Articles

Write to CSV file with `to_csv()`

Write only specific columns: `columns`

Write column/row names: `header`, `index`

Encoding: `encoding`

Delimiter: `sep`

Write mode (write/overwrite or append): `mode`

Handle missing values `NaN`: `na_rep`

Format for floating-point numbers (`float`): `float_format`