Read and Write CSV Files in Python

Posted: 2023-08-06 | Tags: Python, CSV

In Python, the csv module allows you to read and write CSV files.

csv — CSV File Reading and Writing — Python 3.11.4 documentation

Despite its name "csv", the module can handle any text file separated by various character strings, not just CSV (Comma-Separated Value) files.

Contents

Read CSV files: csv.reader()
Write CSV files: csv.writer()
Read and write CSV files with NumPy
Read and write CSV files with pandas

This article doesn't cover all possible arguments for the functions in the csv module. For a comprehensive understanding of these, refer to the official documentation above.

As will be discussed later in this article, it is recommended to use libraries like NumPy or pandas for operations such as calculating averages or totals from data read from CSV files.

The sample code in this article uses the csv and pprint modules. Both are included in the standard library, so no additional installation is necessary.

Pretty-print with pprint in Python

import csv
import pprint

source: csv_reader.py

Read CSV files: `csv.reader()`

Use csv.reader() to read CSV files.

Basic usage

Consider the following file as an example. Here, it is simply read as a text file using the built-in open() function.

Read, write, and create files in Python (with and open())

with open('data/src/sample.csv') as f:
    print(f.read())
# 11,12,13,14
# 21,22,23,24
# 31,32,33,34

source: csv_reader.py

Specify the file object, which is opened with open(), as the first argument of csv.reader(). A csv.reader object is an iterator that processes rows. For example, you can get data for each row as a list with a for loop.

with open('data/src/sample.csv') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
# ['11', '12', '13', '14']
# ['21', '22', '23', '24']
# ['31', '32', '33', '34']

source: csv_reader.py

If you want to get it as a two-dimensional array (list of lists), use a list comprehension.

List comprehensions in Python

with open('data/src/sample.csv') as f:
    reader = csv.reader(f)
    l = [row for row in reader]

print(l)
# [['11', '12', '13', '14'], ['21', '22', '23', '24'], ['31', '32', '33', '34']]

source: csv_reader.py

Get rows, columns, and elements

If you read a CSV file as a two-dimensional array (list of lists), you can get a row by specifying the row number starting from 0 with an index [].

print(l[1])
# ['21', '22', '23', '24']

source: csv_reader.py

You can get an element by specifying the column number starting from 0 with an index [].

print(l[1][1])
# 22

source: csv_reader.py

To extract a column, you need to transpose the array (i.e., switch rows and columns) and then use an index [].

Transpose 2D list in Python (swap rows and columns)

l_T = [list(x) for x in zip(*l)]
print(l_T)
# [['11', '21', '31'], ['12', '22', '32'], ['13', '23', '33'], ['14', '24', '34']]

print(l_T[1])
# ['12', '22', '32']

source: csv_reader.py

Note that rows and columns can be extracted more easily using libraries like NumPy or pandas, which will be discussed later.

Convert strings to numbers

By default, each element is a string (str).

print(l[0][0])
# 11

print(type(l[0][0]))
# <class 'str'>

source: csv_reader.py

To convert a string to a number, use int() or float().

Convert a string to a number (int, float) in Python

To convert a list of strings to a list of numbers, you can use list comprehension. It is also possible to convert the entire two-dimensional array (list of lists) at once.

print(l[0])
# ['11', '12', '13', '14']

print([int(v) for v in l[0]])
# [11, 12, 13, 14]

print([[int(v) for v in row] for row in l])
# [[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]]

source: csv_reader.py

By setting the quoting argument to csv.QUOTE_NONNUMERIC, elements not enclosed in quotation marks (by default, double quotes ") will be retrieved as floating point numbers (float). However, note that an error will occur if an element that cannot be converted to float is not enclosed in quotation marks.

with open('data/src/sample.csv') as f:
    reader = csv.reader(f, quoting=csv.QUOTE_NONNUMERIC)
    l_f = [row for row in reader]

print(l_f)
# [[11.0, 12.0, 13.0, 14.0], [21.0, 22.0, 23.0, 24.0], [31.0, 32.0, 33.0, 34.0]]

print(l_f[0][0])
# 11.0

print(type(l_f[0][0]))
# <class 'float'>

source: csv_reader.py

Delimiter: `delimiter`

By default, csv.reader() treats a comma , as a delimiter. You can specify any string as a delimiter with the delimiter argument.

Consider the following file separated by spaces as an example.

with open('data/src/sample.txt') as f:
    print(f.read())
# 11 12 13 14
# 21 22 23 24
# 31 32 33 34

source: csv_reader.py

By setting delimiter=' ', elements are split at each space and correctly retrieved.

with open('data/src/sample.txt') as f:
    reader = csv.reader(f, delimiter=' ')
    l = [row for row in reader]

print(l)
# [['11', '12', '13', '14'], ['21', '22', '23', '24'], ['31', '32', '33', '34']]

source: csv_reader.py

For TSV (Tab-Separated Values) files, you should set delimiter='\t'.

Quotation marks: `quoting`, `quotechar`

In some CSV files, elements are enclosed within quotation marks, such as double quotes ".

Consider the following CSV file as an example.

with open('data/src/sample_quote.csv') as f:
    print(f.read())
# 1,2,"3"
# "a,b,c",x,y

source: csv_reader.py

By default, the quotation marks themselves are not included in the value of the element, and the delimiter within the part surrounded by quotation marks is ignored. In many situations, this behavior is considered appropriate.

with open('data/src/sample_quote.csv') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
# ['1', '2', '3']
# ['a,b,c', 'x', 'y']

source: csv_reader.py

If you set the quoting argument to csv.QUOTE_NONE, no special processing will be done for quotation marks. The delimiter within the part surrounded by quotation marks is also treated as an element separator.

with open('data/src/sample_quote.csv') as f:
    reader = csv.reader(f, quoting=csv.QUOTE_NONE)
    for row in reader:
        print(row)
# ['1', '2', '"3"']
# ['"a', 'b', 'c"', 'x', 'y']

source: csv_reader.py

By default, the double quote " is treated as a quotation mark, but you can specify any value as a quotation mark with the quotechar argument. If you want to use a single quote ' as a quotation mark, you should set quotechar="'".

Line breaks

Consider the following CSV file example that includes line breaks within parts surrounded by quotation marks.

with open('data/src/sample_linebreak.csv') as f:
    print(f.read())
# 1,2,"3"
# "a
# b",x,y

source: csv_reader.py

According to the official documentation, in environments where the newline code is \r\n (like Windows), it is safe to set the newline argument of open() to ''.

If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n line endings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling. csv — CSV File Reading and Writing — Python 3.11.4 documentation

with open('data/src/sample_linebreak.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
# ['1', '2', '3']
# ['a\nb', 'x', 'y']

source: csv_reader.py

Headers

Consider the following CSV file that includes a header row and an index column.

with open('data/src/sample_header_index.csv') as f:
    print(f.read())
# ,a,b,c,d
# ONE,11,12,13,14
# TWO,21,22,23,24
# THREE,31,32,33,34

source: csv_reader.py

csv.reader() does not specially handle header rows or index columns. To extract only the data, you may need to use operations such as list comprehensions and slicing.

with open('data/src/sample_header_index.csv') as f:
    reader = csv.reader(f)
    l = [row for row in reader]

pprint.pprint(l)
# [['', 'a', 'b', 'c', 'd'],
#  ['ONE', '11', '12', '13', '14'],
#  ['TWO', '21', '22', '23', '24'],
#  ['THREE', '31', '32', '33', '34']]

print([row[1:] for row in l[1:]])
# [['11', '12', '13', '14'], ['21', '22', '23', '24'], ['31', '32', '33', '34']]

print([[int(v) for v in row[1:]] for row in l[1:]])
# [[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]]

source: csv_reader.py

For such files, it's easier to use pandas, which will be mentioned later.

Read CSV files as dictionaries: `csv.DictReader()`

While csv.reader() reads each line as a list, csv.DictReader() interprets each line as a dictionary.

Consider the following CSV file as an example.

with open('data/src/sample_header.csv') as f:
    print(f.read())
# a,b,c,d
# 11,12,13,14
# 21,22,23,24
# 31,32,33,34

source: csv_dictreader.py

By default, the values in the first row are used as the dictionary keys.

with open('data/src/sample_header.csv') as f:
    reader = csv.DictReader(f)
    l = [row for row in reader]

pprint.pprint(l)
# [{'a': '11', 'b': '12', 'c': '13', 'd': '14'},
#  {'a': '21', 'b': '22', 'c': '23', 'd': '24'},
#  {'a': '31', 'b': '32', 'c': '33', 'd': '34'}]

print(l[1])
# {'a': '21', 'b': '22', 'c': '23', 'd': '24'}

print(l[1]['c'])
# 23

source: csv_dictreader.py

If there is no header in the first row, you can specify the fieldnames argument.

with open('data/src/sample.csv') as f:
    print(f.read())
# 11,12,13,14
# 21,22,23,24
# 31,32,33,34

with open('data/src/sample.csv') as f:
    reader = csv.DictReader(f, fieldnames=['a', 'b', 'c', 'd'])
    for row in reader:
        print(row)
# {'a': '11', 'b': '12', 'c': '13', 'd': '14'}
# {'a': '21', 'b': '22', 'c': '23', 'd': '24'}
# {'a': '31', 'b': '32', 'c': '33', 'd': '34'}

source: csv_dictreader.py

The index column is not specially processed. If you wish to exclude it, you can remove it using the pop() method, as demonstrated below.

Remove an item from a dictionary in Python (clear, pop, popitem, del)

with open('data/src/sample_header_index.csv') as f:
    print(f.read())
# ,a,b,c,d
# ONE,11,12,13,14
# TWO,21,22,23,24
# THREE,31,32,33,34

with open('data/src/sample_header_index.csv') as f:
    reader = csv.DictReader(f)
    l = [row for row in reader]

pprint.pprint(l, width=100)
# [{'': 'ONE', 'a': '11', 'b': '12', 'c': '13', 'd': '14'},
#  {'': 'TWO', 'a': '21', 'b': '22', 'c': '23', 'd': '24'},
#  {'': 'THREE', 'a': '31', 'b': '32', 'c': '33', 'd': '34'}]

print([od.pop('') for od in l])
# ['ONE', 'TWO', 'THREE']

pprint.pprint(l)
# [{'a': '11', 'b': '12', 'c': '13', 'd': '14'},
#  {'a': '21', 'b': '22', 'c': '23', 'd': '24'},
#  {'a': '31', 'b': '32', 'c': '33', 'd': '34'}]

source: csv_dictreader.py

Write CSV files: `csv.writer()`

Use csv.writer() to write CSV files.

Basic Usage

Specify the file object, which is opened with open(), as the first argument of csv.writer().

To create a new file, pass the path to open() with the write mode 'w'. Be aware that if the write mode 'w' is specified for an existing file, it will overwrite the original content

Read, write, and create files in Python (with and open())

Use the writerow() method to write one line at a time. Specify a list as an argument.

with open('data/temp/sample_writer_row.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow([0, 1, 2])
    writer.writerow(['a', 'b', 'c'])

with open('data/temp/sample_writer_row.csv') as f:
    print(f.read())
# 0,1,2
# a,b,c

source: csv_writer.py

The writerows() method allows you to write a two-dimensional array (list of lists) in a single operation.

l = [[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]]

with open('data/temp/sample_writer.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerows(l)

with open('data/temp/sample_writer.csv') as f:
    print(f.read())
# 11,12,13,14
# 21,22,23,24
# 31,32,33,34

source: csv_writer.py

Append to an existing CSV file

To append to an existing CSV file, use open() with the append mode 'a'. Then, just like when creating a new file, use writerow() or writerows() to write contents. The contents will be appended at the end.

with open('data/temp/sample_writer_row.csv', 'a') as f:
    writer = csv.writer(f)
    writer.writerow(['X', 'Y', 'Z'])

with open('data/temp/sample_writer_row.csv') as f:
    print(f.read())
# 0,1,2
# a,b,c
# X,Y,Z

source: csv_writer.py

Delimiter: `delimiter`

As in the previous examples, the default output is a CSV (Comma-Separated Value) file. You can specify any delimiter with the delimiter argument.

To save as a TSV (Tab-Separated Values) file, you should set delimiter='\t'.

l = [[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]]

with open('data/temp/sample_writer.tsv', 'w') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerows(l)

with open('data/temp/sample_writer.tsv') as f:
    print(f.read())
# 11    12  13  14
# 21    22  23  24
# 31    32  33  34

source: csv_writer.py

Quotation marks: `quoting`, `quotechar`

By default, elements containing delimiters and other special characters are written enclosed in quotation marks.

l = [[0, 1, 2], ['a,b,c', 'x', 'y']]

with open('data/temp/sample_writer_quote.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerows(l)

with open('data/temp/sample_writer_quote.csv') as f:
    print(f.read())
# 0,1,2
# "a,b,c",x,y

source: csv_writer.py

You can manipulate the quotation marks using the quoting argument. The default is quoting=csv.QUOTE_MINIMAL, which encloses only elements that contain special characters like delimiters in quotation marks, as shown in the above result.

If you set quoting=csv.QUOTE_ALL, all elements will be enclosed in quotation marks.

with open('data/temp/sample_writer_quote_all.csv', 'w') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)
    writer.writerows(l)

with open('data/temp/sample_writer_quote_all.csv') as f:
    print(f.read())
# "0","1","2"
# "a,b,c","x","y"

source: csv_writer.py

If you set quoting=csv.QUOTE_NONNUMERIC, elements that are not numeric will be enclosed in quotation marks.

with open('data/temp/sample_writer_quote_nonnumeric.csv', 'w') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC)
    writer.writerows(l)

with open('data/temp/sample_writer_quote_nonnumeric.csv') as f:
    print(f.read())
# 0,1,2
# "a,b,c","x","y"

source: csv_writer.py

If you set quoting=csv.QUOTE_NONE, no elements will be enclosed in quotation marks. In this case, you need to specify the character to use for escaping with the escapechar argument.

with open('data/temp/sample_writer_quote_none.csv', 'w') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_NONE, escapechar='\\')
    writer.writerows(l)

with open('data/temp/sample_writer_quote_none.csv') as f:
    print(f.read())
# 0,1,2
# a\,b\,c,x,y

source: csv_writer.py

The default quotation mark is the double quotation mark ". This can be specified with the quotechar argument.

with open('data/temp/sample_writer_quote_char.csv', 'w') as f:
    writer = csv.writer(f, quotechar="'")
    writer.writerows(l)

with open('data/temp/sample_writer_quote_char.csv') as f:
    print(f.read())
# 0,1,2
# 'a,b,c',x,y

source: csv_writer.py

Line breaks

If an element contains a line break, it is written enclosed in quotation marks.

l = [[0, 1, 2], ['a\nb', 'x', 'y']]

with open('data/temp/sample_writer_linebreak.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(l)

with open('data/temp/sample_writer_linebreak.csv') as f:
    print(f.read())
# 0,1,2
# "a
# b",x,y

source: csv_writer.py

According to the official documentation, in environments where the newline code is \r\n (like Windows), it is safe to set the newline argument of open() to ''.

If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n line endings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling. csv — CSV File Reading and Writing — Python 3.11.4 documentation

with open('data/src/sample_linebreak.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
# ['1', '2', '3']
# ['a\nb', 'x', 'y']

source: csv_reader.py

Headers

csv.writer() has no special feature for adding a header row or an index column.

To add a header row, you simply write it first with writerow(). For an index column, add an element to the start of each list being written.

Add an item to a list in Python (append, extend, insert)

l = [[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]]

header = ['', 'a', 'b', 'c', 'd']
index = ['ONE', 'TWO', 'THREE']

with open('data/temp/sample_writer_header_index.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    for i, row in zip(index, l):
        writer.writerow([i] + row)

with open('data/temp/sample_writer_header_index.csv') as f:
    print(f.read())
# ,a,b,c,d
# ONE,11,12,13,14
# TWO,21,22,23,24
# THREE,31,32,33,34

source: csv_writer.py

For such operations, it's easier to use pandas, which will be mentioned later.

Write dictionaries: `csv.DictWriter()`

While csv.writer() writes a list to each row, csv.DictWriter() writes a dictionary to each row.

Specify the list of dictionary keys you want to write for the second argument fieldnames of csv.DictWriter(). The writeheader() method writes the field names as a header.

Pass a dictionary to writerow() to write rows.

d1 = {'a': 1, 'b': 2, 'c': 3}
d2 = {'a': 10, 'c': 30}

with open('data/temp/sample_dictwriter.csv', 'w') as f:
    writer = csv.DictWriter(f, ['a', 'b', 'c'])
    writer.writeheader()
    writer.writerow(d1)
    writer.writerow(d2)

with open('data/temp/sample_dictwriter.csv') as f:
    print(f.read())
# a,b,c
# 1,2,3
# 10,,30

source: csv_dictwriter.py

You can also write a list of dictionaries using writerows().

with open('data/temp/sample_dictwriter_list.csv', 'w') as f:
    writer = csv.DictWriter(f, ['a', 'b', 'c'])
    writer.writeheader()
    writer.writerows([d1, d2])

with open('data/temp/sample_dictwriter_list.csv') as f:
    print(f.read())
# a,b,c
# 1,2,3
# 10,,30

source: csv_dictwriter.py

As shown in the example above, if a dictionary with no matching key is written, that element is skipped (i.e., nothing is written).

If you write a dictionary with keys not included in 'fieldnames', you can set the extrasaction argument to 'ignore' to avoid writing those key elements.

with open('data/temp/sample_dictwriter_ignore.csv', 'w') as f:
    writer = csv.DictWriter(f, ['a', 'c'], extrasaction='ignore')
    writer.writeheader()
    writer.writerows([d1, d2])

with open('data/temp/sample_dictwriter_ignore.csv') as f:
    print(f.read())
# a,c
# 1,3
# 10,30

source: csv_dictwriter.py

The default is extrasaction='raise', which throws a ValueError.

Read and write CSV files with NumPy

The third-party library NumPy offers more flexible manipulation of multidimensional arrays. It's also possible to convert between NumPy arrays (numpy.ndarray) and Python built-in lists.

Convert numpy.ndarray and list to each other

Read a CSV file:

NumPy: Read and write CSV files (np.loadtxt, np.genfromtxt, np.savetxt)

import numpy as np

with open('data/src/sample.csv') as f:
    print(f.read())
# 11,12,13,14
# 21,22,23,24
# 31,32,33,34

a = np.loadtxt('data/src/sample.csv', delimiter=',')
print(type(a))
# <class 'numpy.ndarray'>

print(a)
# [[11. 12. 13. 14.]
#  [21. 22. 23. 24.]
#  [31. 32. 33. 34.]]

source: csv_numpy.py

Extract a specific range:

NumPy: Slicing ndarray

print(a[1:, :2])
# [[21. 22.]
#  [31. 32.]]

source: csv_numpy.py

Calculate the average over all elements and column-wise sums:

NumPy: Sum, mean, max, min for entire array, column/row-wise

print(a.mean())
# 22.5

print(a.sum(axis=0))
# [63. 66. 69. 72.]

source: csv_numpy.py

Read and write CSV files with pandas

The third-party library pandas allows more flexible manipulation of mixed numeric and string data. It's also possible to convert between pandas.DataFrame and Python built-in lists.

Convert pandas.DataFrame, Series and list to each other

With pandas, reading and writing CSV files is also easy. Refer to the following articles for details.

Read a CSV file:

import pandas as pd

with open('data/src/sample_pandas_normal.csv') as f:
    print(f.read())
# name,age,state,point
# Alice,24,NY,64
# Bob,42,CA,92
# Charlie,18,CA,70
# Dave,68,TX,70
# Ellen,24,CA,88
# Frank,30,NY,57

df = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0)
print(type(df))
# <class 'pandas.core.frame.DataFrame'>

print(df)
#          age state  point
# name                     
# Alice     24    NY     64
# Bob       42    CA     92
# Charlie   18    CA     70
# Dave      68    TX     70
# Ellen     24    CA     88
# Frank     30    NY     57

source: csv_pandas.py

Select columns or elements:

print(df['age'])
# name
# Alice      24
# Bob        42
# Charlie    18
# Dave       68
# Ellen      24
# Frank      30
# Name: age, dtype: int64

print(df.at['Bob', 'age'])
# 42

source: csv_pandas.py

Extract rows by specifying conditions:

pandas: Query DataFrame and extract rows with query()

print(df.query('state == "NY"'))
#        age state  point
# name                   
# Alice   24    NY     64
# Frank   30    NY     57

print(df.query('age > 30'))
#       age state  point
# name                  
# Bob    42    CA     92
# Dave   68    TX     70

source: csv_pandas.py

Calculate summary statistics:

pandas: Get summary statistics for each column with describe()

print(df.describe())
#              age      point
# count   6.000000   6.000000
# mean   34.333333  73.500000
# std    18.392027  13.707662
# min    18.000000  57.000000
# 25%    24.000000  65.500000
# 50%    27.000000  70.000000
# 75%    39.000000  83.500000
# max    68.000000  92.000000

source: csv_pandas.py

Thus, pandas efficiently processes tabular data, especially those with mixed numerical and textual columns. For other articles related to pandas, see the link below:

List of pandas articles

Read and Write CSV Files in Python

Read CSV files: `csv.reader()`

Basic usage

Get rows, columns, and elements

Convert strings to numbers

Delimiter: `delimiter`

Quotation marks: `quoting`, `quotechar`

Line breaks

Headers

Read CSV files as dictionaries: `csv.DictReader()`

Write CSV files: `csv.writer()`

Basic Usage

Append to an existing CSV file

Delimiter: `delimiter`

Quotation marks: `quoting`, `quotechar`

Line breaks

Headers

Write dictionaries: `csv.DictWriter()`

Read and write CSV files with NumPy

Read and write CSV files with pandas

Related Categories

Related Articles

Read and Write CSV Files in Python

Read CSV files: csv.reader()

Basic usage

Get rows, columns, and elements

Convert strings to numbers

Delimiter: delimiter

Quotation marks: quoting, quotechar

Line breaks

Headers

Read CSV files as dictionaries: csv.DictReader()

Write CSV files: csv.writer()

Basic Usage

Append to an existing CSV file

Delimiter: delimiter

Quotation marks: quoting, quotechar

Line breaks

Headers

Write dictionaries: csv.DictWriter()

Read and write CSV files with NumPy

Read and write CSV files with pandas

Related Categories

Related Articles

Read CSV files: `csv.reader()`

Delimiter: `delimiter`

Quotation marks: `quoting`, `quotechar`

Read CSV files as dictionaries: `csv.DictReader()`

Write CSV files: `csv.writer()`

Delimiter: `delimiter`

Quotation marks: `quoting`, `quotechar`

Write dictionaries: `csv.DictWriter()`