Read and Write CSV Files in Python

Posted: | Tags: Python, CSV

In Python, the csv module allows you to read and write CSV files.

Despite its name "csv", the module can handle any text file separated by various character strings, not just CSV (Comma-Separated Value) files.

This article doesn't cover all possible arguments for the functions in the csv module. For a comprehensive understanding of these, refer to the official documentation above.

As will be discussed later in this article, it is recommended to use libraries like NumPy or pandas for operations such as calculating averages or totals from data read from CSV files.

The sample code in this article uses the csv and pprint modules. Both are included in the standard library, so no additional installation is necessary.

import csv
import pprint

Read CSV files: csv.reader()

Use csv.reader() to read CSV files.

Basic usage

Consider the following file as an example. Here, it is simply read as a text file using the built-in open() function.

with open('data/src/sample.csv') as f:
    print(f.read())
# 11,12,13,14
# 21,22,23,24
# 31,32,33,34

Specify the file object, which is opened with open(), as the first argument of csv.reader(). A csv.reader object is an iterator that processes rows. For example, you can get data for each row as a list with a for loop.

with open('data/src/sample.csv') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
# ['11', '12', '13', '14']
# ['21', '22', '23', '24']
# ['31', '32', '33', '34']

If you want to get it as a two-dimensional array (list of lists), use a list comprehension.

with open('data/src/sample.csv') as f:
    reader = csv.reader(f)
    l = [row for row in reader]

print(l)
# [['11', '12', '13', '14'], ['21', '22', '23', '24'], ['31', '32', '33', '34']]

Get rows, columns, and elements

If you read a CSV file as a two-dimensional array (list of lists), you can get a row by specifying the row number starting from 0 with an index [].

print(l[1])
# ['21', '22', '23', '24']

You can get an element by specifying the column number starting from 0 with an index [].

print(l[1][1])
# 22

To extract a column, you need to transpose the array (i.e., switch rows and columns) and then use an index [].

l_T = [list(x) for x in zip(*l)]
print(l_T)
# [['11', '21', '31'], ['12', '22', '32'], ['13', '23', '33'], ['14', '24', '34']]

print(l_T[1])
# ['12', '22', '32']

Note that rows and columns can be extracted more easily using libraries like NumPy or pandas, which will be discussed later.

Convert strings to numbers

By default, each element is a string (str).

print(l[0][0])
# 11

print(type(l[0][0]))
# <class 'str'>

To convert a string to a number, use int() or float().

To convert a list of strings to a list of numbers, you can use list comprehension. It is also possible to convert the entire two-dimensional array (list of lists) at once.

print(l[0])
# ['11', '12', '13', '14']

print([int(v) for v in l[0]])
# [11, 12, 13, 14]

print([[int(v) for v in row] for row in l])
# [[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]]

By setting the quoting argument to csv.QUOTE_NONNUMERIC, elements not enclosed in quotation marks (by default, double quotes ") will be retrieved as floating point numbers (float). However, note that an error will occur if an element that cannot be converted to float is not enclosed in quotation marks.

with open('data/src/sample.csv') as f:
    reader = csv.reader(f, quoting=csv.QUOTE_NONNUMERIC)
    l_f = [row for row in reader]

print(l_f)
# [[11.0, 12.0, 13.0, 14.0], [21.0, 22.0, 23.0, 24.0], [31.0, 32.0, 33.0, 34.0]]

print(l_f[0][0])
# 11.0

print(type(l_f[0][0]))
# <class 'float'>

Delimiter: delimiter

By default, csv.reader() treats a comma , as a delimiter. You can specify any string as a delimiter with the delimiter argument.

Consider the following file separated by spaces as an example.

with open('data/src/sample.txt') as f:
    print(f.read())
# 11 12 13 14
# 21 22 23 24
# 31 32 33 34

By setting delimiter=' ', elements are split at each space and correctly retrieved.

with open('data/src/sample.txt') as f:
    reader = csv.reader(f, delimiter=' ')
    l = [row for row in reader]

print(l)
# [['11', '12', '13', '14'], ['21', '22', '23', '24'], ['31', '32', '33', '34']]

For TSV (Tab-Separated Values) files, you should set delimiter='\t'.

Quotation marks: quoting, quotechar

In some CSV files, elements are enclosed within quotation marks, such as double quotes ".

Consider the following CSV file as an example.

with open('data/src/sample_quote.csv') as f:
    print(f.read())
# 1,2,"3"
# "a,b,c",x,y

By default, the quotation marks themselves are not included in the value of the element, and the delimiter within the part surrounded by quotation marks is ignored. In many situations, this behavior is considered appropriate.

with open('data/src/sample_quote.csv') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
# ['1', '2', '3']
# ['a,b,c', 'x', 'y']

If you set the quoting argument to csv.QUOTE_NONE, no special processing will be done for quotation marks. The delimiter within the part surrounded by quotation marks is also treated as an element separator.

with open('data/src/sample_quote.csv') as f:
    reader = csv.reader(f, quoting=csv.QUOTE_NONE)
    for row in reader:
        print(row)
# ['1', '2', '"3"']
# ['"a', 'b', 'c"', 'x', 'y']

By default, the double quote " is treated as a quotation mark, but you can specify any value as a quotation mark with the quotechar argument. If you want to use a single quote ' as a quotation mark, you should set quotechar="'".

Line breaks

Consider the following CSV file example that includes line breaks within parts surrounded by quotation marks.

with open('data/src/sample_linebreak.csv') as f:
    print(f.read())
# 1,2,"3"
# "a
# b",x,y

According to the official documentation, in environments where the newline code is \r\n (like Windows), it is safe to set the newline argument of open() to ''.

If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n line endings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling. csv — CSV File Reading and Writing — Python 3.11.4 documentation

with open('data/src/sample_linebreak.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
# ['1', '2', '3']
# ['a\nb', 'x', 'y']

Headers

Consider the following CSV file that includes a header row and an index column.

with open('data/src/sample_header_index.csv') as f:
    print(f.read())
# ,a,b,c,d
# ONE,11,12,13,14
# TWO,21,22,23,24
# THREE,31,32,33,34

csv.reader() does not specially handle header rows or index columns. To extract only the data, you may need to use operations such as list comprehensions and slicing.

with open('data/src/sample_header_index.csv') as f:
    reader = csv.reader(f)
    l = [row for row in reader]

pprint.pprint(l)
# [['', 'a', 'b', 'c', 'd'],
#  ['ONE', '11', '12', '13', '14'],
#  ['TWO', '21', '22', '23', '24'],
#  ['THREE', '31', '32', '33', '34']]

print([row[1:] for row in l[1:]])
# [['11', '12', '13', '14'], ['21', '22', '23', '24'], ['31', '32', '33', '34']]

print([[int(v) for v in row[1:]] for row in l[1:]])
# [[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]]

For such files, it's easier to use pandas, which will be mentioned later.

Read CSV files as dictionaries: csv.DictReader()

While csv.reader() reads each line as a list, csv.DictReader() interprets each line as a dictionary.

Consider the following CSV file as an example.

with open('data/src/sample_header.csv') as f:
    print(f.read())
# a,b,c,d
# 11,12,13,14
# 21,22,23,24
# 31,32,33,34

By default, the values in the first row are used as the dictionary keys.

with open('data/src/sample_header.csv') as f:
    reader = csv.DictReader(f)
    l = [row for row in reader]

pprint.pprint(l)
# [{'a': '11', 'b': '12', 'c': '13', 'd': '14'},
#  {'a': '21', 'b': '22', 'c': '23', 'd': '24'},
#  {'a': '31', 'b': '32', 'c': '33', 'd': '34'}]

print(l[1])
# {'a': '21', 'b': '22', 'c': '23', 'd': '24'}

print(l[1]['c'])
# 23

If there is no header in the first row, you can specify the fieldnames argument.

with open('data/src/sample.csv') as f:
    print(f.read())
# 11,12,13,14
# 21,22,23,24
# 31,32,33,34

with open('data/src/sample.csv') as f:
    reader = csv.DictReader(f, fieldnames=['a', 'b', 'c', 'd'])
    for row in reader:
        print(row)
# {'a': '11', 'b': '12', 'c': '13', 'd': '14'}
# {'a': '21', 'b': '22', 'c': '23', 'd': '24'}
# {'a': '31', 'b': '32', 'c': '33', 'd': '34'}

The index column is not specially processed. If you wish to exclude it, you can remove it using the pop() method, as demonstrated below.

with open('data/src/sample_header_index.csv') as f:
    print(f.read())
# ,a,b,c,d
# ONE,11,12,13,14
# TWO,21,22,23,24
# THREE,31,32,33,34

with open('data/src/sample_header_index.csv') as f:
    reader = csv.DictReader(f)
    l = [row for row in reader]

pprint.pprint(l, width=100)
# [{'': 'ONE', 'a': '11', 'b': '12', 'c': '13', 'd': '14'},
#  {'': 'TWO', 'a': '21', 'b': '22', 'c': '23', 'd': '24'},
#  {'': 'THREE', 'a': '31', 'b': '32', 'c': '33', 'd': '34'}]

print([od.pop('') for od in l])
# ['ONE', 'TWO', 'THREE']

pprint.pprint(l)
# [{'a': '11', 'b': '12', 'c': '13', 'd': '14'},
#  {'a': '21', 'b': '22', 'c': '23', 'd': '24'},
#  {'a': '31', 'b': '32', 'c': '33', 'd': '34'}]

Write CSV files: csv.writer()

Use csv.writer() to write CSV files.

Basic Usage

Specify the file object, which is opened with open(), as the first argument of csv.writer().

To create a new file, pass the path to open() with the write mode 'w'. Be aware that if the write mode 'w' is specified for an existing file, it will overwrite the original content

Use the writerow() method to write one line at a time. Specify a list as an argument.

with open('data/temp/sample_writer_row.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow([0, 1, 2])
    writer.writerow(['a', 'b', 'c'])

with open('data/temp/sample_writer_row.csv') as f:
    print(f.read())
# 0,1,2
# a,b,c

The writerows() method allows you to write a two-dimensional array (list of lists) in a single operation.

l = [[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]]

with open('data/temp/sample_writer.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerows(l)

with open('data/temp/sample_writer.csv') as f:
    print(f.read())
# 11,12,13,14
# 21,22,23,24
# 31,32,33,34

Append to an existing CSV file

To append to an existing CSV file, use open() with the append mode 'a'. Then, just like when creating a new file, use writerow() or writerows() to write contents. The contents will be appended at the end.

with open('data/temp/sample_writer_row.csv', 'a') as f:
    writer = csv.writer(f)
    writer.writerow(['X', 'Y', 'Z'])

with open('data/temp/sample_writer_row.csv') as f:
    print(f.read())
# 0,1,2
# a,b,c
# X,Y,Z

Delimiter: delimiter

As in the previous examples, the default output is a CSV (Comma-Separated Value) file. You can specify any delimiter with the delimiter argument.

To save as a TSV (Tab-Separated Values) file, you should set delimiter='\t'.

l = [[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]]

with open('data/temp/sample_writer.tsv', 'w') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerows(l)

with open('data/temp/sample_writer.tsv') as f:
    print(f.read())
# 11    12  13  14
# 21    22  23  24
# 31    32  33  34

Quotation marks: quoting, quotechar

By default, elements containing delimiters and other special characters are written enclosed in quotation marks.

l = [[0, 1, 2], ['a,b,c', 'x', 'y']]

with open('data/temp/sample_writer_quote.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerows(l)

with open('data/temp/sample_writer_quote.csv') as f:
    print(f.read())
# 0,1,2
# "a,b,c",x,y

You can manipulate the quotation marks using the quoting argument. The default is quoting=csv.QUOTE_MINIMAL, which encloses only elements that contain special characters like delimiters in quotation marks, as shown in the above result.

If you set quoting=csv.QUOTE_ALL, all elements will be enclosed in quotation marks.

with open('data/temp/sample_writer_quote_all.csv', 'w') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)
    writer.writerows(l)

with open('data/temp/sample_writer_quote_all.csv') as f:
    print(f.read())
# "0","1","2"
# "a,b,c","x","y"

If you set quoting=csv.QUOTE_NONNUMERIC, elements that are not numeric will be enclosed in quotation marks.

with open('data/temp/sample_writer_quote_nonnumeric.csv', 'w') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC)
    writer.writerows(l)

with open('data/temp/sample_writer_quote_nonnumeric.csv') as f:
    print(f.read())
# 0,1,2
# "a,b,c","x","y"

If you set quoting=csv.QUOTE_NONE, no elements will be enclosed in quotation marks. In this case, you need to specify the character to use for escaping with the escapechar argument.

with open('data/temp/sample_writer_quote_none.csv', 'w') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_NONE, escapechar='\\')
    writer.writerows(l)

with open('data/temp/sample_writer_quote_none.csv') as f:
    print(f.read())
# 0,1,2
# a\,b\,c,x,y

The default quotation mark is the double quotation mark ". This can be specified with the quotechar argument.

with open('data/temp/sample_writer_quote_char.csv', 'w') as f:
    writer = csv.writer(f, quotechar="'")
    writer.writerows(l)

with open('data/temp/sample_writer_quote_char.csv') as f:
    print(f.read())
# 0,1,2
# 'a,b,c',x,y

Line breaks

If an element contains a line break, it is written enclosed in quotation marks.

l = [[0, 1, 2], ['a\nb', 'x', 'y']]

with open('data/temp/sample_writer_linebreak.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(l)

with open('data/temp/sample_writer_linebreak.csv') as f:
    print(f.read())
# 0,1,2
# "a
# b",x,y

According to the official documentation, in environments where the newline code is \r\n (like Windows), it is safe to set the newline argument of open() to ''.

If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n line endings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling. csv — CSV File Reading and Writing — Python 3.11.4 documentation

with open('data/src/sample_linebreak.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
# ['1', '2', '3']
# ['a\nb', 'x', 'y']

Headers

csv.writer() has no special feature for adding a header row or an index column.

To add a header row, you simply write it first with writerow(). For an index column, add an element to the start of each list being written.

l = [[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]]

header = ['', 'a', 'b', 'c', 'd']
index = ['ONE', 'TWO', 'THREE']

with open('data/temp/sample_writer_header_index.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    for i, row in zip(index, l):
        writer.writerow([i] + row)

with open('data/temp/sample_writer_header_index.csv') as f:
    print(f.read())
# ,a,b,c,d
# ONE,11,12,13,14
# TWO,21,22,23,24
# THREE,31,32,33,34

For such operations, it's easier to use pandas, which will be mentioned later.

Write dictionaries: csv.DictWriter()

While csv.writer() writes a list to each row, csv.DictWriter() writes a dictionary to each row.

Specify the list of dictionary keys you want to write for the second argument fieldnames of csv.DictWriter(). The writeheader() method writes the field names as a header.

Pass a dictionary to writerow() to write rows.

d1 = {'a': 1, 'b': 2, 'c': 3}
d2 = {'a': 10, 'c': 30}

with open('data/temp/sample_dictwriter.csv', 'w') as f:
    writer = csv.DictWriter(f, ['a', 'b', 'c'])
    writer.writeheader()
    writer.writerow(d1)
    writer.writerow(d2)

with open('data/temp/sample_dictwriter.csv') as f:
    print(f.read())
# a,b,c
# 1,2,3
# 10,,30

You can also write a list of dictionaries using writerows().

with open('data/temp/sample_dictwriter_list.csv', 'w') as f:
    writer = csv.DictWriter(f, ['a', 'b', 'c'])
    writer.writeheader()
    writer.writerows([d1, d2])

with open('data/temp/sample_dictwriter_list.csv') as f:
    print(f.read())
# a,b,c
# 1,2,3
# 10,,30

As shown in the example above, if a dictionary with no matching key is written, that element is skipped (i.e., nothing is written).

If you write a dictionary with keys not included in 'fieldnames', you can set the extrasaction argument to 'ignore' to avoid writing those key elements.

with open('data/temp/sample_dictwriter_ignore.csv', 'w') as f:
    writer = csv.DictWriter(f, ['a', 'c'], extrasaction='ignore')
    writer.writeheader()
    writer.writerows([d1, d2])

with open('data/temp/sample_dictwriter_ignore.csv') as f:
    print(f.read())
# a,c
# 1,3
# 10,30

The default is extrasaction='raise', which throws a ValueError.

Read and write CSV files with NumPy

The third-party library NumPy offers more flexible manipulation of multidimensional arrays. It's also possible to convert between NumPy arrays (numpy.ndarray) and Python built-in lists.

Read a CSV file:

import numpy as np

with open('data/src/sample.csv') as f:
    print(f.read())
# 11,12,13,14
# 21,22,23,24
# 31,32,33,34

a = np.loadtxt('data/src/sample.csv', delimiter=',')
print(type(a))
# <class 'numpy.ndarray'>

print(a)
# [[11. 12. 13. 14.]
#  [21. 22. 23. 24.]
#  [31. 32. 33. 34.]]
source: csv_numpy.py

Extract a specific range:

print(a[1:, :2])
# [[21. 22.]
#  [31. 32.]]
source: csv_numpy.py

Calculate the average over all elements and column-wise sums:

print(a.mean())
# 22.5

print(a.sum(axis=0))
# [63. 66. 69. 72.]
source: csv_numpy.py

Read and write CSV files with pandas

The third-party library pandas allows more flexible manipulation of mixed numeric and string data. It's also possible to convert between pandas.DataFrame and Python built-in lists.

With pandas, reading and writing CSV files is also easy. Refer to the following articles for details.

Read a CSV file:

import pandas as pd

with open('data/src/sample_pandas_normal.csv') as f:
    print(f.read())
# name,age,state,point
# Alice,24,NY,64
# Bob,42,CA,92
# Charlie,18,CA,70
# Dave,68,TX,70
# Ellen,24,CA,88
# Frank,30,NY,57

df = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0)
print(type(df))
# <class 'pandas.core.frame.DataFrame'>

print(df)
#          age state  point
# name                     
# Alice     24    NY     64
# Bob       42    CA     92
# Charlie   18    CA     70
# Dave      68    TX     70
# Ellen     24    CA     88
# Frank     30    NY     57

Select columns or elements:

print(df['age'])
# name
# Alice      24
# Bob        42
# Charlie    18
# Dave       68
# Ellen      24
# Frank      30
# Name: age, dtype: int64

print(df.at['Bob', 'age'])
# 42

Extract rows by specifying conditions:

print(df.query('state == "NY"'))
#        age state  point
# name                   
# Alice   24    NY     64
# Frank   30    NY     57

print(df.query('age > 30'))
#       age state  point
# name                  
# Bob    42    CA     92
# Dave   68    TX     70

Calculate summary statistics:

print(df.describe())
#              age      point
# count   6.000000   6.000000
# mean   34.333333  73.500000
# std    18.392027  13.707662
# min    18.000000  57.000000
# 25%    24.000000  65.500000
# 50%    27.000000  70.000000
# 75%    39.000000  83.500000
# max    68.000000  92.000000

Thus, pandas efficiently processes tabular data, especially those with mixed numerical and textual columns. For other articles related to pandas, see the link below:

Related Categories

Related Articles