How to Read/Write CSV Files with NumPy?


CSV files are a common way of storing tabular data. If you have a file with data in this format, you can use NumPy's .loadtxt() or .genfromtxt() function to read the data into a NumPy array, and .savetxt() to write an array to a CSV file.

What is NumPy?

NumPy is a Python library for numerical computing. Its core object is the ndarray, an efficient multi-dimensional array, and it also provides functions for reading arrays from text files and writing them back. When working with data in NumPy arrays, we often need to save our data to disk in CSV format so that we can share it with others or load it into another program.

What is a CSV file?

A CSV file is a Comma-Separated Values file. The name comes from the fact that the values on each line of the file are separated by commas. Because other characters, such as tabs or semicolons, are sometimes used instead, CSV files are also called Character-Separated Values files. CSV is a very popular format for storing data and is widely supported by spreadsheets and databases.

Here’s an example of what a CSV file might look like:

Name,Age,Gender,Country
John,21,Male,United States
Jane,23,Female,Canada
Mary,25,Female,United Kingdom

In this example, the first line is a header that names the columns: Name, Age, Gender, and Country. Each following line is one row (the record for John, Jane, or Mary), and the values in each row are separated into columns by commas.

How to read CSV files with NumPy?

The simplest way to read data from a CSV file (NumPy’s .loadtxt() Function)

The .loadtxt() function is the simplest way to read data from a CSV file into a NumPy array. Its first argument is the name (or path) of the file to read, and the optional delimiter argument is the string that separates the fields.

If no delimiter is specified, the .loadtxt() function will assume that the fields in the file are separated by whitespace.
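To see the difference, here is a small sketch that reads the same numbers twice. It uses io.StringIO as a stand-in for a file on disk (an assumption made only so the example runs anywhere; np.loadtxt() accepts file-like objects as well as paths):

```python
import io
import numpy as np

# Default delimiter=None: whitespace separates the fields
spaced = np.loadtxt(io.StringIO("1 2 3\n4 5 6"))

# Comma-separated data needs delimiter=','
commas = np.loadtxt(io.StringIO("1,2,3\n4,5,6"), delimiter=",")

print(spaced)
print(commas)
```

Both calls produce the same 2x3 float array; only the delimiter argument differs.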

Here’s an example of how to use the .loadtxt() function to read a CSV file:

import numpy as np

# Read a comma-separated file of numbers into a 2D float array
data = np.loadtxt('data.csv', delimiter=',')

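In this example, we've stored the contents of the file in a 2D NumPy array called data, with one array row per line of the file. Because .loadtxt() converts every field to float by default, the file must contain only numbers. If it starts with a header row of field names, skip that row with skiprows=1 (see Python Script 4 below).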
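If a file does begin with a text header, .loadtxt() cannot convert it to float and raises an error. The sketch below shows this with a small in-memory file (io.StringIO stands in for a real file; the column names x and y are hypothetical):

```python
import io
import numpy as np

csv_text = "x,y\n1,2\n3,4\n"

# The text header cannot be converted to float, so loadtxt raises ValueError
try:
    np.loadtxt(io.StringIO(csv_text), delimiter=",")
except ValueError as err:
    print("header row caused:", type(err).__name__)

# Skipping the first line leaves only numeric data
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
print(data)
```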

Take, for example, the following file with numbers separated by spaces. Because whitespace is the default separator, you don't need to pass a delimiter.

Python Script 1: read the space-separated file.

The example1.txt file contains:

11 12 13 14
21 22 23 24
31 32 33 34
import numpy as np

x = np.loadtxt('users/david/example1.txt')
print(type(x))
# <class 'numpy.ndarray'>
print(x)
# [[11. 12. 13. 14.]
#  [21. 22. 23. 24.]
#  [31. 32. 33. 34.]]
print(x.dtype)
# float64

The first argument is the file path, and the function returns an ndarray. By default the data type (dtype) is float, that is, float64.

Python Script 2: Specify the delimiter (delimiter argument)

Take a comma-separated file (a CSV file or a plain text file), example2.txt, as an example. It contains:

11,12,13,14
21,22,23,24
31,32,33,34

To read this file, pass the comma as a string to the delimiter argument: delimiter=','.

print(np.loadtxt('users/david/example2.txt', delimiter=','))
# [[11. 12. 13. 14.]
#  [21. 22. 23. 24.]
#  [31. 32. 33. 34.]]

For a TSV (tab-separated values) file, set delimiter='\t'.
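A minimal sketch of reading tab-separated data; the in-memory text is a hypothetical stand-in for a .tsv file on disk:

```python
import io
import numpy as np

# Fields separated by tab characters
tsv_text = "11\t12\t13\n21\t22\t23\n"

x = np.loadtxt(io.StringIO(tsv_text), delimiter="\t")
print(x)
```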

Python Script 3: Specify the data type with the dtype argument

In Python Script 1 above, the default data type (dtype) is float (float64).

You can pass any data type to the dtype argument. In this example, we set dtype='int64'.

import numpy as np

x = np.loadtxt('users/david/example2.txt', delimiter=',', dtype='int64')
print(x)
# [[11 12 13 14]
#  [21 22 23 24]
#  [31 32 33 34]]
print(x.dtype)
# int64

Python Script 4: Specify the rows and columns to read with the skiprows and usecols arguments

If the file contains information you don't need, use the skiprows and usecols arguments to choose which rows and columns to read.

The skiprows argument is an integer: the number of lines to skip at the start of the file.

The usecols argument is a list of the column indices to read (counting from 0). To read a single column, pass an integer instead.
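The choice between a list and an integer also changes the shape of the result. A short sketch, using a hypothetical in-memory file:

```python
import io
import numpy as np

csv_text = "11,12,13\n21,22,23\n31,32,33\n"

# A list of column indices keeps a 2D result
two_cols = np.loadtxt(io.StringIO(csv_text), delimiter=",", usecols=[0, 2])

# A single integer returns a 1D array
one_col = np.loadtxt(io.StringIO(csv_text), delimiter=",", usecols=1)

print(two_cols.shape)  # (3, 2)
print(one_col)         # [12. 22. 32.]
```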

example4.csv contains:

,h1,h2,h3,h4
A,11,12,13,14
B,21,22,23,24
C,31,32,33,34

By specifying skiprows and usecols, you can read only the numeric data and leave out the header row and the row labels in the first column.

import numpy as np

# Skip the header line and the label column (column 0)
x = np.loadtxt('users/david/example4.csv', delimiter=',', dtype='int64',
               skiprows=1, usecols=[1, 2, 3, 4])
print(x)
# [[11 12 13 14]
#  [21 22 23 24]
#  [31 32 33 34]]

The dtype argument sets the data type used for every field. If you omit it, .loadtxt() reads everything as float (float64); it does not infer a separate type for each column.

For example, if all of the fields in your CSV file are floats, you can state this explicitly:

data = np.loadtxt('data.csv', delimiter=',', dtype=np.float64)

If your columns hold different data types, such as strings and numbers, use .genfromtxt() with dtype=None instead, as shown below.

Read CSV files with more complex structures (.genfromtxt() Function)

The .genfromtxt() function is similar to .loadtxt(), but it has additional features that make it more flexible. It can read CSV files with more complex structures, including files with missing values and columns of different data types.

Python Script 5: How to handle missing values

The example5.txt file contains:

11,12,,14
21,,,24
31,32,33,34

When you use .genfromtxt(), missing values are filled with nan by default.

x = np.genfromtxt('users/david/example5.txt', delimiter=',')
print(x)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]
print(x[0, 2])
# nan
print(type(x[0, 2]))
# <class 'numpy.float64'>
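If nan is not what you want, .genfromtxt() also accepts a filling_values argument. This sketch assumes 0 is a sensible replacement for missing fields in your data and reuses the contents of example5.txt in memory:

```python
import io
import numpy as np

csv_text = "11,12,,14\n21,,,24\n31,32,33,34\n"

# filling_values replaces every missing field with the given value
x = np.genfromtxt(io.StringIO(csv_text), delimiter=",", filling_values=0)
print(x)
```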

Python Script 6: How to handle different data types

Let's take a file with different data types (strings and numbers) for each column as shown below.

name,age,state,point
Andy,24,NY,54
Marry,42,CA,62
Charlie,18,CA,80
Dave,68,TX,71
Ellen,24,CA,86
David,30,NY,58

If you set names=True and dtype=None, .genfromtxt() treats the values in the first row as field names and reads the data into a structured array, determining the type of each column automatically. Passing encoding='utf-8' makes the string columns Unicode str values (dtype '<U...').

import numpy as np

x = np.genfromtxt('data/src/sample_pandas_normal.csv', delimiter=',',
                  names=True, dtype=None, encoding='utf-8')
print(type(x))
# <class 'numpy.ndarray'>
print(x)
# [('Andy', 24, 'NY', 54) ('Marry', 42, 'CA', 62) ('Charlie', 18, 'CA', 80)
#  ('Dave', 68, 'TX', 71) ('Ellen', 24, 'CA', 86) ('David', 30, 'NY', 58)]
print(x.dtype)
# [('name', '<U7'), ('age', '<i8'), ('state', '<U2'), ('point', '<i8')]
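Once the data is in a structured array, each column can be accessed by its field name. A sketch using the first three rows of the file above, held in memory (an assumption for portability):

```python
import io
import numpy as np

csv_text = "name,age,state,point\nAndy,24,NY,54\nMarry,42,CA,62\nCharlie,18,CA,80\n"

x = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True,
                  dtype=None, encoding="utf-8")

# Each column is a field, accessed by name
print(x["name"])          # ['Andy' 'Marry' 'Charlie']
print(x["point"].mean())

# Boolean masks on one field select whole records
in_ca = x[x["state"] == "CA"]
print(in_ca["name"])
```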

How to Write a CSV File with NumPy?

Writing a CSV file with NumPy is easy: use the np.savetxt() function. Its first two arguments are the name of the file to create (including the path) and the array to write; optional arguments such as delimiter, header, and fmt control the layout.

Here’s an example:

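import numpy as np

# A 2D array with one row, so the values are written on a single line
data = np.array([[1, 2, 3]])
# fmt='%d' writes integers; the default format is '%.18e'
np.savetxt('test.csv', data, fmt='%d')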

This code creates a file called test.csv in the current working directory and writes the contents of the NumPy array data to the file.

If you open test.csv in a text editor like Notepad or TextEdit, you will see the following:

1 2 3
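The shape of the array decides the layout: a 1D array is written as a column, one value per line, while a 2D array with a single row is written on one line. A sketch that writes to an in-memory buffer (io.StringIO, used here instead of a file) so the output can be inspected directly:

```python
import io
import numpy as np

data_1d = np.array([1, 2, 3])

col_buf = io.StringIO()
np.savetxt(col_buf, data_1d, fmt="%d")  # 1D input: one value per line
print(repr(col_buf.getvalue()))         # '1\n2\n3\n'

row_buf = io.StringIO()
np.savetxt(row_buf, data_1d.reshape(1, -1), fmt="%d")  # one row: one line
print(repr(row_buf.getvalue()))         # '1 2 3\n'
```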

You can also specify the delimiter (the character used to separate values in the file) using the delimiter argument:

import numpy as np

data = np.array([[1, 2, 3]])  # one row, written on a single line
np.savetxt('test2.csv', data, fmt='%d', delimiter=',')

If you open test2.csv in a text editor, you will see the following:

1,2,3
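A file written with delimiter=',' can be read back with .loadtxt() using the same delimiter. This sketch checks the round trip with the default '%.18e' format, using an in-memory buffer and a hypothetical array:

```python
import io
import numpy as np

original = np.array([[1.5, 2.0, 3.25], [4.0, 5.5, 6.0]])

buf = io.StringIO()
np.savetxt(buf, original, delimiter=",")  # default fmt='%.18e'
buf.seek(0)                               # rewind before reading

restored = np.loadtxt(buf, delimiter=",")
print(np.array_equal(original, restored))  # True
```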

You can also specify the header (the first line of text written to the file) using the header argument:

import numpy as np

data = np.array([[1, 2, 3]])  # one row, written on a single line
np.savetxt('test3.csv', data, fmt='%d', header='My header')

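If you open test3.csv, you will see the following:

# My header
1 2 3

By default, np.savetxt() writes the header after the comment prefix '# ' (the comments argument), so .loadtxt() treats it as a comment and skips it when you read the file back.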
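If you want the header to be an ordinary CSV header row (for example, for a spreadsheet), pass comments='' so no '# ' prefix is added. A sketch with hypothetical column names a, b, c, written to an in-memory buffer:

```python
import io
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])

buf = io.StringIO()
# comments='' drops the '# ' prefix, so the header is a plain CSV header line
np.savetxt(buf, data, fmt="%d", delimiter=",", header="a,b,c", comments="")
print(buf.getvalue())

# The header is no longer a comment, so skip it explicitly when reading back
buf.seek(0)
restored = np.loadtxt(buf, delimiter=",", skiprows=1, dtype=int)
```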

As you can see, writing CSV files with NumPy is easy! Just remember to import NumPy first and then use the functions.

Python Script 7: Use the fmt argument to specify the format

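Suppose a is the 2x3 array np.arange(6).reshape(2, 3), as the output below suggests. Saving it with np.savetxt('users/david/example_save.txt', a) and no fmt argument creates an example_save.txt file with the following contents:

0.000000000000000000e+00 1.000000000000000000e+00 2.000000000000000000e+00
3.000000000000000000e+00 4.000000000000000000e+00 5.000000000000000000e+00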

The default fmt is '%.18e': exponential notation with 18 digits after the decimal point. In a format string such as '%.5e', the number after the '.' sets how many digits follow the decimal point, and 'e' selects exponential notation ('f' gives fixed-point notation and 'd' gives integers).

import numpy as np

a = np.arange(6).reshape(2, 3)  # [[0 1 2] [3 4 5]]

np.savetxt('users/david/example_save.txt', a, fmt='%.5e')
with open('users/david/example_save.txt') as f:
    print(f.read())
# 0.00000e+00 1.00000e+00 2.00000e+00
# 3.00000e+00 4.00000e+00 5.00000e+00

np.savetxt('users/david/example_save.txt', a, fmt='%.5f')
with open('users/david/example_save.txt') as f:
    print(f.read())
# 0.00000 1.00000 2.00000
# 3.00000 4.00000 5.00000

np.savetxt('users/david/example_save.txt', a, fmt='%d')
with open('users/david/example_save.txt') as f:
    print(f.read())
# 0 1 2
# 3 4 5
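fmt can also be a list with one format per column, which helps when columns need different precision. A sketch, assuming a hypothetical table whose first column holds whole numbers and whose second column holds values to two decimal places:

```python
import io
import numpy as np

table = np.array([[1, 2.5], [3, 4.25]])

buf = io.StringIO()
# One format string per column, joined by the delimiter
np.savetxt(buf, table, fmt=["%d", "%.2f"], delimiter=",")
print(buf.getvalue())
```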


In this blog post, we've shown you how to use NumPy to read and write CSV files. We started by explaining what a CSV file is and how it's structured. Then we showed you how to use NumPy's .loadtxt() and .genfromtxt() functions to read the data into a NumPy array, and the np.savetxt() function to write data from NumPy arrays into CSV files. Give it a try yourself and see how easy it is!

Andy Avery

I really enjoy helping people with their tech problems to make life easier, and that's what I've been doing professionally for the past decade.
