How to read an Excel file (.xls or .xlsx) with pandas?


Whether you are a business owner or a data analyst, there will come a time when you will need to read an Excel file. This is where pandas come in.

Pandas is a Python library that allows you to manipulate and analyze data. In this blog post, we will show you how to read an Excel file using pandas.

You use pandas.read_excel() function to read an Excel file (extension: .xlsx, .xls) pandas. DataFrame as pandas.

pandas.read_excel() function uses the libraries openpyxl and xlrd internally.

so we also need to install openpyxl and xlrd for our script. openpyxl and xlrd are Python libraries for reading and writing Excel files ( .xls, xlsx ).

How to Install Pandas, openpyxl, xlrd

Before we get started, we need to install pandas. You can do this by using the pip command. If you don’t have pip installed, you can follow the instructions here. Once you have pip installed, open your terminal and type the following:

$ pip install pandas $ pip install openpyxl $ pip install xlrd

You are easy to install openpyxl and xlrd with pip.

How to Read an Excel File Using Pandas

Now that we have pandas installed, we can start reading our Excel file. We will start by importing the pandas library. We can do this by typing the following into our Python script:

import pandas as pd
Code language: JavaScript (javascript)

We will also need to import the os library so that we can specify the path of our Excel file. The os library is a built-in Python library, so we do not need to install it separately. We can import it by typing the following:

import os
Code language: JavaScript (javascript)

Now that we have imported the libraries we need, we can specify the path of our Excel file. In this example, our Excel file is in the same directory as our Python script. We will use the os.path.join function to join together the directory path and the name of our Excel file.

We will store this in a variable called path so that we can use it later on in our script. Type the following into your script:

path = os.path.join(".", "example.xlsx")
Code language: JavaScript (javascript)

Note: You need to unzip to use.

example.xlsx Excel File has two sheets:

sheet1

A B C one 11 12 13 two 21 22 23 three 31 32 33

sheet2

AA BB CC ONE 11 12 13 TWO 21 22 23 THREE 31 32 33

If your Excel file is not in the same directory as your Python script, you will need to specify the full path of your file here (eg. “/Users/username/Documents/example.xlsx”)

If there are more than one page, only the first page will be read.

import pandas as pd import os print(pd.__version__) # 1.5.0 # Get the Excel file in the same directory as our Python script path = os.path.join(".", "example.xlsx") df = pd.read_excel(path, index_col=0) # trying to make the first row index_col=0 print(df) # Output: # A B C # one 11 12 13 # two 21 22 23 # three 31 32 33 print(type(df)) # <class 'pandas.core.frame.DataFrame'>
Code language: Python (python)

The .xlsx file for (Excel file of Excel2007 or later).
The .xls file for (Excel file of Excel97-2003).

You can specify the sheet to read by number or by name with the argument: sheet_name

This is an example to load one sheet:

import pandas as pd import os path = os.path.join(".", "example.xlsx") df_sheet_number = pd.read_excel(path, sheet_name=0, index_col=0) print(df_sheet_number) # Output: # A B C # one 11 12 13 # two 21 22 23 # three 31 32 33 df_sheet_name = pd.read_excel(path, sheet_name='sheet2', index_col=0) print(df_sheet_name) # Output: # AA BB CC # ONE 11 12 13 # TWO 21 22 23 # THREE 31 32 33
Code language: Python (python)

You can load multiple sheets by using the sheet_name argument as a list. Numbers starting with 0 or sheet names.

For example: to read two sheets: sheet_name = [“sheet1”, 1]

A return result is a dictionary, the specified number or sheet name is the key and the sheet data (pandas.DataFrame) is the value.

df_sheet_multi = pd.read_excel(path, sheet_name=["sheet1", 1], index_col=0) print(type(df_sheet_multi)) # <class 'dict'> print(len(df_sheet_multi)) # 2 print(df_sheet_multi.keys()) # dict_keys(["sheet1", 1])
Code language: Python (python)

At the time of reading, the key of the sheet specified by the number becomes the number, and the key of the sheet specified by the name becomes the name.

You can load all sheets in an excel file

You set sheet_name=None.  All sheets in an excel file are loaded. In this case, the sheet name is the key of dict.

load_all_sheets = pd.read_excel(path, sheet_name=None, index_col=0) print(type(load_all_sheets)) # <class 'dict'> print(load_all_sheets.keys()) # dict_keys(['sheet1', 'sheet2'])
Code language: PHP (php)

Arguments header, index_col

header = None and index_col = None: a specific row or column will not be used for the header or index, and will be a sequential number starting with 0.

df_header_none = pd.read_excel(path, header=None, index_col=None) print(df_header_none) # 0 1 2 3 # 0 NaN A B C # 1 one 11 12 13 # 2 two 21 22 23 # 3 three 31 32 33 print(df_header_none.columns) # Int64Index([0, 1, 2, 3], dtype='int64') print(df_header_none.index) # RangeIndex(start=0, stop=4, step=1)
Code language: PHP (php)

The default is header=0 (=first row), index_col=None(=no column specified).

df_default = pd.read_excel(path) print(df_default) # Unnamed: 0 A B C # 0 one 11 12 13 # 1 two 21 22 23 # 2 three 31 32 33 print(df_default.columns) # Index(['Unnamed: 0', 'A', 'B', 'C'], dtype='object') print(df_default.index) # RangeIndex(start=0, stop=3, step=1)
Code language: PHP (php)

If you want to use the top row, you can set it explicitly (index_col=0) and it is safe in any version.

print(pd.read_excel(path, index_col=0)) # A B C # one 11 12 13 # two 21 22 23 # three 31 32 33
Code language: PHP (php)

If you don’t need to read all rows/columns, you can specify columns to read and rows not to read. (Arguments: usecols, skiprows, skipfooter)

  • usecols: a list of column numbers to read as an argument.
  • skiprows: a list of row numbers to skip (not read) as an argument.
  • skipfooter: the number of last rows to be skipped (not read) as an argument.
df_use_skip = pd.read_excel(path, index_col=0, usecols=[0, 1, 3], skiprows=[1], skipfooter=1) print(df_use_skip) # A C # two 21 23
Code language: Python (python)

You can use Python and pandas to read any Excel file (.xls or .xlsx). I hope this was helpful and that you now feel empowered to tackle those pesky Excel files!

Andy Avery

I really enjoy helping people with their tech problems to make life easier, ​and that’s what I’ve been doing professionally for the past decade.

Recent Posts