How to Import an Excel File into Python using Pandas?

By | September 27, 2021
How to Import an Excel File into Python using Pandas

Python has many data science libraries and Pandas is one of Python’s most popular data science libraries. Like the NumPy library arrays and ndArrays Python Pandas support Series and DataFrames. Here Series represents 1D array and DataFrames represents multi-dimensional arrays.

An excel sheet can be represented as a DataFrame, and the pandas library provides us an inbuilt read_excel() method that can be used to achieve this goal. In this short Python tutorial, you will learn how can you import an Excel sheet in Python using pandas (with the read_excel() method).

Before you import an excel sheet in Python using pandas make sure that pandas in installed in your system.

Python Libraries Required to Import an Excel File in Python

There are 3 libraries you need to install in your python environment if you want to import an excel sheet using pandas.

  1. numpy
  2. pandas
  3. xlrd

pip install numpy

Vamware

pip install pandas

pip install xlrd

Vamware

Make sure that you have installed these three libraries before importing an Excel sheet in Python with Pandas, else you will be getting this error:

ImportError: Missing optional dependency ‘xlrd’. Install xlrd >= 1.0.0 for Excel support Use pip or conda to install xlrd.

How to Import an Excel File into Python Using Pandas?

read_excel() is a pandas method that allows us to access an Excel sheet using Python. The read_excel() method can load the Excel file from the local system or specified URL and the read_excel() method allows us to access Excel files with extension xls, xlsx, xlsm, xlsb, odf, ods and odt.

For the example, below we have used the countries.xlsx file:

Python Program to Import an Excel File Using pandas

import pandas as pd
#access the excel sheet using python
df = pd.read_excel('countries.xlsx')
print(df)
print("The type of df is:",type(df) )

Output

CountryCode            ShortName            TableName                                 LongName

0         AFG          Afghanistan          Afghanistan             Islamic State of Afghanistan

1         ALB              Albania              Albania                      Republic of Albania

2         DZA              Algeria              Algeria  People's Democratic Republic of Algeria

3         ASM       American Samoa       American Samoa                           American Samoa

4         ADO              Andorra              Andorra                  Principality of Andorra

5         AGO               Angola               Angola              People's Republic of Angola

6         ATG  Antigua and Barbuda  Antigua and Barbuda                      Antigua and Barbuda

7         ARB           Arab World           Arab World                               Arab World

8         ARG            Argentina            Argentina                       Argentine Republic

9         ARM              Armenia              Armenia                      Republic of Armenia

The type of df is: <class 'pandas.core.frame.DataFrame'>

Behind the code:

From the above example, you can see that the read_excel() method imports the countries.xlsx file and convert it into a Pandas Dataframe object. Moreover, it converts the first row of the excel sheet to the columns named.

In the above example, the python script and the excel file are at the same location that’s why we are directly able to access the file using the file name. If the excel file and python script are located at different locations then you need to specify the path location of the excel file.

Pandas read_excel() Method Arguments

The read_excel() method accepts multiple arguments, and most of the arguments are optional, except the file name.

read_excel() important arguments

  • io
  • header
  • name
  • index_col

io represents the file name and it is the mandatory argument. It is represented by a string value that specifies the path for the excel file:

df = pd.read_excel(io ='countries.xlsx')

header specifies the head value for the data frame and by default, its value is 0, which represents that row 0 will be the header value for the data frame. If we set it to None then indexing values will be used as headers.

import pandas as pd
#access the excel sheet using python
df = pd.read_excel(io ='countries.xlsx', header = None)

print(df)
print("The type of df is:",type(df) )

Output

              0                    1                    2                                        3

0   CountryCode            ShortName            TableName                                 LongName

1           AFG          Afghanistan          Afghanistan             Islamic State of Afghanistan

2           ALB              Albania              Albania                      Republic of Albania

3           DZA              Algeria              Algeria  People's Democratic Republic of Algeria

4           ASM       American Samoa       American Samoa                           American Samoa

5           ADO              Andorra              Andorra                  Principality of Andorra

6           AGO               Angola               Angola              People's Republic of Angola

7           ATG  Antigua and Barbuda  Antigua and Barbuda                      Antigua and Barbuda

8           ARB           Arab World           Arab World                               Arab World

9           ARG            Argentina            Argentina                       Argentine Republic

10          ARM              Armenia              Armenia                      Republic of Armenia

The type of df is: <class 'pandas.core.frame.DataFrame'>

The names represent a list of  values that specifies the header for the data frame if header value is None.

import pandas as pd

#access the excel sheet using python
df = pd.read_excel(io ='countries.xlsx', header = None, names=["One", "Two", "Three", "Four"])

print(df)

Output

            One                  Two                Three                                     Four

0   CountryCode            ShortName            TableName                                 LongName

1           AFG          Afghanistan          Afghanistan             Islamic State of Afghanistan

2           ALB              Albania              Albania                      Republic of Albania

3           DZA              Algeria              Algeria  People's Democratic Republic of Algeria

4           ASM       American Samoa       American Samoa                           American Samoa

5           ADO              Andorra              Andorra                  Principality of Andorra

6           AGO               Angola               Angola              People's Republic of Angola

7           ATG  Antigua and Barbuda  Antigua and Barbuda                      Antigua and Barbuda

8           ARB           Arab World           Arab World                               Arab World

9           ARG            Argentina            Argentina                       Argentine Republic

10          ARM              Armenia              Armenia                      Republic of Armenia

index_col argument represents the first row labels of the data frame.

import pandas as pd

#access the excel sheet using python
df = pd.read_excel(io ='countries.xlsx', index_col="TableName")

print(df)

Output

                    CountryCode            ShortName                                 LongName
TableName
Afghanistan                 AFG          Afghanistan             Islamic State of Afghanistan
Albania                     ALB              Albania                      Republic of Albania
Algeria                     DZA              Algeria  People's Democratic Republic of Algeria
American Samoa              ASM       American Samoa                           American Samoa
Andorra                     ADO              Andorra                  Principality of Andorra
Angola                      AGO               Angola              People's Republic of Angola
Antigua and Barbuda         ATG  Antigua and Barbuda                      Antigua and Barbuda
Arab World                  ARB           Arab World                               Arab World
Argentina                   ARG            Argentina                       Argentine Republic
Armenia                     ARM              Armenia                      Republic of Armenia

Conclusion

Here in this Python tutorial, you learned how can you import an Excel file in python using pandas with the read_excel() method. Pandas also provide various methods, such as read_table, read_csv, read_json, and read_html to read and import tables, CSV, JSON, and HTML files, respectively.

Before you use the read_excel() method in Python, ensure that all the other dependencies (NumPy and xlrd) have been installed in your Python environment.

People are also reading:

Author: Vinay

I am a Full Stack Developer with a Bachelor's Degree in Computer Science, who also loves to write technical articles that can help fellow developers.

Leave a Reply

Your email address will not be published.