Reading and Writing CSV Files in Python Using the CSV Module and Pandas

Vinay Khatri
Last updated on April 19, 2024

    Python provides several ways to read and write CSV files. Of these, the standard csv module and the pandas library offer the simplest and most straightforward methods. Because a CSV file is plain text, we can also read it with Python's basic file handling and the open() function.

    In this Python tutorial, we will discuss how to use the csv module and the pandas library for reading and writing data to CSV files. By the end of this tutorial, you will have a solid idea of what a CSV file is and how to handle CSV files in Python. So, let's start.

    What is a CSV File?

    A CSV (Comma-Separated Values) file is a simple text file that conventionally uses the .csv file extension. Unlike an arbitrary text file, the data inside a CSV file must be organized in a specific format: it is stored in a tabular layout and, as the name suggests, the values are separated by commas. As with tables in relational databases, every row or line of the CSV file represents a record, and every column represents a specific data field. Consider the following example of a CSV file:

    #movies.csv

    movieId,title,genres
    1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
    2,Jumanji (1995),Adventure|Children|Fantasy
    3,Grumpier Old Men (1995),Comedy|Romance
    4,Waiting to Exhale (1995),Comedy|Drama|Romance
    5,Father of the Bride Part II (1995),Comedy
    6,Heat (1995),Action|Crime|Thriller
    7,Sabrina (1995),Comedy|Romance

    A CSV file can also be opened in MS Excel, which displays the CSV data as a table of rows and columns.

    From the above movies.csv file, you can see that the values in each row are separated by commas, and every record ends with a newline. Next, let's discuss how we can read and write data in a CSV file in Python.
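One detail the sample file does not show is what happens when a value itself contains a comma. The following is a minimal sketch, using the standard csv module described in the next section and an in-memory buffer instead of a real file. The row with the comma in its title is a hypothetical addition for illustration and is not part of movies.csv above.

```python
import csv
import io

# Hypothetical rows: the last title contains a comma.
rows = [
    ["movieId", "title"],
    ["1", "Toy Story (1995)"],
    ["11", "American President, The (1995)"],
]

# Write the rows to an in-memory text buffer instead of a file.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()
print(text)

# Parse the text again; the quoted field comes back as a single value.
parsed = list(csv.reader(io.StringIO(text)))
print(parsed[2])
```

The writer wraps the title that contains a comma in double quotes, so the reader can tell that comma apart from the separators between fields.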

    Python CSV Module

    Python comes with a powerful standard CSV module for reading and writing CSV files. To use the dedicated csv module, we have to import it first using the following Python import statement:

    import csv

    Create a CSV file in Python and Write Data

    Let's start by creating a CSV file using Python and writing some data in it. Although we could simply use the file object's write() method to write data in a CSV file, here we will use the csv.writer() function to create a writer object and then call its writerow() method to write the data row by row.

    Example: Write a CSV File in Python

    import csv
    
    #open or create file
    with open("movies.csv", 'w', newline="") as file:
        writer = csv.writer(file)
        
        #write data
        writer.writerow(["movieId", "title", "genres"])
        writer.writerow(["1","Toy Story (1995)","Adventure|Animation|Children|Comedy|Fantasy"])
        writer.writerow(["2","Jumanji (1995)","Adventure|Children|Fantasy"])
        writer.writerow(["3","Grumpier Old Men (1995)","Comedy|Romance"])
        writer.writerow(["4","Waiting to Exhale (1995)","Comedy|Drama|Romance"])

    From the above example, you can see that in order to write a CSV file in Python, you first need to open it using the open() function. Because "movies.csv" is a relative path, executing the program creates the movies.csv file in the current working directory, which is usually the directory you run the script from.

    #movies.csv

    movieId,title,genres
    1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
    2,Jumanji (1995),Adventure|Children|Fantasy
    3,Grumpier Old Men (1995),Comedy|Romance
    4,Waiting to Exhale (1995),Comedy|Drama|Romance

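    In the above example, you can see that when we open the file using the open("movies.csv", 'w', newline="") statement, we also specify the newline="" parameter. It turns off Python's own newline translation so that the csv module controls the line endings itself. Without it, an extra blank line can appear between two records on platforms such as Windows.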
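To see why the line endings matter, the short sketch below writes one row to an in-memory buffer (an assumption made here so that the example does not touch the disk) and prints the raw text that the writer produced.

```python
import csv
import io

# An in-memory buffer stands in for the file opened with newline="".
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["movieId", "title"])

# repr() makes the line ending visible.
written = buffer.getvalue()
print(repr(written))
```

By default the writer ends every row with "\r\n". If the file object were also allowed to translate "\n" into the platform's line ending, the row could end in "\r\r\n" on Windows, which many programs display as an empty line after each record.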

    Write CSV Data in Python Using the writerows() Method

    In the above example, we wrote data in our movies.csv file using the writerow() method. Because writerow() writes a single row per call, we had to call it once for every row. However, there is a better way to do it. The writer object returned by csv.writer() also provides the writerows() method, which can write multiple data rows in the CSV file with just one call.

    Example: Write Multiple Rows in a CSV File with writerows()

    Let's continue with our above example and append new rows of movie data in our movies.csv file using the writer.writerows() method.

    import csv
    
    movies_rows = [
                    ["5","Father of the Bride Part II (1995)","Comedy"],
                    ["6","Heat (1995)","Action|Crime|Thriller"],
                    ["7","Sabrina (1995)","Comedy|Romance"]
                   ]
    
    #append data to movies.csv
    with open("movies.csv", 'a', newline="") as file:
        writer = csv.writer(file)
        
        #write multiple rows
        writer.writerows(movies_rows)

    In this example, we append new data to our movies.csv file by opening the file in the "a" append mode, and when you execute this program, your movies.csv file will be populated with 3 more rows.

    movieId,title,genres
    1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
    2,Jumanji (1995),Adventure|Children|Fantasy
    3,Grumpier Old Men (1995),Comedy|Romance
    4,Waiting to Exhale (1995),Comedy|Drama|Romance
    5,Father of the Bride Part II (1995),Comedy
    6,Heat (1995),Action|Crime|Thriller
    7,Sabrina (1995),Comedy|Romance

    Note: The default delimiter of csv.writer() is the comma, which makes sense for a comma-separated values file. If you want to use another single character such as $, > or < as the delimiter, pass the delimiter parameter to csv.writer():

    writer = csv.writer(file, delimiter=">")

    Python CSV Read Data

    Now that you know how to write data in a CSV file, let's discuss how you can read data from the CSV file using the Python csv module. To parse a CSV file in Python or to read data from a CSV file, we can use the csv.reader() method. In the above examples, we created a movies.csv file and wrote some data in it. Now, let's read the data from the same movies.csv file.

    Example: Parse a CSV File in Python and Read Data Using csv.reader()

    The csv.reader() function parses the CSV file in Python and returns a reader object. The reader is an iterator that yields every row of the file as a list of strings, one string per comma-separated value. As with other iterable objects, we can use a Python for loop to iterate over the value returned by csv.reader().

    import csv
    
    #open movies.csv file to read
    with open("movies.csv", 'r', newline="") as file:
        rows = csv.reader(file)
        
        for row in rows:
            print(row)

    Output

    ['movieId', 'title', 'genres']
    ['1', 'Toy Story (1995)', 'Adventure|Animation|Children|Comedy|Fantasy']
    ['2', 'Jumanji (1995)', 'Adventure|Children|Fantasy']
    ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance']
    ['4', 'Waiting to Exhale (1995)', 'Comedy|Drama|Romance']
    ['5', 'Father of the Bride Part II (1995)', 'Comedy']
    ['6', 'Heat (1995)', 'Action|Crime|Thriller']
    ['7', 'Sabrina (1995)', 'Comedy|Romance']
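The first row printed above is the header, not a movie record. When the column names and the data need to be handled separately, one common approach is to take the first row off the reader with next() before looping. The sketch below uses a shortened in-memory copy of the data (an assumption, so that it runs without movies.csv on disk).

```python
import csv
import io

# Shortened sample standing in for movies.csv.
sample = (
    "movieId,title,genres\n"
    "1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy\n"
    "2,Jumanji (1995),Adventure|Children|Fantasy\n"
)

reader = csv.reader(io.StringIO(sample))
header = next(reader)    # consume the first row: the column names
records = list(reader)   # the remaining rows are the actual records

print(header)
print(len(records), "records")
```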

    Note: By default, csv.reader() splits every line of the CSV file on the comma (,) delimiter. If your CSV file uses a different delimiter such as >, \t, $ or @, you can explicitly pass the delimiter parameter to csv.reader().

    rows = csv.reader(file, delimiter=">")
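The delimiter has to match on both sides: a file written with one delimiter must be read with the same one. Here is a small round-trip sketch with a tab delimiter, again using an in-memory buffer rather than a real file (an assumption made for illustration).

```python
import csv
import io

rows = [["1", "Toy Story (1995)"], ["2", "Jumanji (1995)"]]

# Write the rows with a tab as the delimiter.
buffer = io.StringIO()
csv.writer(buffer, delimiter="\t").writerows(rows)
tab_text = buffer.getvalue()

# Reading with the same delimiter recovers the original rows.
parsed = list(csv.reader(io.StringIO(tab_text), delimiter="\t"))

# Reading with the default comma delimiter does not split the fields.
unsplit = list(csv.reader(io.StringIO(tab_text)))

print(parsed[0])
print(unsplit[0])
```

With the wrong delimiter, every line comes back as a list holding one long string, which is a quick way to recognize a delimiter mismatch.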

    Parse the CSV File to Dict in Python

    The Python csv module also provides the csv.DictReader class, which parses every row of the CSV file into a Python dictionary. Iterating over a DictReader object yields one dictionary per record, with the column names from the header row as keys and the row's values as the corresponding values.

    Example

    import csv
    
    #open movies.csv file to read
    with open("movies.csv", 'r', newline="") as file:
        
        rows = csv.DictReader(file)
        
        for row in rows:
            print(row)

    Output

    {'movieId': '1', 'title': 'Toy Story (1995)', 'genres': 'Adventure|Animation|Children|Comedy|Fantasy'}
    {'movieId': '2', 'title': 'Jumanji (1995)', 'genres': 'Adventure|Children|Fantasy'}
    {'movieId': '3', 'title': 'Grumpier Old Men (1995)', 'genres': 'Comedy|Romance'}
    {'movieId': '4', 'title': 'Waiting to Exhale (1995)', 'genres': 'Comedy|Drama|Romance'}
    {'movieId': '5', 'title': 'Father of the Bride Part II (1995)', 'genres': 'Comedy'}
    {'movieId': '6', 'title': 'Heat (1995)', 'genres': 'Action|Crime|Thriller'}
    {'movieId': '7', 'title': 'Sabrina (1995)', 'genres': 'Comedy|Romance'}
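Because every row is a dictionary, values can be looked up by column name instead of by position. Note that every value is read as a string. The sketch below, which uses a shortened in-memory sample and a hypothetical movies list, converts movieId to an integer and splits the genres field on the | character.

```python
import csv
import io

# Shortened sample standing in for movies.csv.
sample = (
    "movieId,title,genres\n"
    "1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy\n"
    "6,Heat (1995),Action|Crime|Thriller\n"
)

movies = []
for row in csv.DictReader(io.StringIO(sample)):
    movies.append({
        "movieId": int(row["movieId"]),      # CSV values are strings; convert explicitly
        "title": row["title"],
        "genres": row["genres"].split("|"),  # one list entry per genre
    })

print(movies[1]["title"], movies[1]["genres"])
```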

    Reading and Writing CSV Files in Python Using the Pandas Library

    pandas is one of the most powerful Python libraries for data science. It comes with many built-in methods and features, and it is widely used for data manipulation and analysis. Using this library, we can read and write data in different file formats, including CSV. In this Python tutorial, however, we will only discuss writing and reading CSV files using pandas. Unlike the Python csv module, pandas is not part of the standard library, so make sure you have installed it before using it. You can install pandas for your Python environment with the following pip command:

    pip install pandas

    Write a CSV File with the Pandas to_csv() Method

    Creating or writing data in CSV files in Python using pandas takes one more step than with the Python csv module. That's because before creating a CSV file and writing data into it, we have to create a pandas DataFrame. A pandas DataFrame can be understood as a two-dimensional table with labeled rows and columns.

    Example

    import pandas as pd
    
    #2d array of movies
    movies_rows = [
            ['1', 'Toy Story (1995)', 'Adventure|Animation|Children|Comedy|Fantasy'],
            ['2', 'Jumanji (1995)', 'Adventure|Children|Fantasy'],
            ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance'],
            ['4', 'Waiting to Exhale (1995)', 'Comedy|Drama|Romance'],
            ['5', 'Father of the Bride Part II (1995)', 'Comedy'],
            ['6', 'Heat (1995)', 'Action|Crime|Thriller'],
            ['7', 'Sabrina (1995)', 'Comedy|Romance'],
                 ]
    
    heading = ['movieId', 'title', 'genres']
    
    #pandas dataframe
    movies = pd.DataFrame(movies_rows, columns=heading)
    
    #create the movies.csv file from dataframe
    movies.to_csv("movies.csv")

    This will create a movies.csv file in the current working directory. Notice that, by default, to_csv() also writes the DataFrame's row index as an unnamed first column:

    ,movieId,title,genres
    0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
    1,2,Jumanji (1995),Adventure|Children|Fantasy
    2,3,Grumpier Old Men (1995),Comedy|Romance
    3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
    4,5,Father of the Bride Part II (1995),Comedy
    5,6,Heat (1995),Action|Crime|Thriller
    6,7,Sabrina (1995),Comedy|Romance
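If the extra index column is not wanted, to_csv() accepts index=False. The following is a small sketch with a shortened DataFrame. It calls to_csv() without a file name, which returns the CSV text as a string instead of writing a file, so the result can be printed directly.

```python
import pandas as pd

# Shortened version of the movies DataFrame from the example above.
movies = pd.DataFrame(
    [
        ["1", "Toy Story (1995)", "Adventure|Animation"],
        ["2", "Jumanji (1995)", "Adventure|Children"],
    ],
    columns=["movieId", "title", "genres"],
)

# index=False leaves out the row index; no path means the text is returned.
csv_text = movies.to_csv(index=False)
print(csv_text)
```

Passing a file name, as in movies.to_csv("movies.csv", index=False), writes the same text to disk.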

    Read from a CSV File in Python Using the pandas read_csv() Method

    To read the CSV file in Python using pandas, we use the pd.read_csv() function. It accepts the CSV file name as a parameter and returns a pandas DataFrame.

    Example:

    import pandas as pd
    
    df = pd.read_csv("movies.csv")
    
    print(df)

    Output

     Unnamed: 0 ... genres
    0 0 ... Adventure|Animation|Children|Comedy|Fantasy
    1 1 ... Adventure|Children|Fantasy
    2 2 ... Comedy|Romance
    3 3 ... Comedy|Drama|Romance
    4 4 ... Comedy
    5 5 ... Action|Crime|Thriller
    6 6 ... Comedy|Romance
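The Unnamed: 0 column in this output is the row index that to_csv() stored in the file. One way to have pandas treat it as the index again is the index_col parameter of read_csv(). The sketch below assumes a shortened copy of the file contents, held in memory so that it runs without movies.csv on disk.

```python
import io

import pandas as pd

# Shortened copy of the file written by to_csv(), including the index column.
sample = (
    ",movieId,title,genres\n"
    "0,1,Toy Story (1995),Adventure|Animation\n"
    "1,2,Jumanji (1995),Adventure|Children\n"
)

# index_col=0 uses the first column as the row index instead of as data.
df = pd.read_csv(io.StringIO(sample), index_col=0)

print(list(df.columns))
print(df.loc[0, "title"])
```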

    Conclusion

    If you just want to parse CSV files for reading and writing data, then you should use the Python standard csv module, because loading pandas for simple read and write operations adds an extra dependency and unnecessary overhead. To write data in a CSV file using the standard csv module, we create a writer object with csv.writer() and call its writerow() or writerows() method. To read data from a CSV file, we use csv.reader() or csv.DictReader. In pandas, we first create a DataFrame and then write its data to a CSV file with the to_csv() method, and to read data from a CSV file, we use the pd.read_csv() function.
