Python Histogram Plotting Using Matplotlib, Pandas, and Seaborn

Vinay Khatri
Last updated on April 19, 2024

    Data visualization is an essential skill for every data scientist, and Python has many third-party data visualization libraries. In this tutorial, you will learn how to plot histograms in Python using Matplotlib, Pandas, and Seaborn. A histogram is a graphical representation of the distribution of numerical data: the values are grouped into ranges called bins, and the plot shows how many values fall into each bin. It resembles a bar chart, where the X-axis shows the bin ranges and the Y-axis shows the frequency of the data in each bin.
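
To make the idea of bins and frequencies concrete, here is a minimal sketch that counts values per bin with NumPy's numpy.histogram() function, without drawing anything. The sample ages and the bin edges are assumptions chosen only for illustration.

```python
import numpy as np

# Illustrative sample data (assumed for this sketch)
ages = [12, 12, 13, 14, 13, 12, 15, 15, 11, 12]

# Three bins: [11, 13), [13, 15), [15, 17]; only the last bin includes its right edge
counts, edges = np.histogram(ages, bins=[11, 13, 15, 17])

# Print each bin range next to the number of values that fall into it
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right}: {count}")
```

These per-bin counts are what the Y-axis of a histogram shows; the plotting libraries below compute them for you.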

    Python Histogram Plotting

    1) Plotting a Histogram Using the Matplotlib Library

    Matplotlib is one of the most popular and widely used data visualization libraries in the Python data science ecosystem. It works well with other Python data science libraries such as NumPy, scikit-learn, Pandas, and PyTorch. Using Matplotlib, you can create interactive and attractive graphs, including bar plots, pie charts, scatter plots, and histograms.

    Create a Histogram with Matplotlib

    The matplotlib.pyplot module provides a built-in function, hist(), which accepts an array of data values and plots a histogram.

    Python Matplotlib hist() Syntax and Parameters

    from matplotlib import pyplot as plt
    
    plt.hist(x, bins=None, range=None, density=False, weights=None, cumulative=False, bottom=None, histtype='bar', align='mid', orientation='vertical', rwidth=None, log=False, color=None, label=None, stacked=False, *, data=None, **kwargs)

    hist() Method Parameters

    Parameter     Default       Description
    x             (required)    The array of data values.
    bins          None          Integer, sequence, or string. An integer sets the number of equal-width bins, a sequence sets the bin edges, and a string names a binning strategy. If omitted, 10 bins are used.
    range         None          Tuple (lower, upper). Defines the lower and upper range of the bins. If omitted, the minimum and maximum of the data are used.
    density       False         Boolean. If True, the histogram is normalized to a probability density, so the bar areas sum to 1.
    weights       None          Array with the same shape as x. Each value in x contributes its weight to the bin count instead of 1.
    cumulative    False         Boolean. If True, each bin gives the count in that bin plus the counts of all bins for smaller values.
    bottom        None          Scalar or array-like. Defines the location of the bottom of every bin.
    histtype      "bar"         The type of histogram to draw: "bar", "barstacked", "step", or "stepfilled".
    align         "mid"         The horizontal alignment of the histogram bars: "left", "mid", or "right".
    orientation   "vertical"    The orientation of the histogram: "vertical" or "horizontal".
    rwidth        None          Float. The relative width of the bars as a fraction of the bin width.
    log           False         Boolean. If True, the histogram axis is set to a log scale.
    color         None          The color of the bars.
    label         None          String or sequence of strings used to label the datasets, for example in a legend.
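
As a rough illustration of how some of these parameters change the result, the sketch below draws the same data three times and keeps the values that hist() returns. The sample data, the bin edges, and the choice of the non-interactive "Agg" backend are assumptions made so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
from matplotlib import pyplot as plt

ages = [12, 12, 13, 14, 13, 12, 15, 15, 11, 12]  # illustrative data
edges = [11, 13, 15, 17]                          # explicit bin edges, each bin is 2 wide

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# hist() returns (values, bin_edges, patches); only the values are kept here
# Default: plain counts per bin
counts, _, _ = axes[0].hist(ages, bins=edges, color="green")

# density=True: heights are scaled so the bar areas sum to 1
density, _, _ = axes[1].hist(ages, bins=edges, density=True, color="green")

# cumulative=True: each bin also includes the counts of all lower bins
running, _, _ = axes[2].hist(ages, bins=edges, cumulative=True, color="green")

for ax, title in zip(axes, ["counts", "density=True", "cumulative=True"]):
    ax.set_title(title)

plt.close(fig)  # in a real script, call plt.show() or fig.savefig(...) instead
```

Passing a sequence to bins gives you full control over the bin edges, which is often easier to reason about than combining an integer bins value with range.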

    Example

    from matplotlib import pyplot as plt
    import numpy as np
    
    students_age = [12, 12, 13, 14, 13, 12, 15, 15, 11, 12, 13, 12, 14, 15, 16, 15, 12, 13, 14, 16]
    numpy_array = np.array(students_age)
    
    # Create histogram
    fig, ax = plt.subplots(figsize=(10, 7))
    ax.hist(numpy_array, range=(8, 20), color="green")
    
    plt.title("Age Distribution of Class 9")
    plt.xlabel("Students Age")
    plt.ylabel("Frequency")
    
    # Show plot
    plt.show()

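    Output: [figure: green histogram titled "Age Distribution of Class 9"; x-axis "Students Age", y-axis "Frequency"]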
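
Because the ages are whole numbers, one option is to put the bin edges on half-integers so that each bar covers exactly one age. The sketch below is a variation on the example above, not part of the original example; the np.arange() edges, the rwidth value, and the "Agg" backend are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
from matplotlib import pyplot as plt
import numpy as np

students_age = [12, 12, 13, 14, 13, 12, 15, 15, 11, 12,
                13, 12, 14, 15, 16, 15, 12, 13, 14, 16]

# Edges at 10.5, 11.5, ..., 16.5 give one bin per whole-number age
edges = np.arange(min(students_age) - 0.5, max(students_age) + 1.5, 1)

fig, ax = plt.subplots(figsize=(10, 7))
# rwidth=0.9 leaves a small gap between neighbouring bars
counts, _, _ = ax.hist(students_age, bins=edges, color="green", rwidth=0.9)

ax.set_title("Age Distribution of Class 9")
ax.set_xlabel("Students Age")
ax.set_ylabel("Frequency")

plt.close(fig)  # in a real script, call plt.show() instead
```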

    2) Plotting a Histogram Using the Python Pandas Library

    Pandas is a powerful Python data science library built on top of NumPy, and its plotting features use Matplotlib. Its two main data structures are the Series, a one-dimensional labeled array, and the DataFrame, a two-dimensional table. Both Series and DataFrame objects provide a hist() method, which calls matplotlib.pyplot.hist() internally.

    Create a Histogram with Pandas

    Drawing a histogram with Pandas is straightforward. You only need to create a Pandas Series or DataFrame and call its .hist() method, which plots a histogram of the data.

    Python Pandas Series and DataFrame hist() Syntax and Parameters

    DataFrame.hist(column, by, grid, xlabelsize, xrot, ylabelsize, yrot, ax, sharex, sharey, figsize, layout, bins, backend, legend, **kwargs)

    Pandas Series and DataFrame hist() Method Parameters

    Parameter     Default   Description
    column        None      String or sequence. Limits the data to a subset of columns.
    by            None      Object. If passed, it is used to draw separate histograms for each group.
    grid          True      Boolean. Whether to show the grid lines of the figure.
    xlabelsize    None      Integer. Sets the x-axis label size.
    xrot          None      Float. Rotates the x-axis labels by the specified number of degrees.
    ylabelsize    None      Integer. Sets the y-axis label size.
    yrot          None      Float. Rotates the y-axis labels by the specified number of degrees.
    ax            None      The Matplotlib axes on which the histogram will be plotted.
    sharex        True      Boolean. When subplots are drawn, shares the x-axis and hides some x-axis labels. Defaults to False if ax is passed.
    sharey        False     Boolean. When subplots are drawn, shares the y-axis and hides some y-axis labels.
    figsize       None      Tuple. The size of the figure in inches.
    layout        None      Tuple (rows, columns). The layout of the histograms.
    bins          10        Integer or sequence. An integer sets the number of bins, and a sequence sets the bin edges.
    backend       None      String. The plotting backend to use instead of the default one.
    legend        False     Boolean. Whether to show the legend.
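
The column, by, and layout parameters are easiest to understand on a small DataFrame. The following sketch uses a hypothetical DataFrame whose column names ("age", "section") and values are made up for illustration; the "Agg" backend is also an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: column names and values are made up for this sketch
df = pd.DataFrame({
    "age": [12, 12, 13, 14, 13, 12, 15, 15],
    "section": ["A", "A", "A", "A", "B", "B", "B", "B"],
})

# One histogram of "age" per value of "section", placed side by side
axes = df.hist(column="age", by="section", bins=4, layout=(1, 2), figsize=(10, 4))

plt.close("all")  # in a real script, call plt.show() instead
```

Here hist() returns the Matplotlib axes it drew on, one per group, so you can still adjust titles and labels afterwards.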

    Plot a Histogram Using Pandas

    import pandas as pd
    import matplotlib.pyplot as plt
    
    students_age = [12, 12, 13, 14, 13, 12, 15, 15, 11, 12, 13, 12, 14, 15, 16, 15, 12, 13, 14, 16]
    panda_series = pd.Series(students_age)
    
    # Create histogram
    hist = panda_series.hist(grid=True, range=(9, 20), color="green")
    
    plt.title("Age Distribution of Class 9")
    plt.xlabel("Students Age")
    plt.ylabel("Frequency")
    
    # Show plot
    plt.show()

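    Output: [figure: green histogram with grid lines titled "Age Distribution of Class 9"; x-axis "Students Age", y-axis "Frequency"]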

    3) Plotting a Histogram Using the Python Seaborn Library

    Seaborn is a statistical data visualization library for Python. It is built on top of Matplotlib and supports a wide range of graphs. Many data scientists like Seaborn because it can often produce a polished plot with less code and less complexity than Matplotlib alone.

    Create a Histogram With Seaborn

    Seaborn provides the histplot() function, which accepts data in several formats, including a Pandas DataFrame or Series, a NumPy array, and a Python list, and plots a histogram.

    Python Seaborn histplot() Method Syntax

    seaborn.histplot(data)

    Plot a Histogram Using Seaborn

    import seaborn
    import pandas as pd
    import matplotlib.pyplot as plt
    
    students_age = [12, 12, 13, 14, 13, 12, 15, 15, 11, 12, 13, 12, 14, 15, 16, 15, 12, 13, 14, 16]
    panda_series = pd.Series(students_age)
    
    # Create histogram
    seaborn.histplot(panda_series)
    
    plt.title("Age Distribution of Class 9")
    plt.xlabel("Students Age")
    plt.ylabel("Frequency")
    
    # Show plot
    plt.show()

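    Output: [figure: histogram titled "Age Distribution of Class 9"; x-axis "Students Age", y-axis "Frequency"]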

    Conclusion

    From the examples above, you can see that Seaborn produces a clear histogram with the least code. Most Python data visualization libraries, including the plotting features of Pandas and Seaborn, are built on top of Matplotlib. Thus, in a Python script or IDE, you need to import matplotlib.pyplot and call the plt.show() function to display the histogram. If you use Jupyter Notebook, you can use the %matplotlib inline magic command to display the graph inside the notebook.
