Python XML Parser Tutorial: Read XML File Example (minidom, ElementTree)


Vinay Khatri
Last updated on March 28, 2024

    This tutorial covers Python's XML parsers: the modules in the standard library's xml package that can parse XML files and write data back to them.

    XML stands for Extensible Markup Language. Like HTML, it is a markup language, but instead of predefined tags it lets us define our own custom tags to suit the data stored in the file.

    An XML file is often used to share, store, and structure data because it can easily be transferred between servers and systems.

    Python is a popular language for processing and parsing data, and it ships with a standard xml package that can both parse XML files and write data to them. The parsers in this package are what this tutorial calls the Python XML parser.

    In this Python tutorial, we will walk through the minidom and ElementTree modules and learn how to parse an XML file in Python.

    Python XML minidom and ElementTree modules

    Among its submodules, the Python xml package provides minidom and ElementTree for parsing an XML file. The minidom (Minimal DOM) module exposes the file as a DOM (Document Object Model) structure, similar to the DOM used in JavaScript.

    Although we can parse an XML document using minidom, ElementTree provides a more Pythonic way to parse an XML file.

    XML File

    For all the examples in this tutorial, we will use the demo.xml file, which contains the following XML data:

    # demo.xml

    <item>
        <record>
            <name>Jameson</name>
            <phone>(080) 78168241</phone>   
            <email>cursus.in.hendrerit@ipsumdolor.edu</email>
            <country>South Africa</country>
        </record>
    
        <record>
            <name>Colton</name>
            <phone>(026) 53458662</phone>
            <email>non@idmagna.ca</email>
            <country>Libya</country>
        </record>
    
        <record>
            <name>Dillon</name>
            <phone>(051) 96790901</phone>
            <email>Aliquam.ornare@Etiamlaoreetlibero.ca</email>
            <country>Madagascar</country>
        </record>
      
        <record>
            <name>Channing</name>
            <phone>(014) 98829753</phone>
            <email>faucibus.Morbi.vehicula@aliquamarcu.co.uk</email>
            <country>Korea, South</country>
        </record>
    </item>

    In the above example, you can see that the data is nested under custom tags. The root tag is <item>. It contains four <record> tags, and each <record> contains 4 nested tags:

    1. <name>,
    2. <phone>,
    3. <email>, and
    4. <country>.
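
The nesting described above can be checked by walking the tree. This is a minimal sketch, assuming a shortened, one-record copy of demo.xml held in a string. The names `sample` and `child_tags` are illustrative and not part of the original tutorial.

```python
import xml.etree.ElementTree as ET

# Shortened, one-record copy of demo.xml (an assumption, for illustration only).
sample = """
<item>
    <record>
        <name>Jameson</name>
        <phone>(080) 78168241</phone>
        <email>cursus.in.hendrerit@ipsumdolor.edu</email>
        <country>South Africa</country>
    </record>
</item>
"""

root = ET.fromstring(sample)   # the root element, <item>
record = root[0]               # first child of the root, <record>

# Tags of the elements nested directly under <record>, in document order.
child_tags = [child.tag for child in record]

print(root.tag)      # item
print(record.tag)    # record
print(child_tags)    # ['name', 'phone', 'email', 'country']
```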

    Parse/Read XML Document in Python using minidom

    minidom is a submodule of Python's standard xml package, which means you do not have to pip install anything to use it. The minidom module parses the XML document into a Document Object Model (DOM), from which data can be extracted using the getElementsByTagName() method.

    Syntax: To parse the XML document in Python using minidom

    from xml.dom import minidom
    
    minidom.parse("filename")

    Example: Let's grab all the names and phone data from our demo.xml file.

    from xml.dom import minidom
    
    
    #parse xml file
    file = minidom.parse('demo.xml')
    
    #grab all <record> tags
    records = file.getElementsByTagName("record")
    
    print("Name------>Phone")
    
    for record in records:
        #access <name> and <phone> node of every record
        name = record.getElementsByTagName("name")
        phone = record.getElementsByTagName("phone")
        
        #access data of name and phone
        print(name[0].firstChild.data, end="----->")
        print(phone[0].firstChild.data)

    Output

    Name------>Phone
    Jameson----->(080) 78168241
    Colton----->(026) 53458662
    Dillon----->(051) 96790901
    Channing----->(014) 98829753

    In the above example, we first imported the minidom module using the from xml.dom import minidom statement. Then we parsed our demo.xml file with the file = minidom.parse('demo.xml') statement. The parse() function reads the XML document and returns a document object whose root element is <item>.

    Note: Our Python script and the demo.xml file are located in the same directory, which is why we pass only the file name demo.xml to the minidom.parse() function. If your Python script and XML file are in different locations, you have to specify the absolute or relative path of the file.
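
A bare file name is resolved against the directory the script is run from, which is not always the directory the script lives in. One way to make the lookup independent of that is to build the path from the script's own location. This is a sketch under that assumption. The helper name `parse_next_to` is hypothetical and not part of minidom.

```python
from pathlib import Path
from xml.dom import minidom


def parse_next_to(anchor, filename="demo.xml"):
    """Parse an XML file that sits in the same folder as `anchor`.

    `anchor` is any path inside the target folder, typically __file__.
    (Hypothetical helper, for illustration only.)
    """
    xml_path = Path(anchor).resolve().parent / filename
    return minidom.parse(str(xml_path))
```

Inside a script this would be called as `document = parse_next_to(__file__)`, after which `document.getElementsByTagName("record")` works as in the example above.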

    After parsing the XML file in our Python program, we accessed all the <record> nodes using the records = file.getElementsByTagName("record") statement. getElementsByTagName() is a method of minidom document and element objects that returns a list of the node objects with the specified tag.
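
To see what getElementsByTagName() hands back, the sketch below parses a small inline string with minidom.parseString() instead of a file. The two-record `sample` string is an assumption made for illustration.

```python
from xml.dom import minidom

# Two-record sample held in a string (an assumption, for illustration only).
sample = (
    "<item>"
    "<record><name>Jameson</name></record>"
    "<record><name>Colton</name></record>"
    "</item>"
)
document = minidom.parseString(sample)

# The result behaves like a list: it has a length and can be indexed or looped over.
records = document.getElementsByTagName("record")
print(len(records))      # 2
print(records.length)    # 2

# Called on the document, the search covers all descendants, not only direct children.
names = [node.firstChild.data for node in document.getElementsByTagName("name")]
print(names)             # ['Jameson', 'Colton']

# A tag that does not occur gives an empty result rather than an error.
missing = document.getElementsByTagName("address")
print(len(missing))      # 0
```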

    Once we had all the record nodes, we looped through them and, again using the getElementsByTagName() method, accessed the nested <name> and <phone> nodes of each record.

    Next, after accessing the individual name and phone nodes, we printed their data using the name[0].firstChild.data and phone[0].firstChild.data expressions. Here firstChild is the first child of the element, which for these tags is a text node, and its data attribute holds the text itself.
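
The firstChild.data pattern assumes the tag exists and contains text. For an empty tag such as <phone/>, firstChild is None and the expression raises an AttributeError. A small guard avoids that. This is a sketch only: the helper name `first_text` and the sample record with an empty <phone/> are assumptions, not part of the original example.

```python
from xml.dom import minidom


def first_text(parent, tag, default=""):
    """Return the text of the first <tag> under `parent`, or `default`.

    Hypothetical helper: falls back to `default` when the tag is missing,
    is empty, or does not start with a text node.
    """
    nodes = parent.getElementsByTagName(tag)
    if not nodes:
        return default
    first = nodes[0].firstChild
    if first is None or first.nodeType != first.TEXT_NODE:
        return default
    return first.data


# Sample record with an empty <phone/> and no <email> (an assumption).
document = minidom.parseString(
    "<item><record><name>Jameson</name><phone/></record></item>"
)
record = document.getElementsByTagName("record")[0]

name = first_text(record, "name")
phone = first_text(record, "phone", default="n/a")
email = first_text(record, "email", default="n/a")
print(name, phone, email)   # Jameson n/a n/a
```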

    Parse/Read XML Document in Python Using ElementTree

    The ElementTree module provides a simple and straightforward way to parse and read XML files in Python. Just as minidom is a submodule of xml.dom, ElementTree is a submodule of xml.etree. The ElementTree module parses the XML file into a tree-like structure whose root is the outermost tag of the XML file (<item> in our case).

    Syntax: To parse the XML document in Python using ElementTree

    import xml.etree.ElementTree as ET 
    
    ET.parse('file_name.xml')
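
ET.parse() returns an ElementTree object, and its getroot() method gives the root element. When the XML is already in a string, ET.fromstring() returns the root element directly. The sketch below compares the two. It feeds ET.parse() an in-memory file object so that it runs without demo.xml, and the one-record `xml_text` is an assumption made for illustration.

```python
import io
import xml.etree.ElementTree as ET

# One-record sample held in a string (an assumption, for illustration only).
xml_text = "<item><record><name>Jameson</name></record></item>"

# ET.parse() accepts a file name or a file object and returns an ElementTree.
tree = ET.parse(io.StringIO(xml_text))
root_from_tree = tree.getroot()

# ET.fromstring() takes the XML text itself and returns the root Element.
root_from_string = ET.fromstring(xml_text)

print(type(tree).__name__)                        # ElementTree
print(root_from_tree.tag, root_from_string.tag)   # item item
```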

    Example

    With minidom we grabbed the name and phone data. Now let's access the email and country data using ElementTree.

    import xml.etree.ElementTree as ET
    
    tree = ET.parse('demo.xml')
    
    #get root branch <item>
    item = tree.getroot()
    
    
    #loop through all <record> of <item>
    for record in item.findall("record"):
        email = record.find("email").text
        country = record.find("country").text
        print(f"Email: {email},-------->Country:{country}")

    Output

    Email: cursus.in.hendrerit@ipsumdolor.edu,-------->Country:South Africa
    Email: non@idmagna.ca,-------->Country:Libya
    Email: Aliquam.ornare@Etiamlaoreetlibero.ca,-------->Country:Madagascar
    Email: faucibus.Morbi.vehicula@aliquamarcu.co.uk,-------->Country:Korea, South

    From the above example, you can see that ElementTree provides a more elegant and Pythonic way to read or parse an XML file in Python.

    In our first statement, import xml.etree.ElementTree as ET, we imported the ElementTree module under the name ET. Then, using the tree = ET.parse('demo.xml') statement, we parsed the demo.xml file.

    With the item = tree.getroot() statement we accessed the root of our XML file, which is <item>. Then we looped through every <record> with item.findall("record") and grabbed their email and country data with the record.find("email").text and record.find("country").text expressions.
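
Note that find() returns None when a record has no matching tag, so record.find("email").text would fail for such a record. The findtext() method returns the text directly and accepts a default. This is a minimal sketch, assuming a two-record sample in which the second record has deliberately been stripped of its <email> tag.

```python
import xml.etree.ElementTree as ET

# Two-record sample; the second record deliberately has no <email>
# (an assumption, for illustration only).
sample = (
    "<item>"
    "<record><name>Jameson</name>"
    "<email>cursus.in.hendrerit@ipsumdolor.edu</email></record>"
    "<record><name>Colton</name></record>"
    "</item>"
)
item = ET.fromstring(sample)

rows = []
for record in item.findall("record"):
    name = record.findtext("name")
    # findtext() falls back to the default instead of failing on a missing tag.
    email = record.findtext("email", default="(no email)")
    rows.append((name, email))

print(rows)
# [('Jameson', 'cursus.in.hendrerit@ipsumdolor.edu'), ('Colton', '(no email)')]
```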

    Check out the official documentation of the xml.etree.ElementTree module to learn more about ElementTree and its functions.
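
The introduction mentioned that the xml package can also write data. With ElementTree, one possible approach is to build elements with ET.Element() and ET.SubElement() and save them with the write() method. The sketch below writes to a temporary directory so that no existing file is touched. The output file name demo_out.xml is an assumption made for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Build <item><record><name>...</name><country>...</country></record></item>.
item = ET.Element("item")
record = ET.SubElement(item, "record")
ET.SubElement(record, "name").text = "Jameson"
ET.SubElement(record, "country").text = "South Africa"

# Write into a temporary directory (hypothetical file name demo_out.xml).
out_path = os.path.join(tempfile.mkdtemp(), "demo_out.xml")
ET.ElementTree(item).write(out_path, encoding="utf-8", xml_declaration=True)

# Read the file back to confirm what was written.
reloaded = ET.parse(out_path).getroot()
print(reloaded.find("record/name").text)      # Jameson
print(reloaded.find("record/country").text)   # South Africa
```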

    Conclusion

    That sums up this tutorial on parsing XML in Python. As you can see, Python provides a built-in standard xml package to read and parse XML files. This tutorial covered 2 of its submodules that can parse an XML file:

    1. minidom and
    2. ElementTree.

    The minidom module follows the Document Object Model approach: you fetch node objects and read the text from their child nodes. The ElementTree module, on the other hand, represents the file as a tree of elements that you search with find() and findall().
