Python XML Parser Tutorial: Read XML File Example (minidom, ElementTree)

By | November 17, 2021

This is a tutorial on Python's XML parser: the standard-library xml package, which can parse XML files and also write data to them.

XML stands for Extensible Markup Language. Like HTML, it is a markup language, but unlike HTML it has no predefined tags: we define our own custom tags based on the data we store in the XML file.

XML files are often used to share, store, and structure data because they can be exchanged easily between servers and systems. Python, with its rich standard library, is a popular choice for processing and parsing such data.

Conveniently, Python ships with a standard xml package that can both parse XML files and write data to them; this is what we refer to as the Python XML parser.

In this Python tutorial, we will walk through the minidom and ElementTree modules and learn how to parse an XML file in Python.


Python XML minidom and ElementTree module

Python's xml package includes two commonly used submodules for parsing XML files: minidom and ElementTree.

The minidom (minimal DOM) module is a lightweight implementation of the Document Object Model (DOM) interface, so it represents an XML file with the same kind of node structure that JavaScript uses for the DOM in web browsers.
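To see what "DOM-like" means in practice, here is a minimal sketch that parses a trimmed, inline copy of demo.xml with minidom.parseString() and walks its nodes the way browser JavaScript walks the DOM. The XML_TEXT variable is only an illustrative sample.

```python
from xml.dom import minidom

# A trimmed sample in the same shape as demo.xml (one <record> only).
XML_TEXT = """<item>
    <record>
        <name>Jameson</name>
    </record>
</item>"""

# parseString() parses XML held in a string; parse() reads a file.
doc = minidom.parseString(XML_TEXT)

# documentElement is the root element, as in the browser DOM.
root = doc.documentElement
print(root.tagName)  # item

# childNodes holds every child node, including the whitespace text
# nodes between the tags (nodeType 3 = TEXT_NODE, 1 = ELEMENT_NODE).
print([(n.nodeType, n.nodeName) for n in root.childNodes])
# [(3, '#text'), (1, 'record'), (3, '#text')]

# Filter to element nodes, much as you might in JavaScript.
elements = [n for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]
print([e.tagName for e in elements])  # ['record']
```

The whitespace text nodes are one reason DOM-style code tends to be more verbose than ElementTree code.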


Although we can parse an XML document using minidom, ElementTree offers a simpler and more Pythonic way to do the same job.
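A rough side-by-side sketch, assuming a small inline XML string rather than the demo.xml file, shows the difference: minidom reaches the text through a child text node, while ElementTree exposes each element's text directly.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

XML_TEXT = """<item>
    <record><name>Jameson</name></record>
    <record><name>Colton</name></record>
</item>"""

# minidom: DOM-style lookup, then read the data of the first child text node.
dom = minidom.parseString(XML_TEXT)
dom_names = [n.firstChild.data for n in dom.getElementsByTagName("name")]

# ElementTree: each element carries its text in the .text attribute.
root = ET.fromstring(XML_TEXT)
et_names = [n.text for n in root.iter("name")]

print(dom_names)  # ['Jameson', 'Colton']
print(et_names)   # ['Jameson', 'Colton']
```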

XML File

For all the examples in this tutorial, we will use the demo.xml file, which contains the following XML data:

#demo.xml

<item>
    <record>
        <name>Jameson</name>
        <phone>(080) 78168241</phone>   
        <email>cursus.in.hendrerit@ipsumdolor.edu</email>
        <country>South Africa</country>
    </record>

    <record>
        <name>Colton</name>
        <phone>(026) 53458662</phone>
        <email>non@idmagna.ca</email>
        <country>Libya</country>
    </record>

    <record>
        <name>Dillon</name>
        <phone>(051) 96790901</phone>
        <email>Aliquam.ornare@Etiamlaoreetlibero.ca</email>
        <country>Madagascar</country>
    </record>
  
    <record>
        <name>Channing</name>
        <phone>(014) 98829753</phone>
        <email>faucibus.Morbi.vehicula@aliquamarcu.co.uk</email>
        <country>Korea, South</country>
    </record>
</item>

As you can see, the data is nested under custom tags. The root element is <item>, which contains several <record> elements, and each <record> in turn has four nested elements:

  1. <name>,
  2. <phone>,
  3. <email>, and
  4. <country>.

Parse/Read XML Document in Python using minidom

minidom is part of Python's standard library (xml.dom.minidom), which means you do not have to pip install anything to use it.

The minidom module parses the XML document into a Document Object Model (DOM), whose data can then be extracted with the getElementsByTagName() method.

Syntax: To parse the XML document in Python using minidom 

from xml.dom import minidom

minidom.parse("filename")
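parse() is not limited to file names. The following sketch, which uses io.BytesIO as a stand-in for an open file, shows that minidom.parse() also accepts a file-like object, and that minidom.parseString() handles XML already held in a string.

```python
import io
from xml.dom import minidom

# minidom.parse() accepts either a file name or a file-like object.
# io.BytesIO stands in here for a file opened with open("demo.xml", "rb").
xml_bytes = b"<item><record><name>Jameson</name></record></item>"
doc = minidom.parse(io.BytesIO(xml_bytes))

# For XML that is already in a Python string, parseString() does the same job.
same_doc = minidom.parseString(xml_bytes.decode("utf-8"))

print(doc.documentElement.tagName)       # item
print(same_doc.documentElement.tagName)  # item
```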

Example:

Let’s grab all the names and phone data from our demo.xml file.

from xml.dom import minidom


#parse xml file
file = minidom.parse('demo.xml')

#grab all <record> tags
records = file.getElementsByTagName("record")

print("Name------>Phone")

for record in records:
    #access <name> and <phone> node of every record
    name = record.getElementsByTagName("name")
    phone = record.getElementsByTagName("phone")
    
    #access data of name and phone
    print(name[0].firstChild.data, end="----->")
    print(phone[0].firstChild.data)

Output

Name------>Phone
Jameson----->(080) 78168241
Colton----->(026) 53458662
Dillon----->(051) 96790901
Channing----->(014) 98829753

In the above example, we first imported the minidom module using the from xml.dom import minidom statement.

Then we parsed our demo.xml file with the file = minidom.parse('demo.xml') statement. The parse() function returns a Document object that represents the whole XML document; its root element (documentElement) is <item>.

Note: Our Python script and the demo.xml file are in the same directory, and we run the script from that directory, which is why we specify only the file name demo.xml in the minidom.parse() function. A relative path is resolved against the current working directory, so if your Python script and XML file are in different locations, you have to specify the absolute path of the file or a path relative to the directory you run the script from.
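If you want the script to find demo.xml regardless of the directory you run it from, one option is to build the path from the script's own location. The parse_next_to() helper below is a hypothetical name used only for illustration.

```python
from pathlib import Path
from xml.dom import minidom


def parse_next_to(script_file, xml_name):
    """Parse xml_name from the same directory as script_file.

    Hypothetical helper: a bare relative name passed to minidom.parse()
    is resolved against the current working directory, not the
    directory that contains the script.
    """
    xml_path = Path(script_file).resolve().parent / xml_name
    return minidom.parse(str(xml_path))


# Typical use inside a script:
# doc = parse_next_to(__file__, "demo.xml")
```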

After parsing the XML file, we accessed all the <record> nodes using the records = file.getElementsByTagName("record") statement.

getElementsByTagName() is a method of minidom Document and Element objects that returns a list (a NodeList) of all descendant elements with the specified tag name.
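The sketch below, using a small inline sample instead of demo.xml, illustrates that scope: called on the Document, the method searches the whole tree, while called on a single element it searches only beneath that element.

```python
from xml.dom import minidom

XML_TEXT = """<item>
    <record><name>Jameson</name><phone>(080) 78168241</phone></record>
    <record><name>Colton</name><phone>(026) 53458662</phone></record>
</item>"""

doc = minidom.parseString(XML_TEXT)

# Called on the Document, it searches the whole tree, so the <name>
# elements are found without going through each <record> first.
all_names = doc.getElementsByTagName("name")
print(len(all_names))  # 2

# Called on one element, it searches only that element's descendants.
first_record = doc.getElementsByTagName("record")[0]
print(len(first_record.getElementsByTagName("name")))  # 1

# The result behaves like a list, so it can be indexed and iterated.
print([n.firstChild.data for n in all_names])  # ['Jameson', 'Colton']
```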

Once we had all the record nodes, we looped through them and, again using the getElementsByTagName() method, accessed each record's nested <name> and <phone> nodes.

Next, after accessing the individual name and phone nodes, we printed their data using the name[0].firstChild.data and phone[0].firstChild.data expressions.

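firstChild is a property of every node that returns its first child node. For elements like <name> and <phone>, that first child is a Text node, and its data attribute holds the text content, which is why firstChild.data gives us the name or phone number.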
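One caveat: an empty element such as <email/> has no child nodes, so its firstChild is None and firstChild.data raises an AttributeError. A minimal sketch of a more defensive approach follows; get_text() is a hypothetical helper, not part of minidom.

```python
from xml.dom import minidom


def get_text(element):
    """Return the concatenated text of an element's direct text children.

    Hypothetical helper: unlike element.firstChild.data, it returns ""
    for an empty element instead of raising AttributeError.
    """
    return "".join(
        node.data
        for node in element.childNodes
        if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE)
    )


doc = minidom.parseString("<record><name>Jameson</name><email/></record>")
name = doc.getElementsByTagName("name")[0]
email = doc.getElementsByTagName("email")[0]

print(name.firstChild.data)   # Jameson
print(email.firstChild)       # None
print(repr(get_text(email)))  # ''
```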

Parse/Read XML Document in Python Using ElementTree

The ElementTree module provides a simple and straightforward way to parse and read XML files in Python. Just as minidom is a submodule of xml.dom, ElementTree is a submodule of xml.etree.

The ElementTree module parses the XML file into a tree-like structure whose root is the outermost element of the XML file (<item> in our case).
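The sketch below, which uses ET.fromstring() on an inline copy of the first two records, shows how that tree can be navigated: an element's name is in .tag, its direct children can be iterated or indexed like a list, and len() counts them.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """<item>
    <record>
        <name>Jameson</name>
        <phone>(080) 78168241</phone>
        <email>cursus.in.hendrerit@ipsumdolor.edu</email>
        <country>South Africa</country>
    </record>
    <record>
        <name>Colton</name>
        <phone>(026) 53458662</phone>
        <email>non@idmagna.ca</email>
        <country>Libya</country>
    </record>
</item>"""

root = ET.fromstring(XML_TEXT)  # the root element, <item>

print(root.tag)   # item
print(len(root))  # 2 (number of direct children)

# Children of the first <record>, in document order.
print([child.tag for child in root[0]])
# ['name', 'phone', 'email', 'country']

# Index access: second record, first child.
print(root[1][0].text)  # Colton
```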

Syntax: To parse the XML document in Python using ElementTree

import xml.etree.ElementTree as ET

ET.parse('file_name.xml')
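ET.parse() returns an ElementTree object, and you call getroot() on it to get the root element. For XML held in a string, ET.fromstring() returns the root Element directly. A short sketch, using io.BytesIO as a stand-in for an open file:

```python
import io
import xml.etree.ElementTree as ET

xml_bytes = b"<item><record><name>Jameson</name></record></item>"

# ET.parse() accepts a file name or a file object and returns an
# ElementTree; getroot() gives the root Element.
tree = ET.parse(io.BytesIO(xml_bytes))
root_from_tree = tree.getroot()

# ET.fromstring() parses XML text and returns the root Element directly.
root_from_string = ET.fromstring(xml_bytes)

print(type(tree).__name__)  # ElementTree
print(root_from_tree.tag)   # item
print(root_from_string.tag) # item
```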

Example

We used minidom to grab the name and phone data; now let's access the email and country data using ElementTree.

import xml.etree.ElementTree as ET

tree = ET.parse('demo.xml')

#get root branch <item>
item = tree.getroot()


#loop through all <record> of <item>
for record in item.findall("record"):
    email = record.find("email").text
    country = record.find("country").text
    print(f"Email: {email},-------->Country:{country}")

Output

Email: cursus.in.hendrerit@ipsumdolor.edu,-------->Country:South Africa
Email: non@idmagna.ca,-------->Country:Libya
Email: Aliquam.ornare@Etiamlaoreetlibero.ca,-------->Country:Madagascar
Email: faucibus.Morbi.vehicula@aliquamarcu.co.uk,-------->Country:Korea, South

From the above example, you can see that ElementTree provides a more elegant and Pythonic way to read or parse an XML file in Python.

In the first statement, we imported the module with import xml.etree.ElementTree as ET.

Then, using the tree = ET.parse('demo.xml') statement, we parsed the demo.xml file into an ElementTree object.

With the item = tree.getroot() statement, we accessed the root element of our XML file, which is <item>.

Then we looped through every <record> child of <item> with the item.findall("record") statement and grabbed each record's email and country data with the record.find("email").text and record.find("country").text expressions.
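find() returns None when a record has no matching child, in which case .text raises an AttributeError. A minimal sketch, assuming a sample whose second record deliberately lacks <country>, shows findtext() with a default value as a safer alternative, and also that findall() looks only at direct children while iter() searches the whole tree.

```python
import xml.etree.ElementTree as ET

# Sample data: the second <record> is missing <country> on purpose.
XML_TEXT = """<item>
    <record><email>non@idmagna.ca</email><country>Libya</country></record>
    <record><email>Aliquam.ornare@Etiamlaoreetlibero.ca</email></record>
</item>"""

root = ET.fromstring(XML_TEXT)

# find() returns None for a missing child, so .text would fail on it.
print(root[1].find("country"))  # None

results = []
for record in root.findall("record"):
    # findtext() returns the child's text, or `default` if the child is missing.
    email = record.findtext("email", default="")
    country = record.findtext("country", default="unknown")
    results.append((email, country))
    print(f"Email: {email}, Country: {country}")

# findall() matches direct children only; iter() searches all descendants.
print(len(root.findall("email")))     # 0
print(len(list(root.iter("email"))))  # 2
```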

Check out the official documentation of the xml.etree.ElementTree module to learn more about ElementTree and its functions.
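The introduction mentioned that the xml package can also write XML. As a minimal sketch (the element values and the output.xml file name are only examples), ElementTree can build a document with Element() and SubElement() and serialize it with tostring() or ElementTree.write():

```python
import xml.etree.ElementTree as ET

# Build <item><record>...</record></item> in memory.
item = ET.Element("item")
record = ET.SubElement(item, "record")
for tag, value in [("name", "Jameson"), ("country", "South Africa")]:
    ET.SubElement(record, tag).text = value

# encoding="unicode" makes tostring() return a str instead of bytes.
xml_out = ET.tostring(item, encoding="unicode")
print(xml_out)
# <item><record><name>Jameson</name><country>South Africa</country></record></item>

# To save to disk instead ("output.xml" is just an example file name):
# ET.ElementTree(item).write("output.xml", encoding="utf-8", xml_declaration=True)
```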

Conclusion

That sums up this tutorial on the Python XML parser. As you can see, Python provides a built-in standard xml package to read and parse XML files. It offers two commonly used submodules for parsing an XML file:

  1. minidom and
  2. ElementTree.

The minidom module follows the Document Object Model approach, exposing the document as DOM nodes (elements, text nodes, and so on). The ElementTree module, on the other hand, represents the document as a simpler tree of Element objects with a more Pythonic API.


Author: Vinay

I am a Full Stack Developer with a Bachelor's Degree in Computer Science, who also loves to write technical articles that can help fellow developers.
