
Data Persistence: XML Parsers in Python

Neha Kumawat

4 years ago

XML (eXtensible Markup Language) is a popular, platform-independent markup language built on an open standard. It is similar in spirit to HTML and, like HTML, is derived from SGML. The XML specification is maintained by the World Wide Web Consortium (W3C).
XML is best known as a data interchange format and is used in a large number of applications such as web services, office tools, and Service-Oriented Architectures (SOA). Because it is plain text that both humans and machines can read, it is a convenient way to exchange data between different applications.
Python's standard library includes several packages for XML processing, such as xml.etree.ElementTree, xml.dom, and xml.sax. This article works with xml.etree.ElementTree.
XML Packages
In an XML document, data is arranged in a tree-like hierarchy that starts at a single root element. Each element is one node in the tree: it is delimited by an opening <tag> and a closing </tag>, may carry attributes, and may contain text or one or more sub-elements.
Let's see an example of an XML document:
<?xml version="1.0" encoding="iso-8859-1"?>
<studentlist>
    <student>
        <name>Amit</name>
        <subject>Physics</subject>
        <marks>80</marks>
    </student>
    <student>
        <name>Karan</name>
        <subject>Maths</subject>
        <marks>100</marks>
    </student>
    <student>
        <name>Sahil</name>
        <subject>Biology</subject>
        <marks>98</marks>
    </student>
</studentlist>
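To make the tree shape concrete, here is a minimal sketch that parses a shortened copy of the document above from a string and prints one indented line per node. The describe() helper and the SAMPLE constant are names made up for this illustration; they are not part of ElementTree.

```python
import xml.etree.ElementTree as xmlobj

# A shortened copy of the example document (XML declaration left out).
SAMPLE = """
<studentlist>
    <student>
        <name>Amit</name>
        <subject>Physics</subject>
        <marks>80</marks>
    </student>
    <student>
        <name>Karan</name>
        <subject>Maths</subject>
        <marks>100</marks>
    </student>
</studentlist>
""".strip()


def describe(element, depth=0):
    """Return one indented line per node: the tag, plus its text if any."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (": " + text if text else "")]
    for child in element:  # iterating an element yields its direct children
        lines.extend(describe(child, depth + 1))
    return lines


root = xmlobj.fromstring(SAMPLE)
outline = describe(root)
print("\n".join(outline))
```

The root appears at depth 0, each student one level below it, and the name, subject, and marks elements one level below their student.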
Now let's assume we want to build this document with the ElementTree module. This takes a few steps.
The first step is to create the root element of the tree. Each element has a tag and an attrib property, which is a dict of its attributes. For a root element created without attributes, attrib is an empty dictionary.
import xml.etree.ElementTree as xmlobj
# The tag name matches the root of the example document above.
root = xmlobj.Element('studentlist')
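As a quick check of the tag and attrib properties, the following sketch inspects a freshly created root element. The second element and its year and section attributes are hypothetical and exist only to show how attrib fills up; the example document in this article uses no attributes.

```python
import xml.etree.ElementTree as xmlobj

root = xmlobj.Element('studentlist')
print(root.tag)     # the tag name passed to Element()
print(root.attrib)  # an empty dict, because no attributes were given

# Hypothetical attributes, for illustration only: they can be passed as
# keyword arguments to Element() or added later with set().
tagged = xmlobj.Element('studentlist', year='2021')
tagged.set('section', 'A')
print(tagged.attrib)
```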
Let's see how we can add one or more elements under the root element. Each element may have sub-elements, and each sub-element has its own tag, attrib, and text properties.
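student = xmlobj.Element('student')
# Each SubElement call creates a child of student and returns it.
name = xmlobj.SubElement(student, 'name')
name.text = 'Amit'
subject = xmlobj.SubElement(student, 'subject')
subject.text = 'Physics'
marks = xmlobj.SubElement(student, 'marks')
# Element text must be a string, so the mark is stored as '80'.
marks.text = '80'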
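When several students have to be created, the same steps can be wrapped in a small function. The sketch below assumes a helper named make_student(), which is invented for this example and not part of ElementTree, and uses tostring() to show the resulting markup.

```python
import xml.etree.ElementTree as xmlobj


def make_student(name, subject, marks):
    """Build a <student> element with name, subject and marks children."""
    student = xmlobj.Element('student')
    for tag, value in (('name', name), ('subject', subject), ('marks', marks)):
        # Element text has to be a string, so numbers are converted here.
        xmlobj.SubElement(student, tag).text = str(value)
    return student


student = make_student('Amit', 'Physics', 80)
# encoding='unicode' makes tostring() return a str instead of bytes.
serialized = xmlobj.tostring(student, encoding='unicode')
print(serialized)
```

Calling make_student() once per record keeps the tag names in one place, so a typo cannot creep into only one of the students.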
We can append the new element to the root using the append() method, as shown below.
root.append(student)
We can append as many elements as we wish in the same way. The root element is then wrapped in an ElementTree object and written to a file, as shown below.
tree = xmlobj.ElementTree(root)
# Open in binary mode; the with block closes the file automatically.
with open('studentlist.xml', 'wb') as file:
    tree.write(file)
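The write() method also accepts a file name directly, along with the optional encoding and xml_declaration arguments. The following is a small sketch, not the only way to do it: it writes a one-student tree into a temporary directory (so nothing is left behind on disk) and reads it back to confirm that the file is well formed.

```python
import os
import tempfile
import xml.etree.ElementTree as xmlobj

root = xmlobj.Element('studentlist')
student = xmlobj.SubElement(root, 'student')
xmlobj.SubElement(student, 'name').text = 'Amit'

tree = xmlobj.ElementTree(root)
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'studentlist.xml')
    # xml_declaration=True adds the <?xml ...?> line at the top of the file.
    tree.write(path, encoding='utf-8', xml_declaration=True)
    with open(path, 'rb') as handle:
        raw = handle.read()
    # Parse the file again to confirm it round-trips.
    reloaded = xmlobj.parse(path).getroot()

print(raw.decode('utf-8'))
```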
Let's now see how to parse the XML file.
We construct a document tree by passing the file name as the file parameter of the ElementTree constructor, as shown below.
ele_tree = xmlobj.ElementTree(file='studentlist.xml')
The tree object's getroot() method returns the root element. Iterating over an element, or calling list() on it, gives its child elements. (The older getchildren() method was removed in Python 3.9.)
root = ele_tree.getroot()
children = list(root)
We can build a dictionary for each student by iterating over the sub-elements of each child node, using each tag as a key and its text as the value, as shown below.
students = []
for child in children:
    student = {}
    # Iterating over child yields its name, subject and marks elements.
    for pair in child:
        student[pair.tag] = pair.text
    students.append(student)
Each dictionary is appended to the students list, which ends up holding one dictionary per student element.
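Putting the parsing steps together, the sketch below turns the example document into a list of dictionaries. To stay self-contained it parses the XML from a string with fromstring() rather than from studentlist.xml; the students_from_root() function and the XML_TEXT constant are names chosen for this illustration.

```python
import xml.etree.ElementTree as xmlobj

XML_TEXT = """<studentlist>
<student><name>Amit</name><subject>Physics</subject><marks>80</marks></student>
<student><name>Karan</name><subject>Maths</subject><marks>100</marks></student>
<student><name>Sahil</name><subject>Biology</subject><marks>98</marks></student>
</studentlist>"""


def students_from_root(root):
    """Return one dict per child of root, mapping sub-element tag to text."""
    students = []
    for child in root:
        students.append({pair.tag: pair.text for pair in child})
    return students


students = students_from_root(xmlobj.fromstring(XML_TEXT))
for student in students:
    print(student)
```

Note that every value comes back as a string, including the marks, so a numeric field needs an explicit conversion such as int(student['marks']) before it is used in arithmetic.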
For more blogs/courses on data science, machine learning, artificial intelligence, and new technologies do visit us at InsideAIML.
Thanks for reading…      
