Python Data Persistence - XML Parsers

Neha Kumawat

10 months ago

Python Data Persistence - XML Parsers | Insideaiml
Python Data Persistence - XML Parsers | Insideaiml
XML stands for Extensible Markup Language. It is similar to HTML in its appearance but, XML is used for data presentation, while HTML is used to define what data is being used. XML is exclusively designed to send and receive data back and forth between clients and servers.
It is a well-known data interchange format, used by a large number of applications such as web services, office tools, and Service-Oriented Architectures (SOA). The XML format is both machine-readable and human-readable.
Standard Python library's xml package consists of following modules for XML processing −
Modules for XML Processing | Insideaiml
Modules for XML Processing | Insideaiml
Data in the XML document is arranged in a tree-like hierarchical format, starting with root and elements. Each element is a single node in the tree and has an attribute enclosed in <> and tags. One or more sub-elements may be assigned to each element.
Following is a typical example of an XML document −
While using the ElementTree module, the first step is to set up the root element of the tree. Each  Element has a tag and attrib which is a direct object. For the root element, attrib is an empty dictionary.
import xml.etree.ElementTree as xmlobj
Now, we can add one or more elements under the root element. Each element object may have SubElements. Each subelement has an attribute and text property.
   nm=xmlobj.SubElement(student, 'name')
   subject=xmlobj.SubElement(student, 'subject')
   marks=xmlobj.SubElement(student, 'marks')
This new element is appended to the root using append() method.
Append as many elements as desired using the above method. Finally, the root element object is written to a file.
tree = xmlobj.ElementTree(root)
   file = open('studentlist.xml','wb')
Now, we see how to parse the XML file. For that, construct document tree giving its name as file parameter in ElementTree constructor.
tree = xmlobj.ElementTree(file='studentlist.xml')
The tree object has getroot() method to obtain root element and getchildren() returns a list of elements below it.
root = tree.getroot()
children = root.getchildren()
A dictionary object corresponding to each sub-element is constructed by iterating over the sub-element collection of each child node.
for child in children:
   pairs = child.getchildren()
   for pair in pairs:
Each dictionary is then appended to a list returning original list of dictionary objects.
SAX is a standard interface for event-driven XML parsing. Parsing XML with SAX requires ContentHandler by subclassing XML.sax.ContentHandler. You register callbacks for events of interest and then, let the parser proceed through the document.
SAX is useful when your documents are large or you have memory limitations as it parses the file as it reads it from disk as a result entire file is never stored in the memory.

Document Object Model

(DOM) API is a World Wide Web Consortium recommendation. In this case, the entire file is read into the memory and stored in a hierarchical (tree-based) form to represent all the features of an XML document.
SAX, not as fast as DOM, with large files. On the other hand, DOM can kill resources, if used on many small files. SAX is read-only, while DOM allows changes to the XML file.
More in data science using python.

Submit Review