Parse Large Python Xml Using Xmltree
Solution 1:
iterparse is not that difficult to use in this case. temp.xml is the file presented in your question, with a </MyRoot> stuck on as a line at the end.
Think of the source = line as boilerplate, if you will: it parses the XML file and returns chunks of it element by element, indicating whether each chunk is the 'start' or the 'end' of an element and supplying information about the element.
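To make the stream of events concrete, here is a minimal sketch that lists every (event, tag) pair iterparse produces. The tiny three-element document is made up purely for illustration and is not the file from the question.

```python
from io import BytesIO
from xml.etree import ElementTree

# Hypothetical miniature document, used only to show the order of events.
doc = BytesIO(b"<root><a>1</a><b/></root>")

events = [
    (event, element.tag)
    for event, element in ElementTree.iterparse(doc, events=("start", "end"))
]
print(events)
# [('start', 'root'), ('start', 'a'), ('end', 'a'),
#  ('start', 'b'), ('end', 'b'), ('end', 'root')]
```

Each element shows up twice: once when its opening tag has been read and once when its closing tag has been read.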
In this case we need consider only the 'start' events. We watch for the 'PersonName' tags and pick up their texts. Having found the one and only such item in the xml file we abandon the processing.
>>> from xml.etree import ElementTree
>>> source = iter(ElementTree.iterparse('temp.xml', events=('start', 'end')))
>>> for an_event, an_element in source:
...     if an_event == 'start' and an_element.tag.endswith('PersonName'):
...         an_element.text
...         break
...
'Miracle Smith'
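One caveat worth knowing: the ElementTree documentation says that at a 'start' event only the tag and attributes are guaranteed, while text is guaranteed only once the 'end' event arrives. With a very large file it may therefore be safer to watch for 'end' events. Below is a minimal sketch of that variant; the helper name first_text is my own invention, and clearing finished elements is an optional memory saving, not something the question requires.

```python
from xml.etree import ElementTree


def first_text(source, local_name):
    """Return the text of the first element named local_name, or None.

    source is a file name or a file object opened in binary mode.
    first_text is a hypothetical helper, not part of ElementTree.
    """
    for _event, element in ElementTree.iterparse(source, events=("end",)):
        # Namespaced tags look like '{uri}PersonName'; keep the part after '}'.
        if element.tag.rpartition("}")[2] == local_name:
            return element.text
        # We are done with this element, so release its contents.
        element.clear()
    return None
```

With the file from the question this would be called with an open binary handle on temp.xml and 'PersonName' as the name to look for.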
Edit, in response to a question in a comment:
Normally you wouldn't do this, since iterparse is intended for use with large chunks of XML. However, by wrapping a string in a StringIO object, it can be processed with iterparse.
>>> from xml.etree import ElementTree
>>> from io import StringIO
>>> xml = StringIO('''\
... <?xml version="1.0" encoding="utf-8"?>
... <MyRoot xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" uuid="ertr" xmlns="http://www.example.org/yml/data/litsmlv2">
...     <Aliases authority="OPP" xmlns="http://www.example.org/yml/data/commonv2">
...         <Description>myData</Description>
...         <Identifier>43hhjh87n4nm</Identifier>
...     </Aliases>
...     <RollNo uom="kPa">39979172.201167159</RollNo>
...     <PersonName>Miracle Smith</PersonName>
...     <Date>2017-06-02T01:10:32-05:00</Date>
... </MyRoot>''')
>>> source = iter(ElementTree.iterparse(xml, events=('start', 'end')))
>>> for an_event, an_element in source:
...     if an_event == 'start' and an_element.tag.endswith('PersonName'):
...         an_element.text
...         break
...
'Miracle Smith'
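The endswith test works here, but it would also match any other tag that happens to end in 'PersonName'. If that matters, a stricter option is to compare against the fully qualified tag, which ElementTree writes as '{namespace}name'. The sketch below assumes a trimmed-down copy of the document above and uses 'end' events so that the text is fully populated; the variable names are my own.

```python
from io import StringIO
from xml.etree import ElementTree

# Default namespace declared on MyRoot in the document above.
NS = "http://www.example.org/yml/data/litsmlv2"

# Shortened copy of the document, kept small for illustration.
xml_text = '''<MyRoot xmlns="http://www.example.org/yml/data/litsmlv2">
  <Aliases authority="OPP" xmlns="http://www.example.org/yml/data/commonv2">
    <Description>myData</Description>
  </Aliases>
  <PersonName>Miracle Smith</PersonName>
</MyRoot>'''

wanted = f"{{{NS}}}PersonName"  # '{http://www.example.org/yml/data/litsmlv2}PersonName'
name = None
for event, element in ElementTree.iterparse(StringIO(xml_text), events=("end",)):
    if element.tag == wanted:
        name = element.text
        break

print(name)  # Miracle Smith
```

Note that the children of Aliases live in a different namespace (commonv2), so they would need their own qualified tag.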