Extracting Raw XML Via Lxml Etree

I'm trying to extract raw XML from an XML file. So if my data is: ... Lots of XML ...

Solution 1:

You should be able to use etree.tostring() to serialize each matching element back into a string of raw XML.

Example...

from lxml import etree

xml = """
<xml>
    <getThese>
        <clonedKey>1</clonedKey>
        <clonedKey>2</clonedKey>
        <clonedKey>3</clonedKey>
        <randomStuff>this is a sentence</randomStuff>
    </getThese>         
    <getThese>
        <clonedKey>6</clonedKey>
        <clonedKey>8</clonedKey>
        <clonedKey>3</clonedKey>
        <randomStuff>more words</randomStuff>
    </getThese>
</xml>
"""

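# remove_blank_text=True drops whitespace-only text between elements,
# so the serialized strings below contain no indentation or newlines.
parser = etree.XMLParser(remove_blank_text=True)

tree = etree.fromstring(xml, parser=parser)

elems = []

# xpath() evaluated from the root element returns each <getThese> child;
# tostring() returns bytes, so decode() turns each result into a str.
for elem in tree.xpath("getThese"):
    elems.append(etree.tostring(elem).decode())

print(elems)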

Printed output...

['<getThese><clonedKey>1</clonedKey><clonedKey>2</clonedKey><clonedKey>3</clonedKey><randomStuff>this is a sentence</randomStuff></getThese>', '<getThese><clonedKey>6</clonedKey><clonedKey>8</clonedKey><clonedKey>3</clonedKey><randomStuff>more words</randomStuff></getThese>']

Post a Comment for "Extracting Raw XML Via Lxml Etree"