Skip to content Skip to sidebar Skip to footer

Editing Local XML File Using Python And Regular Expression

I am new to python and trying to modify some xml configuration files which are present in my local system. Input: I have an xml file(say Test.xml) with the following content.

Solution 1:

This is not exactly what you wanted but it's close. For one thing, avoid regex for xml, html and similar processing like the plague. At the same time, don't be surprised if you find occasional 'challenges' in using products like lxml.

I think, this time, I found a bug.

from lxml import etree
tree = etree.parse('shivam.xml')
element_to_change = tree.xpath('.//Composer/SocketTimeout')[0]
print(element_to_change)
element_to_change.text='60'
comment_will_follow_this = tree.xpath('.//Composer')[0]
print(comment_will_follow_this)
comment = etree.Comment('This did not work')
comment_will_follow_this.append(comment)

comment = etree.Comment('Changed this value to reduce SocketTimeout')
element_to_change.addprevious(comment)

tree.write('see_it.xml', pretty_print=True)
  • I used xpath to find the element to change, and the places in the file to receive the comments.
  • The append method is supposed to add a comment or other element to a given element as a child. However, I found in this case that the 'This did not work' comment was added as a preceding element comment.
  • However, I did find that addprevious was able to add the comment in the desired location, the fly in the ointment being that it fails to place an end-line between the comment and the next xml element.

Here's the resulting file. Indicidentally, you will note that the original comments are intact.

<JavaHost>
    <Domain>
       <MessageProcessor>
          <!-- This comment should not be removed and all formating should be untouched -->
          <SocketTimeout>500</SocketTimeout>
       </MessageProcessor>
        <!-- This comment should not be removed and all formating should be untouched -->
       <Composer>
            <!--Changed this value to reduce SocketTimeout--><SocketTimeout>60</SocketTimeout>
            <Enabled>true</Enabled>
       <!--This did not work--></Composer> 
   </Domain>
</JavaHost>

Solution 2:

Since you used modify and XML in same sentence, consider XSLT, the special-purpose language designed to modify XML files. Python's lxml can run XSLT 1.0 scripts as well as external processors or other languages that Python can call at command line. So, XSLT is portable! Even more, Python can pass parameters to XSLT in case 50 needs to be dynamically adjusted -very similar to parameters in the other special-purpose language, SQL, of which Python has many APIs.

Specifically, XSLT maintains the <xsl:comment> command and can append or rewrite nodes to trees. Also, as commented, linked, and hopefully web search recommended, it is highly ill-adivsed to use regex on X|HTML documents being non-natural languages. Hence, DOM libraries like Python's etree, lxml, minidom are preferred, of course XSLT too that adheres to W3C standards.

XSLT (save as .xsl file, a special .xml file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="node()|@*|comment()">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*|comment()"/>
     </xsl:copy>
    </xsl:template>

    <xsl:template match="Domain">
     <xsl:copy>       
       <xsl:apply-templates select="*|@*|comment()"/>
       <New_tag>
         <xsl:comment>New Tag</xsl:comment>
         <Enabled>true</Enabled>
       </New_tag>
     </xsl:copy>
    </xsl:template>

    <xsl:template match="Composer">
     <xsl:copy>
       <xsl:comment>Changed this value to reduce SocketTimeout</xsl:comment>
       <SocketTimeout>50</SocketTimeout>
       <xsl:apply-templates select="Enabled"/>
     </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

Python

import lxml.etree as et

# LOAD XML AND XSLT
dom = et.parse('Input.xml') 
xslt = et.parse('XSLT_Script.xsl')

# TRANSFORM SOURCE
transform = et.XSLT(xslt)
newdom = transform(dom)

# OUTPUT TO CONSOLE
print(newdom)

# OUTPUT TO FILE
with open('Output.xml', 'wb') as f:
    f.write(newdom)

Output

<JavaHost>
  <Domain>
    <MessageProcessor>
      <!-- This comment should not be removed and all formating should be untouched -->
      <SocketTimeout>500</SocketTimeout>
    </MessageProcessor>
    <!-- This comment should not be removed and all formating should be untouched -->
    <Composer>
      <!--Changed this value to reduce SocketTimeout-->
      <SocketTimeout>50</SocketTimeout>
      <Enabled>true</Enabled>
    </Composer>
    <New_tag>
      <!--New Tag-->
      <Enabled>true</Enabled>
    </New_tag>
  </Domain>
</JavaHost>

Post a Comment for "Editing Local XML File Using Python And Regular Expression"