
Python Lxml Write To File In Predefined Order

I want to write the following lxml etree subelements to a file in a predefined order: , , ,

Solution 1:

This sample demonstrates:

  • How to read in an XML file
  • That an Element behaves like a list and can be manipulated as such
  • How to sort a list based on the predefined order of a matchable substring
  • How to write out an XML file

from lxml import etree
import re

# Parse the XML and find the root
with open('input.xml', 'rb') as input_file:
    tree = etree.parse(input_file)
root = tree.getroot()

# Find the list to sort and sort it
some_arbitrary_expression_to_find_the_list = '.'
element_list = tree.xpath(some_arbitrary_expression_to_find_the_list)[0]

predefined_order = [
    'Protocol',
    'StudyEventDef',
    'FormDef',
    'ItemGroupDef',
    'ItemGroupData',
    'ItemDef',
    'CodeList',
    'ClinicalData']
# Extract the name embedded in tags such as 'ElementProtocolat0x3803048'
# (named tag_pattern so the built-in filter() is not shadowed)
tag_pattern = re.compile(r'Element(.*)at0x.*')

# Slice assignment reorders the children in place; sorted() is stable,
# so elements with the same name keep their relative order
element_list[:] = sorted(
    element_list,
    key=lambda x: predefined_order.index(tag_pattern.match(x.tag).group(1)))

# Write the XML to the output file; etree.tostring() returns bytes,
# so the file is opened in binary mode
with open('output.xml', 'wb') as output_file:
    output_file.write(etree.tostring(tree, pretty_print=True))
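
The sorting step can also be sketched without a regular expression. This is only a minimal sketch under some assumptions: the tags are taken to be the plain names (Protocol, FormDef, ...) rather than the fused ElementProtocolat0x... names from the sample, the names RANK and sort_children are made up for illustration, and tags missing from the predefined order are placed last instead of raising an error. The fallback import is there only so the sketch runs without lxml; the standard library Element supports the same list-style slice assignment.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the standard library, whose Element is also list-like
    import xml.etree.ElementTree as etree

PREDEFINED_ORDER = ['Protocol', 'StudyEventDef', 'FormDef', 'ItemGroupDef',
                    'ItemGroupData', 'ItemDef', 'CodeList', 'ClinicalData']
# Hypothetical helper table: tag name -> position in the predefined order
RANK = {name: position for position, name in enumerate(PREDEFINED_ORDER)}


def sort_children(parent):
    """Reorder the children of parent in place; unlisted tags go last."""
    parent[:] = sorted(parent, key=lambda child: RANK.get(child.tag, len(RANK)))


root = etree.fromstring(
    '<stuff><ClinicalData/><FormDef/><Unknown/><Protocol/><FormDef id="2"/></stuff>')
sort_children(root)
sorted_tags = [child.tag for child in root]
print(sorted_tags)
```

Looking names up in a dictionary avoids the repeated list.index() scan, and the default rank means an unexpected tag does not abort the sort.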

Sample input:

<stuff>
  <ElementProtocolat0x3803048 />
  <ElementStudyEventDefat0x3803108 />
  <ElementFormDefat0x3803248 />
  <ElementItemGroupDefat0x38032c8>Random Text</ElementItemGroupDefat0x38032c8>
  <ElementClinicalDataat0x3803408 />
  <ElementItemGroupDataat0x38035c8>
    <tag1>
      <tag2 attr="random tags"/>
    </tag1>
  </ElementItemGroupDataat0x38035c8>
  <ElementFormDefat0x38036c8 />
</stuff>

Output:

<stuff>
  <ElementProtocolat0x3803048/>
  <ElementStudyEventDefat0x3803108/>
  <ElementFormDefat0x3803248/>
  <ElementFormDefat0x38036c8/>
  <ElementItemGroupDefat0x38032c8>Random Text</ElementItemGroupDefat0x38032c8>
  <ElementItemGroupDataat0x38035c8>
    <tag1>
      <tag2 attr="random tags"/>
    </tag1>
  </ElementItemGroupDataat0x38035c8>
  <ElementClinicalDataat0x3803408/>
</stuff>
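
One detail of the writing step that is easy to trip over under Python 3: etree.tostring() returns bytes by default, so the output file has to be opened in binary mode (or the string requested with encoding='unicode'). A small round-trip sketch, assuming a throwaway temporary directory and made-up element names, with a standard library fallback so it runs without lxml:

```python
import os
import tempfile

try:
    from lxml import etree
except ImportError:
    # Assumption: the standard library is an acceptable stand-in for this sketch
    import xml.etree.ElementTree as etree

root = etree.fromstring('<stuff><Protocol/><FormDef>Random Text</FormDef></stuff>')

# tostring() yields bytes unless encoding='unicode' is passed
serialized = etree.tostring(root)

# Hypothetical output location inside a fresh temporary directory
output_path = os.path.join(tempfile.mkdtemp(), 'output.xml')
with open(output_path, 'wb') as output_file:
    output_file.write(serialized)

# Parse the file again to confirm that it is well-formed
round_tripped = etree.parse(output_path).getroot()
round_trip_tags = [child.tag for child in round_tripped]
print(round_trip_tags)
```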

Solution 2:

I don't know much about XML, so I tried to put your data in sorted order using only basic Python.

import re

data = """<ElementProtocolat0x3803048>,
<ElementStudyEventDefat0x3803108>,
<ElementFormDefat0x3803248>,
<ElementItemGroupDefat0x38032c8>,
<ElementClinicalDataat0x3803408>,
<ElementItemGroupDataat0x38035c8>,
<ElementFormDefat0x38036c8>,"""

predefined_order = ['Protocol','StudyEventDef','FormDef','ItemGroupDef','ItemGroupData','CodeList', 'ClinicalData']

# For each name in the predefined order, write every entry that contains it;
# the with block closes the file so the output is flushed to disk
with open("something.xml", "w") as fh1:
    for i in predefined_order:
        for j in data.split(','):
            if re.search(i, j):
                fh1.write(j + ',')

Output:

<ElementProtocolat0x3803048>,
<ElementStudyEventDefat0x3803108>,
<ElementFormDefat0x3803248>,
<ElementFormDefat0x38036c8>,
<ElementItemGroupDefat0x38032c8>,
<ElementItemGroupDataat0x38035c8>,
<ElementClinicalDataat0x3803408>,
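
The same text-based ordering can be sketched with a single sorted() call instead of the nested loops. This is a rough sketch, not the answer's own code: it assumes every entry has the form <Element NAME at0x HEX> fused together as in the data above, and the names rank and name_pattern are introduced only for illustration. Matching the whole name exactly, rather than searching for a substring, avoids accidental hits when one name is contained in another.

```python
import re

data = """<ElementProtocolat0x3803048>,
<ElementStudyEventDefat0x3803108>,
<ElementFormDefat0x3803248>,
<ElementItemGroupDefat0x38032c8>,
<ElementClinicalDataat0x3803408>,
<ElementItemGroupDataat0x38035c8>,
<ElementFormDefat0x38036c8>,"""

predefined_order = ['Protocol', 'StudyEventDef', 'FormDef', 'ItemGroupDef',
                    'ItemGroupData', 'CodeList', 'ClinicalData']
# Hypothetical lookup table: name -> position in the predefined order
rank = {name: position for position, name in enumerate(predefined_order)}

# Capture the name between 'Element' and the 'at0x...' address suffix
name_pattern = re.compile(r'<Element(\w+?)at0x[0-9a-f]+>')

# Split on commas and drop the empty piece after the trailing comma
entries = [part.strip() for part in data.split(',') if part.strip()]

# sorted() is stable, so the two FormDef entries keep their original order
ordered = sorted(entries, key=lambda entry: rank[name_pattern.match(entry).group(1)])
print(',\n'.join(ordered))
```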
