
The title is self-explanatory. Before tagging this as a duplicate, please note that I have checked this answer and it does not work for me: the formatting is already wrong when I print to sys.stdout, not only when writing to a file. I have the following XML (test.xml):

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://www...">
  <soap:Body>
    <SubmitTransaction xmlns="http://www.">
      <Authentication>
      </Authentication>
      <Transaction>
        <DataFields>
        </DataFields>
      </Transaction>
    </SubmitTransaction>
  </soap:Body>
</soap:Envelope>

And the following code:

from lxml import etree

parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse("test.xml", parser)

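def get_data_fields():
    # Return the first element whose tag contains 'DataFields'
    # (the tag includes the namespace, hence the substring test).
    for node in tree.iter():
        if 'DataFields' in node.tag:
            return node

a = get_data_fields()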
field = etree.Element('Field_1')
child_1 = etree.Element('FieldName')
child_2 = etree.Element('FieldValue')
child_3 = etree.Element('FieldIndex')
child_1.text = 'dateTime'
child_2.text = '2016-07-29T12:00:00'
child_3.text = '1'

for i in [child_1, child_2, child_3]:
    field.append(i)
a.append(field)

s = etree.tostring(tree, pretty_print=True)
print(s.decode('utf-8'))

OUTPUT

<soap:Envelope xmlns:soap="http://www...">
  <soap:Body>
    <SubmitTransaction xmlns="http://www.">
      <Authentication>
      </Authentication>
      <Transaction>
        <DataFields>
        <Field_1><FieldName>dateTime</FieldName><FieldValue>2016-07-29T12:00:00</FieldValue><FieldIndex>1</FieldIndex></Field_1></DataFields>
      </Transaction>
    </SubmitTransaction>
  </soap:Body>
</soap:Envelope>

EXPECTED

<soap:Envelope xmlns:soap="http://www...">
  <soap:Body>
    <SubmitTransaction xmlns="http://www.">
      <Authentication>
      </Authentication>
      <Transaction>
        <DataFields>
          <Field_1>
            <FieldName>dateTime</FieldName>
            <FieldValue>2016-07-29T12:00:00</FieldValue>
            <FieldIndex>1</FieldIndex>
          </Field_1>
        </DataFields>
      </Transaction>
    </SubmitTransaction>
  </soap:Body>
</soap:Envelope>

I really do not understand why the new field I am adding is not formatted as expected, because if I print only field, everything looks fine:

s = etree.tostring(field, pretty_print=True)
print(s.decode('utf-8'))

#<Field_1 xmlns="http://www." xmlns:soap="http://www...">
#  <FieldName>dateTime</FieldName>
#  <FieldValue>2016-07-29T12:00:00</FieldValue>
#  <FieldIndex>1</FieldIndex>
#</Field_1>

NOTE: I am using Python 3.4, which is why I have to call .decode('utf-8'); otherwise I just get a bytes literal.

skamsie

1 Answer


It works if you add this line after a = get_data_fields():

a.text = None

lxml cannot always determine what whitespace is ignorable, so in some cases the whitespace needs to be removed manually.
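To see the fix in isolation, here is a minimal sketch on a cut-down document. The inline XML string is a reduced stand-in for test.xml (an assumption, without the SOAP wrapper or namespaces), and the variable names are only illustrative. As far as I can tell, the whitespace inside the otherwise empty DataFields element survives remove_blank_text=True, and once an element has a text child the serializer stops indenting below it; clearing .text removes that text node.

```python
from lxml import etree

# Reduced stand-in for test.xml (assumption): just the element that matters.
XML = """<Transaction>
  <DataFields>
  </DataFields>
</Transaction>"""

parser = etree.XMLParser(remove_blank_text=True)
root = etree.fromstring(XML, parser)

data_fields = root.find("DataFields")

# Drop the whitespace-only text left inside the empty element so that
# pretty_print is free to indent the children appended below.
data_fields.text = None

field = etree.SubElement(data_fields, "Field_1")
etree.SubElement(field, "FieldName").text = "dateTime"

result = etree.tostring(root, pretty_print=True).decode("utf-8")
print(result)
```

If you comment out the data_fields.text = None line, Field_1 and its children should come out on a single line again, as in the question.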

See http://lxml.de/FAQ.html#why-doesn-t-the-pretty-print-option-reformat-my-xml-output:

If you want to be sure all blank text is removed from an XML document (or just more blank text than the parser does by itself), you have to use either a DTD to tell the parser which whitespace it can safely ignore, or remove the ignorable whitespace manually after parsing, e.g. by setting all tail text to None:
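A rough sketch of that manual clean-up, generalized a little: instead of setting every tail to None, it clears only whitespace-only .text and .tail, so real text in mixed content is left alone. The helper name strip_blank_text and the tiny sample document are my own, not part of lxml or the FAQ.

```python
from lxml import etree


def strip_blank_text(root):
    """Clear whitespace-only .text and .tail on every element under root.

    Hypothetical helper (not an lxml API). iter("*") visits elements only,
    so comments and processing instructions are skipped.
    """
    for element in root.iter("*"):
        if element.text is not None and not element.text.strip():
            element.text = None
        if element.tail is not None and not element.tail.strip():
            element.tail = None


# Parsed without remove_blank_text, so all the whitespace is still present.
root = etree.fromstring("<a>\n  <b>\n  </b>\n</a>")
strip_blank_text(root)

etree.SubElement(root.find("b"), "c").text = "x"

result = etree.tostring(root, pretty_print=True).decode("utf-8")
print(result)
```

This is only safe when whitespace-only text carries no meaning in your document, which is usually the case for data-oriented XML like the SOAP request in the question.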

mzjn