How do I pretty print an XML tree that contains whitespace?

Asked by scoder

Albert Brandl wrote:
If I append a subtree to an element,

  >>> from lxml.etree import Element, fromstring, tostring
  >>> elem1 = fromstring("""
  ... <a>
  ... <b/>
  ... <c/>
  ... </a>""")
  >>> elem2 = Element("e")
  >>> elem2.append(elem1)

pretty-printing does not do what I'd expect:

  >>> print tostring(elem2, pretty_print = True)
  <e>
    <a>
    <b/>
    <c/>
  </a>
  </e>

Question information

Language: English
Status: Solved
For: lxml
Assignee: No assignee
Solved by: scoder
scoder (scoder) said :
#1

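The so-called "pretty printing" of XML essentially means adding whitespace at places where it looks natural and where it is unlikely to scramble the content. Mind the word "unlikely". The notion of "ignorable whitespace" in XML is underdefined and purely a parser-level concept, so the serialiser cannot know which whitespace in your tree is safe to change. In your example, the newlines inside <a> are stored as text content, and the serialiser leaves that element's content untouched rather than risk altering data.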

You can help the serialiser figure out which whitespace is "ignorable" in one of two ways: a) let the parser remove ignorable whitespace for you by giving it a DTD and the "remove_blank_text" option, or b) remove it yourself, e.g. by deleting whitespace-only tail text and whitespace-only text before child elements. After all, you know best what is ignorable and what isn't.
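Option a) might look roughly like the following sketch. It reuses the tree from the question and passes a parser created with remove_blank_text=True to fromstring(). No DTD is supplied here, which is an assumption of this sketch: in that case the parser has to guess which whitespace is ignorable, and for a simple document like this one the guess should be the expected one.

```python
from lxml import etree

XML = """
<a>
<b/>
<c/>
</a>"""

# Ask the parser to drop whitespace-only text it considers ignorable.
parser = etree.XMLParser(remove_blank_text=True)
elem1 = etree.fromstring(XML, parser)

elem2 = etree.Element("e")
elem2.append(elem1)

# With the blank text gone, the serialiser is free to indent the subtree.
pretty = etree.tostring(elem2, pretty_print=True, encoding="unicode")
print(pretty)
```

For documents with mixed content (text and elements side by side), check the result carefully, because whitespace that matters to you may not look different from whitespace that does not.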

Example:

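    def remove_ignorable_whitespace(root):
        """Drop whitespace-only text and tail strings in the tree at root, in place."""
        for el in root.iter():
            # Whitespace-only text before the first child of an element.
            if len(el) and el.text and not el.text.strip():
                el.text = None
            # Whitespace-only text following the element's end tag.
            if el.tail and not el.tail.strip():
                el.tail = None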
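To see what the function changes, the sketch below applies it to the tree from the question and serialises the tree before and after. It uses the standard library's xml.etree.ElementTree as a stand-in so that it runs without lxml installed. That substitution is an assumption of this sketch, based on lxml elements exposing the same text, tail and iter() API.

```python
import xml.etree.ElementTree as ET


def remove_ignorable_whitespace(root):
    for el in root.iter():
        if len(el) and el.text and not el.text.strip():
            el.text = None
        if el.tail and not el.tail.strip():
            el.tail = None


elem1 = ET.fromstring("""
<a>
<b/>
<c/>
</a>""")
elem2 = ET.Element("e")
elem2.append(elem1)

# The newlines live in a.text and in the tails of b and c.
before = ET.tostring(elem2, encoding="unicode")
remove_ignorable_whitespace(elem2)
after = ET.tostring(elem2, encoding="unicode")

print(repr(before))
print(repr(after))
```

With lxml, calling tostring(elem2, pretty_print=True) after the cleanup should then indent the whole tree, since no whitespace-only text nodes remain to hold the serialiser back.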