Resolving entities

Asked by Kovid Goyal

Hi,

My application needs to process XML files that do not have DTD declarations but that contain entities. Can I inform XMLParser of the entities somehow? Setting resolve_entities to False doesn't work (still raises an undeclared entity error). Setting recover=True causes the entities to be removed from the tree:

from lxml import etree
etree.tostring(etree.fromstring('<a>1&my;2</a>', etree.XMLParser(recover=True)))

gives

'<a>12</a>'

etree.LXML_VERSION
(2, 0, 5, 0)

etree.LIBXML_VERSION
(2, 6, 32)

scoder (scoder) said: #1

Kovid Goyal wrote:
> My application needs to process XML files that do not have DTD
> declarations but that contain entities.

In this case your document is not well-formed, i.e. not XML.

http://www.w3.org/TR/REC-xml/#sec-references

> Can I inform XMLParser of the entities somehow?

No, there isn't currently a way to work around such a broken document.
libxml2 follows the XML spec strictly in that it rejects references to
undeclared entities in the absence of a DTD.
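Since the rule is about undeclared entities, one possible preprocessing step is to declare them yourself before parsing, by prepending a DOCTYPE with an internal subset. The following is only a rough sketch under stated assumptions, not something lxml provides: the helper name declare_entities is hypothetical, the input is assumed to have no XML declaration or existing DOCTYPE, and the entity values are assumed to contain no quotes, "&", "%" or "<". It uses the standard library's ElementTree so that it runs without extra packages; the resulting string should be acceptable to any conforming parser.

```python
import xml.etree.ElementTree as ET


def declare_entities(xml_text, root_name, entities):
    """Prepend a DOCTYPE whose internal subset declares the given entities.

    Hypothetical helper: assumes xml_text has no XML declaration or DOCTYPE
    of its own, and that the entity values need no escaping.
    """
    decls = "".join(
        '<!ENTITY {} "{}">'.format(name, value)
        for name, value in entities.items()
    )
    return "<!DOCTYPE {} [{}]>{}".format(root_name, decls, xml_text)


# The document from the question, with "my" declared as the text "X".
doc = declare_entities("<a>1&my;2</a>", "a", {"my": "X"})
root = ET.fromstring(doc)
print(root.text)  # 1X2
```

With the declaration in place the reference is no longer undeclared, so the parser expands it instead of rejecting the document.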

ElementTree lacks DTD support and instead allows you to specify entities
through a parser-local "entity" dictionary. lxml could potentially support
a similar interface by intercepting entity reference resolution at the
SAX layer (the "getEntity()" callback function), but that is not implemented.
Please file a wishlist bug.

Stefan
