Need to Read XML Files as a Stream Using BeautifulSoup in Python

I have a dilemma. I need to read very large XML files from all kinds of sources, so the files are often invalid XML or malformed XML. I still must be able to read the files and ext

Solution 1:

Beautiful Soup has no streaming API that I know of; it builds the whole document tree in memory. You do, however, have alternatives.

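The classic approach for parsing large XML streams is an event-oriented parser, namely SAX. In Python, that is the xml.sax package (the reader interfaces live in xml.sax.xmlreader). Because your handler receives events as the input is read, everything before a malformed portion is still delivered to you. Be aware that the standard expat-based reader treats a well-formedness error as fatal and raises SAXParseException, so to skip an erroneous portion and extract information from the rest you have to catch the error and decide yourself how to resume.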
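A minimal sketch of the SAX approach, assuming the records of interest are `<title>` elements (a hypothetical tag name, as are `TitleCollector` and `collect_titles`). It keeps whatever was collected before the parser hits the first fatal error; it does not try to resume after the error.

```python
import io
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element (hypothetical tag name)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # None means "not currently inside a <title>"

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None


def collect_titles(stream):
    """Return the titles seen before end of input or the first fatal error."""
    handler = TitleCollector()
    try:
        xml.sax.parse(stream, handler)
    except xml.sax.SAXParseException:
        # The expat-based reader stops at the first well-formedness error;
        # keep what the handler has gathered so far.
        pass
    return handler.titles


if __name__ == "__main__":
    broken = io.BytesIO(b"<root><title>A</title><title>B</oops></root>")
    print(collect_titles(broken))
```

The state tracking in `_buffer` is the kind of bookkeeping that makes SAX feel low-level: the handler has to remember where it is in the document.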

SAX, however, is low-level and a bit rough around the edges. In the context of Python it feels clumsy, because you have to track the parser state yourself inside handler callbacks.

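The xml.etree.ElementTree module, on the other hand, has a much nicer interface, is pretty fast, and can handle streaming through the iterparse() function. (The separate xml.etree.cElementTree module was deprecated in Python 3.3 and removed in 3.9; xml.etree.ElementTree now uses the C accelerator automatically when it is available.)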
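As a rough illustration of streaming with iterparse(), the sketch below assumes records are `<item>` elements with a `<name>` child; both tag names and the function `iter_item_names` are made up for the example.

```python
import io
import xml.etree.ElementTree as ET


def iter_item_names(stream):
    """Yield the text of <name> inside each <item> (hypothetical tags)."""
    # "end" events fire once an element has been read completely.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "item":
            yield elem.findtext("name", default="")
            # Drop the item's children and text so memory use stays low.
            elem.clear()


if __name__ == "__main__":
    doc = io.BytesIO(
        b"<root><item><name>a</name></item><item><name>b</name></item></root>"
    )
    for name in iter_item_names(doc):
        print(name)
```

Clearing each element after use is what keeps this from building the full tree, although the emptied `<item>` elements themselves stay attached to the root.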

ElementTree is the better choice if you can find a way to manage the errors: iterparse() raises ParseError as soon as it reaches malformed input.
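One simple way to manage the errors, sketched here under the assumption that losing everything after the first malformed spot is acceptable, is to catch ParseError and keep the records already read. The function name `extract_until_error` and the default tag `item` are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def extract_until_error(stream, tag="item"):
    """Return (texts, error): records read so far and the ParseError, if any."""
    texts = []
    error = None
    try:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if elem.tag == tag:
                texts.append((elem.text or "").strip())
                elem.clear()  # free the element's content once it is recorded
    except ET.ParseError as exc:
        # Parsing cannot continue past this point; report where it stopped.
        error = exc
    return texts, error


if __name__ == "__main__":
    broken = io.BytesIO(b"<root><item>a</item><item>b</item><item>c</oops></root>")
    records, problem = extract_until_error(broken)
    print(records, problem)
```

This does not recover the part of the file after the error. Doing that would need a more tolerant parser or some preprocessing of the input, which is outside what this sketch covers.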
