Need To Read Xml Files As A Stream Using Beautifulsoup In Python
I have a dilemma. I need to read very large XML files from all kinds of sources, so the files are often invalid XML or malformed XML. I still must be able to read the files and ext
Solution 1:
Beautiful Soup has no streaming API that I know of. You have, however, alternatives.
The classic approach for parsing large XML streams is using an event-oriented parser, namely SAX. In python, xml.sax.xmlreader
. It will not choke with malformed XML. You can avoid erroneous portions of the file and extract information from the rest.
SAX, however, is low-level and a bit rough around the edges. In the context of python, it feels terrible.
The xml.etree.cElementTree
implementation, on the other hand, has a much nicer interface, is pretty fast, and can handle streaming through the iterparse()
method.
ElementTree
is superior, if you can find a way to manage the errors.
Post a Comment for "Need To Read Xml Files As A Stream Using Beautifulsoup In Python"