I'm looking for a good-quality HTML Microdata parser in Python. It doesn't have to be blazing fast, but I'd like it to support as much of the spec as possible, including itemref.
Here's what I've found so far:
- https://github.com/edsu/microdata
- https://github.com/RDFLib/pymicrodata
- https://pypi.python.org/pypi/pelican-microdata/0.1
Have you used any of these libraries? What were the pros and cons?
I'm also curious about parsing malformed HTML documents. Have you found a Microdata parser that handles messy input on its own, or do you run the input through something like BeautifulSoup first?
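To make the requirements concrete, here is a rough sketch of the behaviour I'm after, written against the standard library only. It is not taken from any of the libraries above, and every name in it (`Node`, `TreeBuilder`, `prop_value`, `extract_item`, `top_level_items`) is made up for illustration. It assumes a lenient tree built with `html.parser` is good enough for messy input, covers only a subset of the spec's property-value rules, does not resolve relative URLs, strips whitespace from text values, and does not guard against itemref cycles.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
# Subset of the spec's rules: elements whose property value comes from an attribute.
VALUE_ATTR = {"meta": "content", "a": "href", "area": "href", "link": "href",
              "img": "src", "audio": "src", "video": "src", "source": "src",
              "iframe": "src", "embed": "src", "track": "src", "object": "data"}


class Node:
    def __init__(self, tag, attrs, parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs), parent
        self.children = []  # mix of Node objects and text strings, in order


class TreeBuilder(HTMLParser):
    """Builds a forgiving tree: unclosed tags stay open, stray end tags are ignored."""

    def __init__(self):
        super().__init__()
        self.root = self.current = Node("#root", [])
        self.by_id = {}

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if node.attrs.get("id"):
            self.by_id.setdefault(node.attrs["id"], node)  # first id wins
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:  # otherwise: stray end tag, ignore it
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def elements(node):
    return [c for c in node.children if isinstance(c, Node)]


def text_of(node):
    return "".join(c if isinstance(c, str) else text_of(c) for c in node.children)


def prop_value(el):
    if el.tag in VALUE_ATTR:
        return el.attrs.get(VALUE_ATTR[el.tag]) or ""
    return text_of(el).strip()  # simplification: the spec does not strip


def extract_item(node, by_id):
    item = {"type": node.attrs.get("itemtype"), "properties": {}}
    # Start from the item's children plus every element named in itemref.
    roots = elements(node)
    for ref in (node.attrs.get("itemref") or "").split():
        if ref in by_id:
            roots.append(by_id[ref])

    def walk(el):
        names = (el.attrs.get("itemprop") or "").split()
        if names:
            nested = "itemscope" in el.attrs
            value = extract_item(el, by_id) if nested else prop_value(el)
            for name in names:
                item["properties"].setdefault(name, []).append(value)
        if "itemscope" not in el.attrs:  # do not descend into nested items
            for child in elements(el):
                walk(child)

    for root in roots:
        walk(root)
    return item


def top_level_items(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    items = []

    def visit(el):
        if "itemscope" in el.attrs and "itemprop" not in el.attrs:
            items.append(extract_item(el, builder.by_id))
        for child in elements(el):
            visit(child)

    visit(builder.root)
    return items


# Messy input: a stray </b>, an unclosed <p> and <span>, and an itemref target.
MESSY = """
<div itemscope itemtype="http://schema.org/Person" itemref="extra">
  <span itemprop="name">Jane Doe</span>
  <img itemprop="image" src="jane.png">
  </b>
</div>
<p id="extra">Works at <span itemprop="affiliation">ACME
"""
items = top_level_items(MESSY)
```

Ideally a library would return something equivalent for input like `MESSY`: one Person item whose properties include the `affiliation` pulled in through itemref, even though the markup is broken.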