I was wondering how Google Reader extracts news items from a web page.
Do any of you know how it works, or how someone could build a similar system to extract the same information from the HTML of a web page?
Obviously it is not relying on a standard (nor is it only reading RSS/Atom), because Google Reader shows that it can read the content of a page regardless of what the markup looks like.