I would like to replicate the functionality that Facebook uses to parse a link. When you post a link in your Facebook status, their system goes out to that page and retrieves a suggested title, a summary and often one or more relevant images, from which you can choose a thumbnail.
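Facebook's crawler reads Open Graph metadata (`og:title`, `og:description` and `og:image` in tags like `<meta property="og:title" content="...">`) when a page provides it, so checking for those tags first is probably the cheapest win. Below is a minimal sketch using only the standard library. The names `OpenGraphParser`, `fetch_html` and `parse_open_graph` are hypothetical. It assumes the page is served as HTML and only looks at the `property` attribute; some sites put `og:` keys in `name` instead, which this sketch ignores.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class OpenGraphParser(HTMLParser):
    """Collect Open Graph <meta property="og:..."> tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.og = {}      # first value seen for each og:* property
        self.images = []  # og:image may legitimately appear several times

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing <meta />
        # tags are routed here too.
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = (attrs.get("property") or "").lower()
        content = attrs.get("content")
        if not prop.startswith("og:") or content is None:
            return
        if prop == "og:image":
            self.images.append(content)
        else:
            self.og.setdefault(prop, content)


def fetch_html(url, timeout=10):
    """Download a page and decode it with the server's declared charset (UTF-8 if none)."""
    request = Request(url, headers={"User-Agent": "link-preview-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def parse_open_graph(html):
    """Return the Open Graph title, description and images (None / [] when absent)."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.og.get("og:title"),
        "summary": parser.og.get("og:description"),
        "images": parser.images,
    }
```

Typical use would be `parse_open_graph(fetch_html(url))`. If the result comes back empty, the page has no Open Graph tags and you need to fall back to guessing from the HTML itself.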
My application needs to do this in Python, but I am open to any kind of guide, blog post or experience from other developers that relates to this and might help me figure out how to accomplish it.
I would really like to learn from other people's experience before just jumping in.
To be clear, given the URL of a web page, I want to be able to retrieve:
- The title: probably just the `<title>` tag, but possibly the `<h1>`; I'm not sure.
- A one-paragraph summary of the page.
- A bunch of relevant images that could be used as a thumbnail. (The tricky part is filtering out irrelevant images such as banners or rounded corners.)
I may have to implement it myself, but I would at least like to know how other people have approached this kind of task.