How to crawl news websites (content only)?

Question

I want to crawl Indian news websites and their archives (e.g. thehindu.com, indianexpress.com, and timesofindia.com).

I have heard of a boilerplate library in Java that is used to extract content. Is there a library in Python that does this, and how do I use it?

If this is a duplicate question, please point me to the original.

The title of your question gives some pretty good pointers on [Google](https://www.google.com/search?q=How+to+crawl+news+websites+(content+only)+python) — lanzz, Feb 21 '14 at 16:42

score 6 · Accepted Answer · answered Feb 21 '14 at 16:43

Scrapy is a popular web crawling and scraping framework for Python.
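A framework like Scrapy takes care of fetching pages and following links, but the "content only" part still needs some rule for separating article text from menus and widgets. As a rough illustration of that idea (not Scrapy's API, and not how the Java library mentioned in the question works), here is a minimal sketch that uses only the standard library's `html.parser`. It assumes article text lives in `<p>` tags and that very short paragraphs are page furniture. The names `ParagraphExtractor`, `extract_content`, `SKIP_TAGS` and `min_words` are hypothetical, and the sample HTML is made up.

```python
from html.parser import HTMLParser

# Assumption: text inside these tags is never article content.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class ParagraphExtractor(HTMLParser):
    """Collects text found inside <p> tags, ignoring SKIP_TAGS subtrees."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # finished paragraph strings
        self._buffer = []      # text pieces of the paragraph being read
        self._in_p = False
        self._skip_depth = 0   # > 0 while inside a skipped subtree

    def _flush(self):
        # Collapse runs of whitespace and store the paragraph if non-empty.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "p" and self._skip_depth == 0:
            if self._in_p:     # previous <p> was never closed
                self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "p" and self._in_p:
            self._flush()

    def handle_data(self, data):
        if self._in_p and self._skip_depth == 0:
            self._buffer.append(data)


def extract_content(html, min_words=5):
    """Return paragraphs with at least min_words words (a crude filter)."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return [p for p in parser.paragraphs if len(p.split()) >= min_words]


if __name__ == "__main__":
    # Made-up page: one menu, two real paragraphs, one share button.
    sample = """
    <html><body>
    <nav><p>Home | India | World | Sports | Opinion | More</p></nav>
    <p>The monsoon arrived in Kerala on Monday, ahead of schedule.</p>
    <p>Share</p>
    <p>Officials said rainfall is expected to spread north this week.</p>
    </body></html>
    """
    for paragraph in extract_content(sample):
        print(paragraph)
```

The word-count threshold is deliberately simple. Real news pages usually need site-specific selectors or a dedicated extraction library, so treat this as a starting point to adapt per site.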

shaktimaan
