-1

I want to crawl Indian news websites and their archives (e.g. thehindu.com, indianexpress.com and timesofindia.com).

I have heard of a boilerplate library in Java that is used to extract content. Is there a Python library that does this, and how would I use it?

If this is a duplicate question, please point me to the original.

mridul
    The title of your question gives some pretty good pointers on [Google](https://www.google.com/search?q=How+to+crawl+news+websites+(content+only)+python) – lanzz Feb 21 '14 at 16:42

1 Answer

6

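Scrapy is a popular scraping framework for Python. It handles the crawling side (scheduling requests and following links) and lets you pull content out of each page with CSS or XPath selectors.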
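Whichever tool fetches the pages, the "content only" step roughly comes down to keeping paragraph text and dropping scripts, styles and navigation. Here is a minimal sketch of that idea using only the standard library's `html.parser`, not Scrapy's own API. The names `ParagraphExtractor` and `extract_paragraphs` are made up for illustration, and the assumption that article text lives in `<p>` tags will not hold for every site, so expect to adapt it per site.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text found inside <p> tags, skipping <script>/<style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []   # finished paragraph strings
        self._buffer = []      # text pieces of the paragraph being read
        self._in_p = False     # True while inside a <p> element
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "p":
            # An unclosed previous <p> ends where the next one starts.
            self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._in_p and self._skip_depth == 0:
            self._buffer.append(data)

    def _flush(self):
        # Join the pieces and collapse runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []
        self._in_p = False


def extract_paragraphs(html):
    """Return the non-empty paragraph texts of an HTML string (hypothetical helper)."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # keep a trailing paragraph that was never closed
    return parser.paragraphs


if __name__ == "__main__":
    sample = (
        "<html><body><nav><a href='/'>Home</a></nav>"
        "<p>First <b>para</b>.</p><script>var x = 1;</script>"
        "<p>Second para.</p></body></html>"
    )
    for paragraph in extract_paragraphs(sample):
        print(paragraph)
```

In a crawler you would call something like `extract_paragraphs` on each downloaded page body. Real news pages often put comments, captions and related-story teasers in `<p>` tags too, so a site-specific selector or a dedicated boilerplate-removal library will usually give cleaner results than this sketch.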

shaktimaan