0

I want to scrape HTML content from a couple of websites and display it on my own website as a kind of mashup. I will reference and link to the original sites as well!

Thank you

MidnightCoder

3 Answers

3

It is not considered "polite," but it is done often nonetheless. Some websites take countermeasures against such activity, but in general you'll be able to do it without any repercussions.

If you want to do it the right way, simply ask the website operator. For all you know they'll be okay with it, or they may even have an API you can use.

But if you scrape without permission and your site gets popular, the original site(s) may discover what you're doing. They could then send you a cease and desist letter and/or take additional legal steps if they feel you're infringing on any copyrighted material.

http://en.wikipedia.org/wiki/Web_scraping

msigman
  • It's only impolite if you overload their servers or don't respect their robots.txt - the burden is on them to tell you not to scrape their content, not on you to ask. – pguardiario Apr 01 '12 at 02:58
  • 1
    I don't necessarily agree with that... to me it sounds like you're saying if someone doesn't lock their door, it's okay to walk into their house, because the burden is on them to lock the door -- and if they don't they're giving implicit approval to enter. As a webmaster of many sites myself I certainly wouldn't want anyone scraping my content for the purpose of re-posting on their own site. – msigman Apr 01 '12 at 03:03
  • 1
    Really? You wouldn't want Google scraping your content for the purpose of listing you and showing excerpts of your content? If that's really the case, you can disallow them in your robots.txt. Otherwise, yes, by having a public website you are giving everyone implicit permission to visit your site. – pguardiario Apr 01 '12 at 04:21
  • 1
    You give permission to *visit* your site, but not to take and redistribute the content. Google doesn't *redistribute* the content, it makes it findable and provides a link to it. That's not the same as re-publishing it. – Andrew Leach Apr 02 '12 at 07:58
  • @Andrew Leach - As I said, Google redistributes the content in the form of an excerpt. I see no real distinction between an excerpt and a mash-up. – pguardiario Apr 05 '12 at 02:41
0

It's generally Not The Done Thing. If the authors of those websites want to make their data available they will probably have done so via an API or feed of some sort.

The best thing to do is to ask them directly. They might even offer you a better method than scraping.

Andrew Leach
0

Go ahead and do it, but check their robots.txt and make sure there is a way for them to contact you if they have a problem with it. Most people will be happy to get traffic from your mash-up. Anyway, the burden is on them to ask you not to.
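As a rough illustration of the robots.txt check and the "way to contact you" part, here is a minimal Python sketch using only the standard library. The bot name, the contact address, and the sample robots.txt rules are all made up for the example, and putting a contact address in the User-Agent header is just one common convention, not a requirement.

```python
from urllib.request import Request
from urllib.robotparser import RobotFileParser

# Hypothetical bot name and contact address; replace with your own.
USER_AGENT = "MashupBot/0.1 (+mailto:admin@example.com)"

# Made-up robots.txt content, used here so the example runs offline.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def build_request(url: str) -> Request:
    """Build (but do not send) a request that identifies the scraper
    and tells the site operator how to reach you."""
    return Request(url, headers={"User-Agent": USER_AGENT})
```

In real use you would download each site's own `/robots.txt` first (for example with `RobotFileParser.set_url()` and `read()`), call `is_allowed()` before every fetch, and pause between requests so you don't overload their servers.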

pguardiario