I'm trying to use request-promise to scrape the price of an item from Asos.com. When I run the code below, I get a 403 error. Is it possible to get this error even though the URL I am trying to scrape is publicly available, with no key required?

http://www.asos.com/api/product/catalogue/v2/stockprice?productIds=10000496&currency=SEK&keyStoreDataversion=7jhdf34h-6&store=ROE

I know some sites prohibit scraping in their ToS, but I want to be sure whether I am doing something wrong or am actually being blocked by the site.

const rp = require('request-promise');

var url = 'http://www.asos.com/api/product/catalogue/v2/stockprice?productIds=10000496&currency=SEK&keyStoreDataversion=7jhdf34h-6&store=ROE';

rp({ url:url, json:true })
  .then(function (data) {
    console.log(data.productPrice.current.value);
  })
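  .catch(function (reason) {
    // request-promise sets message and statusCode on a StatusCodeError;
    // statusCode is undefined for network-level failures.
    console.error("%s; %s", reason.message, reason.options.url);
    console.log("%j", reason.statusCode);
  });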
kmpace
  • In short, yes, you are getting blocked. 403 is a common status code issued by APIs if the IP/user rate limit is exceeded. Basically, if you are not being nice to them and you are hammering them with requests, they will send you back a 403. I suspect this is probably what's happening. – Adam Jenkins Jun 04 '18 at 17:43
  • That's what I thought. I am only sending one request at a time, but I guess they block these requests outright, if that makes sense. – kmpace Jun 04 '18 at 17:49
  • I can hit that URL in my browser and get a 200 no problem. But you may have tripped their limits if you have sent X requests in the last 60 seconds, 5 minutes, 24 hours, etc, despite only ever sending one request at a time. – Adam Jenkins Jun 04 '18 at 18:06
  • Found a fix for anyone who hits this in the future: adding a header like the one below mimics a browser request - `headers: { 'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36' }` – kmpace Jun 05 '18 at 16:43

1 Answer

You should add a "headers" parameter that sets a User-Agent, e.g.:

rp({ 
  url:url, 
  headers: {
    'User-Agent': 'Request-Promise'
  },
  json:true 
})
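
For anyone who wants the whole flow in one place, here is a minimal sketch that combines the headers option with the price lookup from the question. The names `buildRequestOptions`, `fetchPrice`, and `BROWSER_UA` are hypothetical helpers, not part of request-promise, and the `data.productPrice.current.value` path is simply taken from the question's code, so it is an assumption about the response shape. The request function is passed in as an argument so the helper can be tried with a stub instead of hitting the site.

```javascript
// Browser-like User-Agent string, copied from the comment under the question.
const BROWSER_UA = 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36';

// Hypothetical helper: builds the options object handed to request-promise.
// Falls back to the browser-like User-Agent when none is supplied.
function buildRequestOptions(url, userAgent) {
  return {
    url: url,
    headers: { 'User-Agent': userAgent || BROWSER_UA },
    json: true
  };
}

// Hypothetical helper: `rp` is the request function (for example
// require('request-promise')), injected so a stub can be used in tests.
// Assumes the response shape used in the question.
function fetchPrice(rp, url, userAgent) {
  return rp(buildRequestOptions(url, userAgent))
    .then(function (data) {
      return data.productPrice.current.value;
    });
}
```

With request-promise installed, this could be called as `fetchPrice(require('request-promise'), url).then(console.log).catch(console.error)`. Whether a given User-Agent value gets past the site's filtering is up to the site, so the 403 may still come back.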
Nighto
Naycho334