I have a problem loading page content with the Python requests module. Here is the code:

import requests

url = "http://www.bestbuy.com/site/macbooks/macbook-air/pcmcat378600050008.c?id=pcmcat378600050008"

# Browser-like User-agent header sent with every request
headers = {
    "User-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"}

response = requests.get(url, headers=headers)
# Raw response body
page_source = response.content

print(page_source)

print(len(page_source))

This code runs inside a timer function with a 10-second interval (time.sleep(10)).

The problem is that sometimes it loads the full page, about 408000 characters long, and sometimes only about 176000 characters. I get one of these two numbers every time. How can I find out what the problem is? Is it a bug, or is the server changing the page content? I tried increasing the interval to 20 seconds, but the problem persists. The same issue also happens with the urllib2 module.

TheRutubeify
  • You are not checking your response status code. Maybe something is wrong with the request. Please add your response status code to the question. – bergerg Aug 03 '17 at 20:08
  • 200 every time! – TheRutubeify Aug 03 '17 at 20:12
  • I've just tried to run it and got 403 (Access Denied). You might want to add some cookies that identify your session to the site. Another suggestion: try using 'response.text' or 'response.json' if you're expecting something specific from this request – YanivGK Aug 03 '17 at 20:26
  • You get 403 (Access Denied) because you need to send headers when you use requests. For urllib2 it doesn't matter! – TheRutubeify Aug 03 '17 at 20:31
  • The problem may be JavaScript; you may want to try Selenium WebDriver – whackamadoodle3000 Aug 03 '17 at 22:21
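
The checks suggested in the comments above (status code, how the body is read) can be combined into one small diagnostic. The following is only a sketch under assumptions: the helper names summarize, fetch_summary and differing_fields are hypothetical and not part of requests, and the 30-second timeout is an arbitrary choice. It records the status code, body length, a hash of the body and two response headers, so that a "long" and a "short" fetch can be compared field by field.

```python
import hashlib


def summarize(status_code, headers, body):
    """Collect the facts needed to compare two fetches of the same URL.

    `headers` is any mapping with a .get method; `body` is the raw bytes.
    """
    return {
        "status": status_code,
        "length": len(body),
        "sha256": hashlib.sha256(body).hexdigest(),
        "content_encoding": headers.get("Content-Encoding"),
        "content_type": headers.get("Content-Type"),
    }


def fetch_summary(url, headers):
    """Fetch `url` once and summarize the response (hypothetical helper)."""
    # Imported here so summarize() stays usable without the requests package.
    import requests

    response = requests.get(url, headers=headers, timeout=30)
    # response.content is the body as bytes; response.text is the decoded string.
    return summarize(response.status_code, response.headers, response.content)


def differing_fields(first, second):
    """Return the sorted names of summary fields whose values differ."""
    return sorted(key for key in first if first[key] != second[key])
```

Calling fetch_summary(url, headers) on each timer tick and passing two consecutive results to differing_fields shows what actually changed. If only "length" and "sha256" differ while the status stays 200, the server is probably returning different markup for the same URL, and saving both bodies to files and diffing them would be a reasonable next step.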

0 Answers