In the interests of educational web scraping, I'm trying to parse some HTML, but something is happening that I can't explain. When I view the page's source code via Developer Tools in Chrome, I see

<link rel="preload" href="/static/bundles/es6/Consumer.js/54df4d9114a3.js" as="script" type="text/javascript" crossorigin="anonymous" />

but when I load it in Python via requests.get(url, headers) I get

<link rel="preload" href="/static/bundles/metro/Consumer.js/54df4d9114a3.js" as="script" type="text/javascript" crossorigin="anonymous" />

The only difference is that es6 in the path has become metro. What could cause the same URL to return different static HTML?
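To check whether this is the only difference between the two responses, the two copies of the HTML can be diffed line by line. The following is a minimal sketch using the standard library; the helper name `changed_lines` and the surrounding `<head>` lines are made up for illustration, and only the two `<link>` lines come from the output above.

```python
import difflib
from typing import List


def changed_lines(browser_html: str, script_html: str) -> List[str]:
    """Return only the lines that differ, prefixed '- ' (browser) or '+ ' (script)."""
    diff = difflib.ndiff(browser_html.splitlines(), script_html.splitlines())
    # ndiff also emits unchanged lines ('  ') and hint lines ('? '); drop those.
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Hypothetical snapshots: the <head> wrapper is invented, the <link> lines are from the question.
browser_html = """<head>
<link rel="preload" href="/static/bundles/es6/Consumer.js/54df4d9114a3.js" as="script" type="text/javascript" crossorigin="anonymous" />
</head>"""
script_html = browser_html.replace("/es6/", "/metro/")

for line in changed_lines(browser_html, script_html):
    print(line)
```

If the list contains more than this one pair of lines, the server is varying more than just the bundle path.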

I'm using a User-Agent string identical to the one shown in Dev Tools, so I suspect I'm missing some other header information.
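One way to test that suspicion is to list which request headers the browser sent that the script does not. This is a sketch only: the helper `missing_headers` is a hypothetical name, and the header values below are placeholders rather than the real ones from Dev Tools. Header names are compared case-insensitively, since HTTP treats them that way.

```python
def missing_headers(browser_headers, script_headers):
    """Return names of headers the browser sent that the script did not (case-insensitive)."""
    sent = {name.lower() for name in script_headers}
    return sorted(name for name in browser_headers if name.lower() not in sent)


# Placeholder values; in practice these would be copied from the Network tab in Dev Tools.
browser_headers = {
    "User-Agent": "Mozilla/5.0 (placeholder)",
    "Accept": "text/html",
    "Accept-Language": "en-US,en;q=0.9",
    "Cookie": "sessionid=placeholder",
}
script_headers = {
    "user-agent": "Mozilla/5.0 (placeholder)",
}

print(missing_headers(browser_headers, script_headers))
```

Adding the missing headers back one at a time would show which of them, if any, the server uses to choose between the two bundles.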

headers = {
    'Access-Control-Allow-Origin': '*',
    'Access-Control-Allow-Methods': 'GET',
    'Access-Control-Allow-Headers': 'Content-Type',
    'Access-Control-Max-Age': '3600',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36'
    }
html_content = requests.get(url, headers).text
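One thing worth checking in the snippet above: the signature is `requests.get(url, params=None, **kwargs)`, so a dictionary passed as the second positional argument is treated as query-string parameters, not as headers. In that case the custom User-Agent would never be sent as a header. Separately, the `Access-Control-*` entries are CORS response headers that a server sends, so they are not something a client normally includes. The sketch below builds both variants without sending anything, so the difference can be inspected offline; the URL is a placeholder, since the real one isn't shown here.

```python
import requests

# Placeholder URL for illustration only.
url = "https://example.com/page"

headers = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36"
    ),
}

# Equivalent to requests.get(url, headers): the dict lands in `params`.
positional = requests.Request("GET", url, params=headers).prepare()

# Equivalent to requests.get(url, headers=headers): the dict is sent as headers.
keyword = requests.Request("GET", url, headers=headers).prepare()

print(positional.url)                        # User-Agent appears in the query string
print(positional.headers.get("User-Agent"))  # not set on the prepared request
print(keyword.url)                           # URL is unchanged
print(keyword.headers["User-Agent"])         # the Chrome User-Agent string
```

With a real `requests.get` call, the session fills in its default `python-requests/...` User-Agent when none is supplied, which alone could be enough for a server to return a different bundle. Whether that explains the es6/metro difference on this particular site is an assumption that would need to be confirmed by retrying with `headers=headers`.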

I'm aware of a question with an almost identical title, "Requests.get showing different HTML than Chrome's Developer Tool", but it doesn't answer my question, and I don't want to use Selenium or a web driver. I'm after speed.

livin_amuk