I am trying to scrape this page: https://movie.douban.com/subject/1292052/

However, the URL redirects to http://m.douban.com/movie/subject/1292052. How can I get back to the first page and keep using the XPath expressions I wrote for it? Thanks!

ileadall42

1 Answer

You are being redirected to the mobile site because your user agent is not recognized as a desktop browser.

You can set the USER_AGENT variable in settings.py to a desktop browser string, for example: USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'
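As a rough sketch, the setting above could look like this in a project's settings.py. The string is the one quoted in this answer, split across lines only for readability; whether the site treats it as a desktop browser is an assumption you should verify against the actual response.

```python
# settings.py (Scrapy project settings)
# Desktop Chrome user agent from the answer above. Note that it does not
# contain the "Mobile" or "Android" tokens that mobile user agents carry.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/57.0.2987.133 Safari/537.36"
)
```

Scrapy sends this value as the User-Agent header on requests from the project unless a spider or an individual request overrides it.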

If you would rather not be redirected at all, you can add a meta dictionary to your request: {'dont_redirect': True, 'handle_httpstatus_list': [302]}. Scrapy will then not follow the redirect and will pass the 302 response to your callback instead.

Mikko
  • Thanks a lot! But if I use the second approach, I can't even get the page; when I view the response, it is just text telling me to request the redirect URL. As for the first approach, I did set a User-Agent: 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Mobile Safari/537.36'. Is that a mobile user agent? I don't know how to change it. I got it from the "Responsive" device mode in Chrome, and I have also tried "iPhone 6". – ileadall42 May 24 '17 at 08:17
  • So how do I change the Responsive mode in Chrome to get a non-mobile user agent? – ileadall42 May 24 '17 at 08:26
  • @TomJhonson-FFT You can get common user-agent strings from here: https://techblog.willshouse.com/2012/01/03/most-common-user-agents/ – Mikko May 26 '17 at 02:27