Python web crawler cannot get the full page

Question

I am trying to run the following Python code:

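import requests

# Send a browser-like User-Agent so the request looks like a normal visit.
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36'}
url = "https://search.bilibili.com/all?keyword=Steins;Gate0"

try:
    r = requests.get(url, headers=headers, timeout=10)
    r.encoding = 'utf-8'
    if r.status_code == 200:
        print(r.text)
    else:
        print("Unexpected status code:", r.status_code)
except requests.RequestException as exc:
    # Catch only request errors instead of hiding every exception.
    print("Request failed:", exc)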

I am a beginner with web crawlers. This crawler is written in Python with requests, but it does not get the full page. I think the cause is asynchronous page loading (or perhaps the website uses some other strategy). How can I get the full page?

score 0 · Answer 1 · answered Feb 10 '18 at 15:58

What you're dealing with is a well-known problem. It is simple to describe but takes more work to solve: the content you want does not exist on the page without some sort of browser interaction, so a plain HTTP request like the one made with requests never sees it.

Some recommendations:

Investigate headless browsers, such as headless Chrome, and their use cases.
Investigate Selenium and how to use it with Python and a headless browser.
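
As a rough sketch of the second recommendation, the snippet below loads a page in headless Chrome through Selenium and returns the HTML after the browser has processed it. It assumes Selenium 4 and a local Chrome install. The helper names `build_search_url` and `fetch_rendered_html`, and the optional `wait_css` selector, are made up for illustration and are not part of the question or of any library. Whether this returns everything you need depends on how the site actually loads its results.

```python
from urllib.parse import urlencode


def build_search_url(keyword):
    """Build the search URL from the question, percent-encoding the keyword."""
    return "https://search.bilibili.com/all?" + urlencode({"keyword": keyword})


def fetch_rendered_html(url, wait_css=None, timeout=10):
    """Load url in headless Chrome and return the page source afterwards.

    wait_css is a hypothetical, caller-supplied CSS selector. If given, the
    function waits until a matching element is present before reading the HTML.
    """
    # Imported here so the module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        if wait_css is not None:
            # Give asynchronously loaded content up to `timeout` seconds to appear.
            WebDriverWait(driver, timeout).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
            )
        return driver.page_source
    finally:
        # Always close the browser, even if loading or waiting fails.
        driver.quit()
```

Calling `print(fetch_rendered_html(build_search_url("Steins;Gate0")))` would then print the HTML as the browser sees it. If the results still arrive late, pass a `wait_css` selector that matches an element you find in the results list with your browser's developer tools.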
