I need to crawl all the Pulitzer Prize winners, and I found a page that has everything I want: https://www.pulitzer.org/prize-winners-by-year/2018.
But I ran into the following problems:
Problem 1: How do I crawl a dynamic page? I use Python's urllib2.urlopen to fetch the page, but because the page is dynamic, the response does not contain the real content.
Problem 2: I then found an API URL in the browser's developer tools: https://www.pulitzer.org/cache/api/1/winners/year/166/raw.json. But when I send a GET request to it with urllib2.urlopen, I always get null back. Why does this happen, and how can I handle it?
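One guess I have not verified is that the server checks headers that a browser sends automatically, such as Referer or User-Agent. Below is a minimal sketch of how that could be tested. It assumes Python 3's urllib.request in place of urllib2, and the helper names build_request and fetch_json as well as the header values are placeholders of my own, not anything the site documents.

```python
import json
import urllib.request

# URLs taken from the question above.
PAGE_URL = "https://www.pulitzer.org/prize-winners-by-year/2018"
API_URL = "https://www.pulitzer.org/cache/api/1/winners/year/166/raw.json"


def build_request(api_url, referer):
    """Build a GET request carrying browser-like headers.

    Assumption: which headers (if any) the server inspects is unknown;
    these three are simply ones a browser would normally send.
    """
    headers = {
        "User-Agent": "Mozilla/5.0",
        "Accept": "application/json",
        "Referer": referer,
    }
    # No data argument is passed, so urllib issues a GET.
    return urllib.request.Request(api_url, headers=headers)


def fetch_json(request, timeout=10):
    """Send the request and parse the response body as JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_json(build_request(API_URL, PAGE_URL)) and comparing the result with a request that has no extra headers would show whether the headers make any difference.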
If this is too naive a question, please suggest some keywords so that I can look them up on Google.
Thanks in advance!