I'm trying to scrape the viewer counts shown on www.twitch.tv/directory using Python. I first tried a basic BeautifulSoup script:
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'https://www.twitch.tv/directory'
html = urlopen(url)
soup = BeautifulSoup(html, "html5lib")  # also tried html.parser and lxml
print(soup.prettify())
This gives me the page's HTML, but without the actual viewer numbers in it.
Then I tried passing AJAX request parameters, following the approach from this thread:
import requests

param = {
    "action": "getcategory",
    "br": "f21",
    "category": "dress",
    "pageno": "",
    "pagesize": "",
    "sort": "",
    "fsize": "",
    "fcolor": "",
    "fprice": "",
    "fattr": "",
}
url = "https://www.twitch.tv/directory"
# Also tried with the headers parameter headers={"User-Agent":"Mozilla/5.0...
js = requests.get(url, params=param).json()
But I get this error: JSONDecodeError: Expecting value: line 1 column 1 (char 0)
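As far as I understand, that message means the response body did not start with a JSON value, which is what I would expect if the URL returned an HTML page instead of JSON. Below is a minimal sketch of how that can be checked without the network. The helper name decode_json_or_none and both sample bodies (including the "viewers" key) are made up for illustration and are not taken from Twitch.

```python
import json


def decode_json_or_none(body: str):
    """Return the decoded JSON value, or None if body is not valid JSON."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None


# Hypothetical sample bodies: an HTML document and a small JSON document.
html_body = "<!DOCTYPE html><html><head></head><body></body></html>"
json_body = '{"viewers": 123}'

print(decode_json_or_none(html_body))  # None
print(decode_json_or_none(json_body))  # {'viewers': 123}
```

With requests, the same check could presumably be applied to response.text, and response.headers.get("Content-Type") should show whether the server claims to be sending JSON at all.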
From there I moved on to Selenium:
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Edge()
url = 'https://www.twitch.tv/directory'
driver.get(url)
# Also tried driver.execute_script("return document.documentElement.outerHTML") and innerHTML
html = driver.page_source
driver.close()
soup = BeautifulSoup(html, "lxml")
These Selenium attempts yield the same result as the plain BeautifulSoup call: HTML without the viewer numbers.
Any help on scraping the view count would be appreciated.