Beautiful Soup HTML parsing returning empty list when scraping YouTube

Question

I'm trying to use BS4 to parse the HTML of a YouTube channel's About page so I can scrape the channel's view count. Below is the code to scrape the channel views (located in the 'yt-formatted-string' element) and also the whole right column of the page. The two calls return an empty list and "None" for find_all() and find(), respectively.

I read another thread saying I may be receiving an empty list or "None" value because the page calls an API to fetch the total channel view count, so the values aren't actually in the HTML I'm parsing.

I know I could access much of this info through the YouTube API, but I want to run this code over multiple channels that are not my own. Moreover, I want to understand how to use BS4 to its full extent so I can replicate this process on an Instagram or Facebook page.

Should I be using a different library that isn't BS4? Is what I'm looking to accomplish even possible?

My CODE

from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq

#find Youtube channel views and subscriber counts

my_url = 'https://www.youtube.com/c/Rozziofficial/about'

uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html,"html.parser")

body = page_soup.body
views_count = body.find_all('yt-formatted-string',{"class":"style-scope ytd-channel-about-metadata-renderer"})
right_column = body.find('div', {"id":"right-column"})

print(right_column)
print(views_count)

MendelG · Accepted Answer · 2021-06-15T20:43:19.660

YouTube builds the page dynamically with JavaScript, so the HTML that urllib downloads does not contain the elements you see in the browser. However, the data is embedded in the page source in JSON format. You can convert this data to a Python dictionary (dict) using the built-in json library.
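To make the point above concrete, here is a minimal offline sketch. STATIC_HTML is a made-up, heavily simplified stand-in for what a plain HTTP client receives (the real page is far larger), and TagCollector is a hypothetical helper written for this illustration. It only shows the general shape of the problem: the rendered element is missing from the static markup, while the value sits in a script tag as JSON.

```python
import json
import re
from html.parser import HTMLParser

# Hypothetical miniature page, NOT real YouTube markup.
STATIC_HTML = """
<html><body>
<div id="app"></div>
<script>var ytInitialData = {"viewCountText": {"simpleText": "123 views"}};</script>
</body></html>
"""


class TagCollector(HTMLParser):
    """Records the name of every opening tag found in the document."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)


collector = TagCollector()
collector.feed(STATIC_HTML)

# The element a browser would render is absent from the static markup...
element_present = "yt-formatted-string" in collector.tags

# ...but the value it would display is embedded as JSON inside a <script> tag.
match = re.search(r"var ytInitialData = (\{.*?\});", STATIC_HTML, re.DOTALL)
embedded = json.loads(match.group(1))

print("Element present:", element_present)
print("Embedded value:", embedded["viewCountText"]["simpleText"])
```

This is why find_all() and find() come back empty: the parser only sees what the server sent, not what JavaScript adds afterwards.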

The example below uses the URL you provided, https://www.youtube.com/c/Rozziofficial/about. You can change the channel name, and the same code should work for other channels.

Here's an example using requests; you can use urllib instead:

import re
import json
import requests
from bs4 import BeautifulSoup

URL = "https://www.youtube.com/c/Rozziofficial/about"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")

# We locate the JSON data using a regular-expression pattern
data = re.search(r"var ytInitialData = ({.*});", str(soup)).group(1)

# Uncomment to view the raw JSON string (it is already a str, so no json.dumps is needed)
# print(data)

# This converts the JSON data to a python dictionary (dict)
json_data = json.loads(data)

# This is the info from the webpage on the right-side under "stats", it contains the data you want
stats = json_data["contents"]["twoColumnBrowseResultsRenderer"]["tabs"][5]["tabRenderer"]["content"]["sectionListRenderer"]["contents"][0]["itemSectionRenderer"]["contents"][0]["channelAboutFullMetadataRenderer"]

print("Channel Views:", stats["viewCountText"]["simpleText"])
print("Joined:", stats["joinedDateText"]["runs"][1]["text"])

Output:

Channel Views: 10,263,762 views
Joined: Jun 30, 2007
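One caveat with the code above is that the hard-coded path (for instance the tabs[5] index) depends on the page layout at the time of writing and may differ between channels or change later. A possible way to make the lookup less brittle, sketched here under assumptions, is to search the parsed JSON for a key by name instead. The helpers extract_initial_data and find_key are hypothetical names introduced for this illustration, and SAMPLE_HTML is a made-up miniature of the page, not real YouTube output. json.JSONDecoder().raw_decode is used instead of a greedy regex so that parsing stops at the end of the first complete JSON object.

```python
import json

MARKER = "var ytInitialData = "


def extract_initial_data(html):
    """Parse the JSON object that directly follows MARKER, ignoring what trails it."""
    start = html.index(MARKER) + len(MARKER)  # raises ValueError if MARKER is missing
    # raw_decode stops at the end of the first complete JSON value
    data, _end = json.JSONDecoder().raw_decode(html, start)
    return data


def find_key(node, key):
    """Depth-first search; returns the first non-None value stored under `key`, else None."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


# Hypothetical miniature page; the second script would confuse a greedy ".*" regex.
SAMPLE_HTML = (
    '<script>var ytInitialData = {"contents": {"tabs": [{"other": 1}, '
    '{"channelAboutFullMetadataRenderer": {"viewCountText": '
    '{"simpleText": "123 views"}}}]}};</script>'
    '<script>var unrelated = {"x": 1};</script>'
)

stats = find_key(extract_initial_data(SAMPLE_HTML), "channelAboutFullMetadataRenderer")
print("Channel Views:", stats["viewCountText"]["simpleText"])
```

With a real page you would pass the response text (for example requests.get(URL).text) to extract_initial_data. Whether the key names stay the same is up to YouTube, so treat this as a sketch rather than a guaranteed fix.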
