Python - BeautifulSoup4 "None" Return?

Question

I want to get the .text of an <a> element, but it simply doesn't work.

Here is my code:

from bs4 import BeautifulSoup
import requests

generatedLink = "MyLink"
page = requests.get(generatedLink)
contents = page.text
soup = BeautifulSoup(contents, "html.parser")
name = soup.find('a',class_=["yt-simple-endpoint", "style-scope", "ytd-video-renderer"])

print(name)

It prints "None". This is the element I am trying to match:

<a id="video-title" class="yt-simple-endpoint style-scope ytd-video-renderer" aria-label="TURNIR 1 VS 1 U LOLU FINALEE!! od korisnika KaLuu Vrijeme streaminga: prije 3 dana 3 sata i 49 minuta 644 pregleda" href="/watch?v=5N4X4hjkzOw" title="TURNIR 1 VS 1 U LOLU FINALEE!!">

TURNIR 1 VS 1 U LOLU FINALEE!!

</a>

I need to extract the title text from the element above.

Something is wrong here, but I can't find the error in my code. Can someone help me?

If you print(soup), do you see the HTML you are trying to select, or is it added dynamically with JavaScript using AJAX? If you right-click in Firefox -> Inspect Element -> Network, select XHR and reload the page you are trying to scrape, does it show any XHRs? Could you try the same for WS instead of XHR too? — Dan-Dev, Sep 23 '18 at 21:17
When I print(soup), I can see the part of the code that I want to scrape out of everything. — YoungBoi, Sep 23 '18 at 21:24
Of course I can, this is it: https://www.youtube.com/channel/UCtBGKF3uQNybKeelFz4PolA — YoungBoi, Sep 23 '18 at 22:31
The page is rendered with JavaScript, so the HTML code you want is not present in the output of print(soup). There are loads of XHRs fetching data from other URLs. To scrape JavaScript-rendered websites, see my answer to https://stackoverflow.com/questions/45259232/scraping-google-finance-beautifulsoup/45259523#45259523 — Dan-Dev, Sep 24 '18 at 08:40

score 0 · Accepted Answer · answered Sep 24 '18 at 01:40

In the browser, the element on the web page looks like this:

<a id="video-title" class="yt-simple-endpoint style-scope ytd-video-renderer" aria-label="VRACAMO SE MNOGO JACIII!!! by KaLuu Streamed 3 days ago 2 hours, 2 minutes 291 views" href="/watch?v=QPNienUChDg" title="VRACAMO SE MNOGO JACIII!!!">
                VRACAMO SE MNOGO JACIII!!!
              </a>

But the HTML that requests receives contains this element instead:

    <a aria-describedby="description-id-904842" class="yt-uix-sessionlink yt-uix-tile-link spf-link yt-ui-ellipsis yt-ui-ellipsis-2" data-sessionlink="ei=6T6oW7LkMcKKowP9zqLoBQ&amp;feature=c4-overview&amp;ved=CDYQ-SUYACITCPL87M-_0t0CFULFaAodfacIXSibHA" dir="ltr" href="/watch?v=QPNienUChDg" rel="nofollow" title="VRACAMO SE MNOGO JACIII!!!">
VRACAMO SE MNOGO JACIII!!!
</a>

Comparing the two, you can see that the class value changes from yt-simple-endpoint style-scope ytd-video-renderer to yt-uix-sessionlink yt-uix-tile-link spf-link yt-ui-ellipsis yt-ui-ellipsis-2. Some websites serve different markup depending on the client's location and on other client-side factors.
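One way to spot this kind of difference is to count the class names that appear on the <a> tags in the HTML that requests actually received. The sketch below is only an illustration: sample_html is a hypothetical, shortened stand-in for page.text, and count_anchor_classes is a helper name made up for this example.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical stand-in for page.text; a real response is much larger.
sample_html = """
<a class="yt-uix-sessionlink yt-uix-tile-link spf-link" href="/watch?v=QPNienUChDg"
   title="VRACAMO SE MNOGO JACIII!!!">VRACAMO SE MNOGO JACIII!!!</a>
<a class="yt-uix-sessionlink spf-link" href="/about">About</a>
"""


def count_anchor_classes(html):
    """Count how often each CSS class appears on <a> tags in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    counts = Counter()
    for anchor in soup.find_all("a"):
        # "class" is a multi-valued attribute, so bs4 returns it as a list.
        counts.update(anchor.get("class", []))
    return counts


class_counts = count_anchor_classes(sample_html)
print(class_counts.most_common())
```

If the class you are searching for never shows up in these counts, find() has nothing to match and returns None.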

After identifying this, I got the value with the following code:

import requests
from bs4 import BeautifulSoup

url = 'https://www.youtube.com/channel/UCtBGKF3uQNybKeelFz4PolA'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
print(soup)  # inspect the HTML that requests actually received
# Match the class string served to requests, not the one shown in the browser
name = soup.find('a', {'class': 'yt-uix-sessionlink yt-uix-tile-link spf-link yt-ui-ellipsis yt-ui-ellipsis-2'})

print(name.text)
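A note on robustness: when find() is given a string containing several class names, it matches the exact string value of the class attribute, so a change in class order or an extra class makes it return None, and name.text then raises AttributeError. A minimal sketch of a more forgiving variant follows, assuming a single class such as yt-uix-tile-link is enough to identify the link. Here sample_html is a hypothetical inline copy of the element shown above, and first_title is a helper name made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical inline copy of the element that requests received.
sample_html = """<a class="yt-uix-sessionlink yt-uix-tile-link spf-link yt-ui-ellipsis yt-ui-ellipsis-2" href="/watch?v=QPNienUChDg" title="VRACAMO SE MNOGO JACIII!!!">
VRACAMO SE MNOGO JACIII!!!
</a>"""


def first_title(html, selector="a.yt-uix-tile-link"):
    """Return the stripped text of the first element matching selector, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # A CSS class selector matches regardless of class order or extra classes.
    link = soup.select_one(selector)
    if link is None:
        # Nothing matched: report that instead of failing on .text
        return None
    return link.get_text(strip=True)


title = first_title(sample_html)
missing = first_title(sample_html, "a.ytd-video-renderer")
print(title)
print(missing)
```

The second call uses a class that only exists in the browser-rendered markup, so it returns None rather than raising an exception.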

Hope this helps! Cheers!
