I want to get the .text of an <a> element, but it simply doesn't work.

Here is my code:

from bs4 import BeautifulSoup
import requests

generatedLink = "MyLink"
page = requests.get(generatedLink)
contents = page.text
soup = BeautifulSoup(contents, "html.parser")
name = soup.find('a',class_=["yt-simple-endpoint", "style-scope", "ytd-video-renderer"])

print(name)

And it prints "None".

<a id="video-title" class="yt-simple-endpoint style-scope ytd-video-renderer" aria-label="TURNIR 1 VS 1 U LOLU FINALEE!! od korisnika KaLuu Vrijeme streaminga: prije 3 dana 3 sata i 49 minuta 644 pregleda" href="/watch?v=5N4X4hjkzOw" title="TURNIR 1 VS 1 U LOLU FINALEE!!">

TURNIR 1 VS 1 U LOLU FINALEE!!

</a>

I need to extract the title text from the element above.

Something is wrong here, but I can't find the error in my code. Can someone help me?

YoungBoi
  • Is the page rendered with JavaScript? – Dan-Dev Sep 23 '18 at 20:38
  • What do you mean by that? – YoungBoi Sep 23 '18 at 21:11
  • If you print (soup) do you see your HTML you are trying to select or is it added dynamically with JavaScript using AJAX? If you right click in Firefox -> Inspect Element -> Network select XHR and reload the page you are trying to scrape does it show any XHRs? Could you try the same for WS instead of XHR too? – Dan-Dev Sep 23 '18 at 21:17
  • When I print(soup), I can see the part of the HTML that I want to scrape. – YoungBoi Sep 23 '18 at 21:24
  • Can you post the value of "MyLink"? – Dan-Dev Sep 23 '18 at 22:20
  • Of course I can, this is it: https://www.youtube.com/channel/UCtBGKF3uQNybKeelFz4PolA – YoungBoi Sep 23 '18 at 22:31
  • The page is rendered with JavaScript, so the HTML you want is not present in the output of print(soup). There are lots of XHRs fetching data from other URLs. To scrape JavaScript-rendered websites, see my answer to https://stackoverflow.com/questions/45259232/scraping-google-finance-beautifulsoup/45259523#45259523 – Dan-Dev Sep 24 '18 at 08:40

1 Answer

Although the browser shows the element like this:

<a id="video-title" class="yt-simple-endpoint style-scope ytd-video-renderer" aria-label="VRACAMO SE MNOGO JACIII!!! by KaLuu Streamed 3 days ago 2 hours, 2 minutes 291 views" href="/watch?v=QPNienUChDg" title="VRACAMO SE MNOGO JACIII!!!">
                VRACAMO SE MNOGO JACIII!!!
              </a>

the HTML that requests receives contains this element instead:

    <a aria-describedby="description-id-904842" class="yt-uix-sessionlink yt-uix-tile-link spf-link yt-ui-ellipsis yt-ui-ellipsis-2" data-sessionlink="ei=6T6oW7LkMcKKowP9zqLoBQ&amp;feature=c4-overview&amp;ved=CDYQ-SUYACITCPL87M-_0t0CFULFaAodfacIXSibHA" dir="ltr" href="/watch?v=QPNienUChDg" rel="nofollow" title="VRACAMO SE MNOGO JACIII!!!">
VRACAMO SE MNOGO JACIII!!!
</a>

Comparing the two, the class value changes from yt-simple-endpoint style-scope ytd-video-renderer to yt-uix-sessionlink yt-uix-tile-link spf-link yt-ui-ellipsis yt-ui-ellipsis-2. Some websites serve different markup depending on the client, for example its geography or other client-side factors.
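One way to spot such a difference without reading the whole dump is to list the class names that actually appear on <a> tags in the response. This is only a minimal sketch using the standard library: served_html is a shortened stand-in for requests.get(url).text, and ClassCollector and anchor_classes are hypothetical helper names, not part of any library.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collects every CSS class name used on <a> tags in the parsed HTML."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for key, value in attrs:
            if key == "class" and value:
                # A class attribute holds several space-separated names
                self.classes.update(value.split())


def anchor_classes(html):
    """Return the set of class names found on <a> tags in html."""
    collector = ClassCollector()
    collector.feed(html)
    return collector.classes


# Assumption: a shortened copy of the element that requests received
served_html = (
    '<a class="yt-uix-sessionlink yt-uix-tile-link spf-link" '
    'href="/watch?v=QPNienUChDg" title="VRACAMO SE MNOGO JACIII!!!">'
    "VRACAMO SE MNOGO JACIII!!!</a>"
)

classes = anchor_classes(served_html)
print("ytd-video-renderer" in classes)  # class seen in the browser
print("yt-uix-tile-link" in classes)    # class present in the served HTML
```

If the class you copied from the browser's inspector is missing from that set, the selector has to be built from the served markup instead.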

After identifying this, I got the value with the following code.

import requests
from bs4 import BeautifulSoup

url = 'https://www.youtube.com/channel/UCtBGKF3uQNybKeelFz4PolA'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
print(soup)  # inspect the HTML that was actually served
# Match the class string exactly as it appears in the served HTML
name = soup.find('a', {'class': 'yt-uix-sessionlink yt-uix-tile-link spf-link yt-ui-ellipsis yt-ui-ellipsis-2'})

print(name.text)
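A slightly more defensive variant of the same lookup could look like the sketch below. It assumes the served markup still matches the element quoted earlier; served_html is a shortened stand-in for the real response so the snippet runs offline, and choosing yt-uix-tile-link as the class to match is my own assumption, not something the site guarantees.

```python
from bs4 import BeautifulSoup

# Assumption: shortened stand-in for the HTML that requests received
served_html = """
<a class="yt-uix-sessionlink yt-uix-tile-link spf-link yt-ui-ellipsis yt-ui-ellipsis-2"
   href="/watch?v=QPNienUChDg" title="VRACAMO SE MNOGO JACIII!!!">
VRACAMO SE MNOGO JACIII!!!
</a>
"""

soup = BeautifulSoup(served_html, "html.parser")

# The class copied from the browser is not in the served HTML, so this is None
browser_match = soup.find("a", class_="ytd-video-renderer")

# A CSS selector matches a single class regardless of class order
# or of any extra classes on the element
link = soup.select_one("a.yt-uix-tile-link")

# find() and select_one() return None when nothing matches,
# so guard before reading the text
title = link.get_text(strip=True) if link is not None else None

print(browser_match)
print(title)
```

Matching on one class instead of the full class string keeps working if the site reorders or adds classes, and the None check avoids an AttributeError when the markup changes again.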

Hope this helps! Cheers!

SanthoshSolomon