
I am trying to get the href of the anchor tag for the very first video in a YouTube search using Beautiful Soup. I am searching with the tag "a" and class_="yt-simple-endpoint style-scope ytd-video-renderer", but I am getting None as output:

from bs4 import BeautifulSoup
import requests

source = requests.get("https://www.youtube.com/results?search_query=MP+election+results+2018%3A+BJP+minister+blames+conspiracy+as+reason+while+losing").text

soup = BeautifulSoup(source, 'lxml')

# print(soup.prettify())

a = soup.findAll("a", class_="yt-simple-endpoint style-scope ytd-video-renderer")

a_fin = soup.find("a", class_="compact-media-item-image")

print(a)
  • Possible duplicate of [BeautifulSoup getting href](https://stackoverflow.com/questions/5815747/beautifulsoup-getting-href) – Yugandhar Chaudhari Jan 04 '19 at 11:50
  • there is no `class="yt-simple-endpoint style-scope ytd-video-renderer"` in the html source you get from `requests.get()`. That's why you get None – chitown88 Jan 04 '19 at 12:08
  • Possible duplicate of [retrieve links from web page using python and BeautifulSoup](https://stackoverflow.com/questions/1080411/retrieve-links-from-web-page-using-python-and-beautifulsoup) – Raidri Jan 04 '19 at 15:40

5 Answers

from bs4 import BeautifulSoup
import requests

source = requests.get("https://www.youtube.com/results?search_query=MP+election+results+2018%3A+BJP+minister+blames+conspiracy+as+reason+while+losing").text

soup = BeautifulSoup(source, 'lxml')
first_search_result_link = soup.findAll('a', attrs={'class': 'yt-uix-tile-link'})[0]['href']

heavily inspired by this answer
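A minimal offline sketch of the same pattern (the HTML snippet below is invented for illustration), with a guard so an empty result list yields None instead of raising IndexError:

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of a results page, for illustration only.
html = '''
<div>
  <a class="yt-uix-tile-link" href="/watch?v=abc123">First result</a>
  <a class="yt-uix-tile-link" href="/watch?v=def456">Second result</a>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
links = soup.find_all('a', attrs={'class': 'yt-uix-tile-link'})

# Indexing [0] on an empty list raises IndexError, so guard first.
first_link = links[0]['href'] if links else None
print(first_link)  # /watch?v=abc123
```

The guard matters on the live page because, as the comments note, a class that is rendered client-side may not appear in the fetched HTML at all.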


Another option is to render the page first with Selenium.

import bs4
from selenium import webdriver

url = 'https://www.youtube.com/results?search_query=MP+election+results+2018%3A+BJP+minister+blames+conspiracy+as+reason+while+losing'

# Use a raw string so the backslashes in the Windows path are not
# treated as escape sequences.
browser = webdriver.Chrome(r'C:\chromedriver_win32\chromedriver.exe')
browser.get(url)

source = browser.page_source

soup = bs4.BeautifulSoup(source, 'html.parser')

hrefs = soup.find_all("a", class_="yt-simple-endpoint style-scope ytd-video-renderer")
for a in hrefs:
    print(a['href'])

Output:

/watch?v=Jor09n2IF44
/watch?v=ym14AyqJDTg
/watch?v=g-2V1XJL0kg
/watch?v=eeVYaDLC5ik
/watch?v=StI92Bic3UI
/watch?v=2W_4LIAhbdQ
/watch?v=PH1WZPT5IKw
/watch?v=Au2EH3GsM7k
/watch?v=q-j1HEnDn7w
/watch?v=Usjg7IuUhvU
/watch?v=YizmwHibomQ
/watch?v=i2q6Fm0E3VE
/watch?v=OXNAMyEvcH4
/watch?v=vdcBtAeZsCk
/watch?v=E4v2StDdYqs
/watch?v=x7kCuRB0f7E
/watch?v=KERtHNoZrF0
/watch?v=TenbA4wWIJA
/watch?v=Ey9HfjUyUvY
/watch?v=hqsuOT0URJU
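The printed hrefs are relative paths; a small sketch using the standard library's `urllib.parse.urljoin` to turn them into full YouTube URLs (sample paths taken from the output above):

```python
from urllib.parse import urljoin

base = 'https://www.youtube.com'
hrefs = ['/watch?v=Jor09n2IF44', '/watch?v=ym14AyqJDTg']  # sample relative paths

# urljoin resolves each relative path against the site root.
full_urls = [urljoin(base, h) for h in hrefs]
print(full_urls[0])  # https://www.youtube.com/watch?v=Jor09n2IF44
```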

It's dynamic HTML; you can use Selenium, or to get the static HTML use the Googlebot user-agent:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Googlebot/2.1 (+http://www.google.com/bot.html)'}
source = requests.get("https://.......", headers=headers).text

soup = BeautifulSoup(source, 'lxml')

links = soup.findAll("a", class_="yt-uix-tile-link")
for link in links:
    print(link['href'])

Try looping over the matches:

from urllib.request import urlopen
from bs4 import BeautifulSoup

data = urlopen("some_url")
html_data = data.read()
soup = BeautifulSoup(html_data, 'html.parser')

# href=True matches only anchors that actually have an href attribute
for a in soup.findAll('a', href=True):
    print(a['href'])
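Looping over every anchor also returns navigation and sign-in links. A hedged offline sketch (the snippet here is invented) filtering the matches down to video links only:

```python
from bs4 import BeautifulSoup

# Invented snippet standing in for a fetched results page.
html = '''
<a href="/about">About</a>
<a href="/watch?v=abc123">Video one</a>
<a href="/watch?v=def456">Video two</a>
<a>no href here</a>
'''

soup = BeautifulSoup(html, 'html.parser')

# href=True skips anchors without an href; the startswith check
# keeps only video watch pages.
watch_links = [a['href'] for a in soup.findAll('a', href=True)
               if a['href'].startswith('/watch')]
print(watch_links)  # ['/watch?v=abc123', '/watch?v=def456']
```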

The class you're searching for does not exist in the scraped HTML. You can verify this by printing the soup variable. For example:

a = soup.findAll("a", class_="sign-in-link")

gives output as:

[<a class="sign-in-link" href="https://accounts.google.com/ServiceLogin?passive=true&amp;continue=https%3A%2F%2Fwww.youtube.com%2Fsignin%3Faction_handle_signin%3Dtrue%26app%3Ddesktop%26feature%3Dplaylist%26hl%3Den%26next%3D%252Fresults%253Fsearch_query%253DMP%252Belection%252Bresults%252B2018%25253A%252BBJP%252Bminister%252Bblames%252Bconspiracy%252Bas%252Breason%252Bwhile%252Blosing&amp;uilel=3&amp;hl=en&amp;service=youtube">Sign in</a>]