
I am trying to extract the anchor tags inside ... (screenshot attached below) with BeautifulSoup, but I get an empty list. Selecting plain anchor tags works.

I read the BeautifulSoup documentation and tried both the select() and find_all() methods, but I still get an empty list.

>>> import requests, webbrowser, bs4
>>> res = requests.get('https://www.google.com/search?q=beautiful+soup')
>>> soup = bs4.BeautifulSoup(res.text, 'html.parser')
>>> elems = soup.select('.r a')
>>> len(elems)
0
>>> elems = soup.select('a')
>>> len(elems)
68
>>> elems = soup.select('.r')
>>> len(elems)
0
>>> soup.find_all('a', class_='r')
[]
>>> soup.select('[class~=r]')
[]
>>> soup.find_all('a', class_='r')
[]
>>> soup.find_all('a', _class='r')
[]
>>> soup.find_all('a', {'class_':'r'})
[]
>>> soup.find_all('a', {'_class':'r'})
[]

(Screenshot: the div with class r)

Abhijit
  • This question has been asked lots of times before. Have you tried some of the previous solutions on SO? – QHarr Sep 07 '19 at 10:06
  • Why use _ before class? Did you try soup.find_all('a', {'class':'r'})? – A. STEFANI Sep 07 '19 at 10:11
  • @ASTEFANI I used _ before class because I read an answer where putting _ before class resolved the issue, but this doc [link](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#searching-by-css-class) says to put _ after class. As you suggested, I used plain class, but the list is still empty. As Dev Khadka said below, it looks like Google blocks scraping; I tested with another site and it works fine. – Abhijit Sep 07 '19 at 18:26
  • @QHarr yes, I checked the answers but still get the same issue. The problem here seems to be that Google blocks scraping; my code works with other sites. – Abhijit Sep 07 '19 at 18:28

1 Answer


It looks like google.com generates class names randomly, maybe to discourage scraping. Your code works on other sites, for example:

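import requests, bs4

res = requests.get('https://html.com')
res.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
soup = bs4.BeautifulSoup(res.text, 'html.parser')
elems = soup.select('.post-single p')  # <p> elements inside any element with class "post-single"
print(len(elems))  # outside the interactive shell, the count must be printed explicitly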
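To tell a wrong selector apart from unexpected markup, one option is a small diagnostic: count the class names that actually appear in the HTML you received and check whether `r` is among them. This is a hedged sketch. `class_counts` is a hypothetical helper, and the class names in `sample` are made-up placeholders standing in for `res.text`, not real Google markup.

```python
from collections import Counter

from bs4 import BeautifulSoup


def class_counts(html: str) -> Counter:
    """Hypothetical helper: count how often each CSS class name appears in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    counts = Counter()
    # class_=True matches every tag that has a class attribute at all.
    for tag in soup.find_all(class_=True):
        # "class" is multi-valued, so bs4 returns a list of class names.
        counts.update(tag["class"])
    return counts


# Placeholder markup; in practice pass the HTML that requests actually received (res.text).
sample = '<div class="kCrYT"><a href="/url?q=x">x</a></div><div class="BNeawe s3v9rd">y</div>'
counts = class_counts(sample)
print(counts["r"])  # → 0 (the class the selector expects is absent)
print(counts.most_common(3))
```

A count of zero for `r` means no selector spelling will match, and the fix has to target the class names the page really contains.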


Dev Khadka