
I am new to programming and also new to Python.

My intention is to build an eBay web scraper.

I am trying to extract a list of links with the bs4 find_all() method, but no matter what I try, it always returns an empty list.

def get_index_data(soup):

    try:
        links = soup.find_all('a', {'class': 's-item__link'})
        print(links)
    except:
        links = []
        print(links)

I also wrote it this way.

links = soup.find_all('a', class_= 's-item__link')

It also returns an empty list. I absolutely don't know what is wrong.

Edit:

import requests
from bs4 import BeautifulSoup


def get_page(url):

    response = requests.get(url)

    if not response.ok:
        print('server responded: ', response.status_code)
    else:
        soup = BeautifulSoup(response.text, 'lxml')
    return soup


def get_index_data(soup):
    links = soup.find_all('a')

    print(links)


def main():

    url = 'https://www.ebay.de/sch/i.html?_nkw=armbanduhr&_pgn=1'
    get_index_data(get_page(url))


if __name__ == '__main__':
    main()

Edit 2:

Error after running the code with only .find_all('a'):


Traceback (most recent call last):
  File "C:\Users\Aleksandar\Desktop\My ebay scraper\test", line 29, in <module>
    main()
  File "C:\Users\Aleksandar\Desktop\My ebay scraper\test", line 25, in main
    get_index_data(get_page(url))
  File "C:\Users\Aleksandar\Desktop\My ebay scraper\test", line 19, in get_index_data
    print(links)
  File "C:\Users\Aleksandar\AppData\Local\Programs\Python\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2705' in position 28776: character maps to <undefined>
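(Note on the traceback: the error is raised by print(), not by find_all() — the Windows console here uses cp1252, which cannot encode the '\u2705' check-mark character that appears somewhere in the scraped links. A minimal sketch of two common workarounds; the sample title string is an invented example:)

```python
# The cp1252 codec cannot represent '\u2705', so print() raises
# UnicodeEncodeError on a cp1252 Windows console.

# Option 1 (Python 3.7+): switch stdout to UTF-8 once at startup.
#   import sys
#   sys.stdout.reconfigure(encoding="utf-8")

# Option 2: replace unencodable characters before printing.
title = "Armbanduhr \u2705 neu"  # hypothetical listing title
safe = title.encode("cp1252", errors="replace").decode("cp1252")
print(safe)  # the check mark becomes '?', but the script no longer crashes
```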
0m3r

2 Answers


Your code does not show us the url that you are trying to parse.

Please... try to understand the concepts by parsing one simple page first...

Ebay uses JavaScript and is a little bit harder to scrape...

I will write down a simple example...

Hope that helps you to understand some concepts...

from bs4 import BeautifulSoup
import requests

page = "https://en.wikipedia.org/wiki/Main_Page"

page_text = requests.get(page).text

soup = BeautifulSoup(page_text, 'lxml')

# print(soup)
links = []
links = soup.find_all("a")

for link in links:
    print(link)
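To connect this back to the original attempt: filtering by class works the same way on static HTML. A small self-contained sketch — the markup below is invented, with the class name borrowed from the question:

```python
from bs4 import BeautifulSoup

# Inline HTML so the example runs without network access; the class
# name mirrors the one from the question.
html = """
<ul>
  <li><a class="s-item__link" href="https://example.com/watch-1">Watch 1</a></li>
  <li><a class="s-item__link" href="https://example.com/watch-2">Watch 2</a></li>
  <li><a href="https://example.com/about">About</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all with a class filter returns the matching tags -- an empty
# list usually means the class is simply not in the HTML you received.
for link in soup.find_all("a", class_="s-item__link"):
    print(link["href"])
```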
Andre Nevares
  • Thank you! But the URL is in the code, at the bottom. It's saved in the variable url. Here is the page https://www.ebay.de/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=armbanduhr&_sacat=0&LH_TitleDesc=0&_odkw=armbanduhr – radna_skela Jun 01 '20 at 19:48
  • @radna_skela My fault! I did not see the whole code! You have made a good post for your question! My bad! – Andre Nevares Jun 01 '20 at 19:52
  • No, thank you, really. Your example works perfectly fine. But the code I use somehow does not; it is strange – radna_skela Jun 01 '20 at 19:56
  • The problem is that the page you are trying to parse is rendered by JavaScript, and this JavaScript runs in the browser. I had the same problem trying to parse a YouTube channel. Try this post: https://stackoverflow.com/questions/61991229/how-to-load-and-parse-whole-content-of-a-dynamic-page-that-use-infinity-scroll – Andre Nevares Jun 01 '20 at 20:27
  • Yes, I think you are right, because your example worked. Do you have some advice for scraping JS sites? Thank you! – radna_skela Jun 02 '20 at 02:46
  • I think you first need to parse some kind of data from the page you are trying to parse. When the script returns any kind of data, such as an ```h3```, you will be ready to take the next step. For the community to help you faster, you can post a new question with a specific title: how to parse a dynamic page with JavaScript – Andre Nevares Jun 02 '20 at 05:30

BeautifulSoup has a few different types of parsers for different situations. In the past I have stuck with "html.parser" instead of "lxml". Sometimes using "lxml" will actually return None in a situation where "html.parser" will return a result.

That could be why you get your error messages and the empty result; I'd try that. When I wrote up something similar to yours it worked. Since the a tag is used a lot, you're probably going to get a huge chunk of stuff to parse through, but if you change from lxml to html.parser it should work!

Web scraping can be tough to get the hang of but once you do it's really fun. There are really great videos about BeautifulSoup on YouTube.
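A quick way to see whether the parser choice matters for a given page is to run both parsers over the same markup. A minimal sketch — the broken snippet is an invented example, and lxml has to be installed separately (pip install lxml):

```python
from bs4 import BeautifulSoup

# Deliberately malformed HTML: parsers differ mainly in how they
# repair tag soup like this, which can change what find_all() sees.
broken = "<p>first<p>second"

for parser in ("html.parser", "lxml"):
    try:
        soup = BeautifulSoup(broken, parser)
        print(parser, "->", len(soup.find_all("p")), "p tags:", soup)
    except Exception as exc:  # e.g. lxml is not installed
        print(parser, "unavailable:", exc)
```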

eagleman21
  • I tried html.parser, but it still doesn't work unfortunately; I believe it's because of the JavaScript on eBay... But web scraping is great and I can also say that these YouTube videos are great =D – radna_skela Jun 02 '20 at 02:45