
I am new to programming and also new to Python.

My intention is to build an eBay web scraper.

I am trying to extract a list of links with the bs4 find_all() method, but no matter what I try, it always returns an empty list.

def get_index_data(soup):

    try:
        links = soup.find_all('a', {'class': 's-item__link'})
        print(links)
    except:
        links = []
        print(links)

I also wrote it this way.

links = soup.find_all('a', class_= 's-item__link')

It also returns an empty list. I absolutely don't know what is wrong.

Edit:

import requests
from bs4 import BeautifulSoup


def get_page(url):

    response = requests.get(url)

    if not response.ok:
        print('server responded: ', response.status_code)
    else:
        soup = BeautifulSoup(response.text, 'lxml')
    return soup


def get_index_data(soup):
    links = soup.find_all('a')

    print(links)


def main():

    url = 'https://www.ebay.de/sch/i.html?_nkw=armbanduhr&_pgn=1'
    get_index_data(get_page(url))


if __name__ == '__main__':
    main()

Edit 2:

Error after running the code with only .find_all('a'):


Traceback (most recent call last):
  File "C:\Users\Aleksandar\Desktop\My ebay scraper\test", line 29, in <module>
    main()
  File "C:\Users\Aleksandar\Desktop\My ebay scraper\test", line 25, in main
    get_index_data(get_page(url))
  File "C:\Users\Aleksandar\Desktop\My ebay scraper\test", line 19, in get_index_data
    print(links)
  File "C:\Users\Aleksandar\AppData\Local\Programs\Python\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2705' in position 28776: character maps to <undefined>
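(Note on the traceback: the error is raised by print(), not by find_all() — the Windows console here uses cp1252, which cannot encode the '\u2705' check-mark character that appears somewhere in the scraped links. A minimal sketch of two common workarounds; the sample title string is an invented example:)

```python
# The cp1252 codec cannot represent '\u2705', so print() raises
# UnicodeEncodeError on a cp1252 Windows console.

# Option 1 (Python 3.7+): switch stdout to UTF-8 once at startup.
#   import sys
#   sys.stdout.reconfigure(encoding="utf-8")

# Option 2: replace unencodable characters before printing.
title = "Armbanduhr \u2705 neu"  # hypothetical listing title
safe = title.encode("cp1252", errors="replace").decode("cp1252")
print(safe)  # the check mark becomes '?', but the script no longer crashes
```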
0m3r

2 Answers


Your code does not show us the url that you are trying to parse.

Please... try to understand the concepts by parsing one simple page first...

Ebay uses JavaScript and is a little bit harder to scrape...

I will write down a simple example...

Hope that helps you to understand some concepts...

from bs4 import BeautifulSoup
import requests

page = "https://en.wikipedia.org/wiki/Main_Page"

page_text = requests.get(page).text

soup = BeautifulSoup(page_text, 'lxml')

# print(soup)
links = []
links = soup.find_all("a")

for link in links:
    print(link)
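To connect this back to the original attempt: filtering by class works the same way on static HTML. A small self-contained sketch — the markup below is invented, with the class name borrowed from the question:

```python
from bs4 import BeautifulSoup

# Inline HTML so the example runs without network access; the class
# name mirrors the one from the question.
html = """
<ul>
  <li><a class="s-item__link" href="https://example.com/watch-1">Watch 1</a></li>
  <li><a class="s-item__link" href="https://example.com/watch-2">Watch 2</a></li>
  <li><a href="https://example.com/about">About</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all with a class filter returns the matching tags -- an empty
# list usually means the class is simply not in the HTML you received.
for link in soup.find_all("a", class_="s-item__link"):
    print(link["href"])
```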
Andre Nevares
  • Thank you! But the URL is in the code, at the bottom. It's saved in the variable url. Here is the page https://www.ebay.de/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=armbanduhr&_sacat=0&LH_TitleDesc=0&_odkw=armbanduhr – radna_skela Jun 01 '20 at 19:48
  • @radna_skela My fault! I did not see the whole code! You have made a good post for your question! My bad! – Andre Nevares Jun 01 '20 at 19:52
  • No, thank you, really. Your example works perfectly fine. But the code I use somehow does not; it is strange – radna_skela Jun 01 '20 at 19:56
  • The problem is that the page you are trying to parse is rendered by JavaScript, and this JavaScript runs in the browser. I had the same problem trying to parse a YouTube channel. Try this post: https://stackoverflow.com/questions/61991229/how-to-load-and-parse-whole-content-of-a-dynamic-page-that-use-infinity-scroll – Andre Nevares Jun 01 '20 at 20:27
  • Yes, I think you are right, because your example worked. Do you have some advice for scraping JS sites? Thank you! – radna_skela Jun 02 '20 at 02:46
  • I think you first need to parse some kind of data from the page you are trying to parse. When the script returns any kind of data, such as an ```h3```, you will be ready to take the next step. For the community to help you faster, you can post a new question with a specific title: how to parse a dynamic page with JavaScript – Andre Nevares Jun 02 '20 at 05:30

BeautifulSoup has a few different types of parsers for different situations. In the past I have stuck with "html.parser" instead of "lxml". Sometimes using "lxml" will actually return None in a situation where "html.parser" will return a result.

That could be why you get your error messages and the empty result; I'd try that. When I wrote up something similar to yours it worked. Since the a tag is used a lot, you're probably going to get a huge chunk of stuff to parse through, but if you change from lxml to html.parser it should work!

Web scraping can be tough to get the hang of but once you do it's really fun. There are really great videos about BeautifulSoup on YouTube.
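A quick way to see whether the parser choice matters for a given page is to run both parsers over the same markup. A minimal sketch — the broken snippet is an invented example, and lxml has to be installed separately (pip install lxml):

```python
from bs4 import BeautifulSoup

# Deliberately malformed HTML: parsers differ mainly in how they
# repair tag soup like this, which can change what find_all() sees.
broken = "<p>first<p>second"

for parser in ("html.parser", "lxml"):
    try:
        soup = BeautifulSoup(broken, parser)
        print(parser, "->", len(soup.find_all("p")), "p tags:", soup)
    except Exception as exc:  # e.g. lxml is not installed
        print(parser, "unavailable:", exc)
```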

eagleman21
  • I tried html.parser, but it still doesn't work unfortunately; I believe it's because of the JavaScript on eBay... But web scraping is great and I can also say that these YouTube videos are great =D – radna_skela Jun 02 '20 at 02:45