3

I tried parsing a web page and ran into ::before in the page's HTML.

from requests import get
from bs4 import BeautifulSoup as BS

url = 'https://kant-sport.ru/sports/skiing/svobodnoe-katanie/'

# Fetch the whole page
page = get(url)

# Parse the HTML
soup = BS(page.content, 'html.parser')

# Get the table
table = soup.select('.new-tables-content')[0]

# Get the table's rows, skipping the first (header) row
rows = table.select('.new-tables-content-row')[1:]

Then I need to get the 'x' symbol:

# Getting 'x' symbol by class
print(rows[0].find(class_="new-tables-content-col::before"))

Output

None

And using the select method (CSS selector):

# Getting 'x' symbol by css
print(rows[0].select('.new-tables-content-row:not(.new-tables-content-header) .new-tables-content-col:last-child:before'))

Output

Traceback (most recent call last):
  File "E:/Coding/PycharmProjects/kant-monitoring-bot/parser.py", line 36, in <module>
    print(rows[0].select('.new-tables-content-row:not(.new-tables-content-header) .new-tables-content-col:last-child:before'))
  File "E:\Coding\PycharmProjects\kant-monitoring-bot\venv\lib\site-packages\bs4\element.py", line 1869, in select
    results = soupsieve.select(selector, self, namespaces, limit, **kwargs)
  File "E:\Coding\PycharmProjects\kant-monitoring-bot\venv\lib\site-packages\soupsieve\__init__.py", line 98, in select
    return compile(select, namespaces, flags, **kwargs).select(tag, limit)
  File "E:\Coding\PycharmProjects\kant-monitoring-bot\venv\lib\site-packages\soupsieve\__init__.py", line 62, in compile
    return cp._cached_css_compile(pattern, namespaces, custom, flags)
  File "E:\Coding\PycharmProjects\kant-monitoring-bot\venv\lib\site-packages\soupsieve\css_parser.py", line 208, in _cached_css_compile
    CSSParser(pattern, custom=custom_selectors, flags=flags).process_selectors(),
  File "E:\Coding\PycharmProjects\kant-monitoring-bot\venv\lib\site-packages\soupsieve\css_parser.py", line 1043, in process_selectors
    return self.parse_selectors(self.selector_iter(self.pattern), index, flags)
  File "E:\Coding\PycharmProjects\kant-monitoring-bot\venv\lib\site-packages\soupsieve\css_parser.py", line 902, in parse_selectors
    has_selector, is_html = self.parse_pseudo_class(sel, m, has_selector, iselector, is_html)
  File "E:\Coding\PycharmProjects\kant-monitoring-bot\venv\lib\site-packages\soupsieve\css_parser.py", line 640, in parse_pseudo_class
    "'{}' pseudo-class is not implemented at this time".format(pseudo)
NotImplementedError: ':before' pseudo-class is not implemented at this time

Process finished with exit code 1

How do I properly parse elements with ::before or ::after?

TimNekk

2 Answers

1

As the author of soupsieve (the select library used in BeautifulSoup), I can answer this question. You cannot use ::before to parse pseudo-elements.

For one, pseudo-elements are not real elements. A browser may create them when it renders the page, but they exist only in the rendered result, not in the source. BeautifulSoup does not render HTML, it simply parses it; therefore, there are no pseudo-elements. If you print the HTML source (after parsing it with BeautifulSoup), you will find that there are no ::before elements in the document structure.
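To make this concrete, here is a minimal sketch using only the standard library's html.parser module, which is the backend BeautifulSoup uses when you pass 'html.parser'. The HTML fragment and the "closed" class are made up for illustration; they are not taken from the page in the question.

```python
from html.parser import HTMLParser

# Hypothetical fragment: the "x" is drawn by a CSS rule, it is not in the markup.
SAMPLE_HTML = """
<style>
.new-tables-content-col.closed::before { content: "x"; }
</style>
<div class="new-tables-content-row">
  <div class="new-tables-content-col">Trail A</div>
  <div class="new-tables-content-col closed"></div>
</div>
"""


class TagCollector(HTMLParser):
    """Record every start tag together with its class attribute."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, dict(attrs).get("class", "")))


collector = TagCollector()
collector.feed(SAMPLE_HTML)

# Only tags that literally appear in the source are reported.
for tag, classes in collector.seen:
    print(tag, classes)
```

The parser reports the style tag and the three div tags, and nothing else. The ::before text appears only inside the stylesheet, so there is no node for a selector to match.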

Additionally, soupsieve does not support any pseudo-element selectors at this time. It supports many selectors and a very large number of pseudo-classes, but no pseudo-elements at all.
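If the symbol you are after comes from a CSS content declaration, one possible workaround (my assumption about the page, not something verified against it) is to read that declaration from the stylesheet text and then match the real element by its class. The sketch below is a rough regex-based reader, not a CSS parser; pseudo_content, SAMPLE_CSS and the "closed" and "open" classes are hypothetical names used only for illustration.

```python
import re

# Hypothetical stylesheet text; real stylesheets may need a proper CSS parser.
SAMPLE_CSS = """
.new-tables-content-col.closed::before { content: "x"; }
.new-tables-content-col.open:after { color: green; content: "ok"; }
"""

# Selector, then :before/::before or :after/::after, then a quoted content value.
RULE = re.compile(
    r'([^{}]+?)::?(before|after)\s*\{[^}]*?content\s*:\s*"([^"]*)"'
)


def pseudo_content(css_text):
    """Map (selector, pseudo-element name) to the string in its content declaration."""
    return {
        (selector.strip(), pseudo): value
        for selector, pseudo, value in RULE.findall(css_text)
    }


for (selector, pseudo), value in pseudo_content(SAMPLE_CSS).items():
    print(selector, pseudo, repr(value))
```

With the selector in hand, you can pass it (without the ::before part) to select() and treat a match as "this cell shows the symbol".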

facelessuser
0

As the error suggests, this is not yet implemented in the parser you're using.

To deal with this, you can use a parser that supports this, like the lxml parser.

pip3 install --upgrade lxml bs4
soup = BeautifulSoup(page.content, 'lxml')

Also ensure you're using Python 3 and the newest version of bs4.

sytech
  • Switched parser to 'lxml' and still getting NotImplementedError https://i.imgur.com/0oFp6f5.png – TimNekk Nov 15 '20 at 10:55