I keep getting an AttributeError, and none of the solutions I've found deal with '_all_strings'.
I'm writing a web crawler, but the page has a lot of unwanted content at the top and bottom, so I'm trying to clean up the HTML as a first step toward excluding that noise.
When I run the code below, the last line raises an AttributeError:
from __future__ import division
from urllib.request import urlopen
from bs4 import BeautifulSoup
textSource = 'http://celt.ucc.ie/irlpage.html'
html = urlopen(textSource).read()
raw = BeautifulSoup.get_text(html)
This is the full Traceback I get:
Traceback (most recent call last):
  File "...Crawler_Celt_Namelink_Test.py", line 7, in <module>
    raw = BeautifulSoup.get_text(html)
  File "...Python\Python35\lib\site-packages\bs4\element.py", line 950, in get_text
    return separator.join([s for s in self._all_strings(
AttributeError: 'bytes' object has no attribute '_all_strings'
Has anybody encountered this error before? Or can anyone suggest how I can overcome it, please?
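For reference, the traceback points at a likely cause: get_text is an instance method, so calling it on the BeautifulSoup class passes the raw bytes in as self, and bytes has no '_all_strings'. Below is a minimal sketch of the usual pattern, assuming that is what is going on here. The inline markup is a made-up stand-in for the bytes returned by urlopen(textSource).read(), and "html.parser" is just one parser choice.

```python
from bs4 import BeautifulSoup

# Made-up sample markup standing in for urlopen(textSource).read(),
# which also returns bytes.
html = b"<html><head><title>CELT</title></head><body><p>Irish texts</p></body></html>"

# Parse the bytes into a BeautifulSoup object first ...
soup = BeautifulSoup(html, "html.parser")

# ... then call get_text() on that object rather than on the class.
raw = soup.get_text()
print(raw)
```

With the real page, the same two steps would apply to the html variable from the code above, and raw would then be an ordinary str that can be trimmed further.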