I am pretty new to Python and I am trying to set up a web scraper that gathers data on characters who have died in the show Game of Thrones. I have gotten the data that I want, but I can't seem to get the extra fluff (newlines and surrounding whitespace) out of it.
I have tried the .strip() method and the .replace() method, using .replace(" ", ""), but each time nothing changes. Here is a block of my code:
import requests
from bs4 import BeautifulSoup

url = "http://time.com/3924852/every-game-of-thrones-death/"
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
# Find the characters who have died by searching for the text embedded within the <div> tag with class = "headline"
find_deaths = soup.find_all('div', class_="headline")
# Add the contents of each matching tag to the list (this is where I want to strip the extra fluff)
deaths = []
for hit in find_deaths:
    deaths.append(hit.contents)
This code yields items in the list that look like this:
deaths = [['\n Will\n '], ['\n Jon Arryn\n '], ['\n Jory Cassel\n '], ...]
I have tried the following methods to strip out the extra fluff surrounding the data, but they don't change anything in the list at all.
for item in deaths:
    str(item).strip()
for item in deaths:
    str(item).replace("\n ", "")
With either of the two methods above, I expected all the extra fluff to be stripped from the items in the list, but nothing seems to change.
Is there another method I could use besides strip and replace that will get rid of the extra fluff in this data?
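
One likely explanation, sketched under assumptions: Python strings are immutable, so .strip() and .replace() return a new string rather than modifying anything in place, and in the loops above that return value is thrown away. In addition, each item in deaths is a one-element list, so str(item) produces the text of the whole list (brackets and quotes included) rather than the name itself. The sketch below uses the three sample entries shown above in place of the scraped data; the name cleaned is made up for illustration.

```python
# Sample data copied from the output above. Each entry is a one-element list.
deaths = [['\n Will\n '], ['\n Jon Arryn\n '], ['\n Jory Cassel\n ']]

# This loop has no lasting effect: strip() returns a new string,
# and that result is never stored anywhere.
for item in deaths:
    str(item).strip()

# Take the string out of each inner list, strip it, and keep the result
# in a new list. (cleaned is a hypothetical name, not from the original code.)
cleaned = [item[0].strip() for item in deaths]

print(cleaned)  # ['Will', 'Jon Arryn', 'Jory Cassel']
```

This assumes every inner list holds exactly one string, as in the sample output. If a tag had several children, Beautiful Soup's hit.get_text(strip=True) could be appended to the list instead of hit.contents, which would avoid the nested lists altogether.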