
I am attempting to scrape some info from a website. I was able to successfully scrape the text I was looking for, but when I try to create a function to append the texts together, I get `TypeError: unhashable type: 'slice'`.

Does anyone know what may be happening here, and how to fix it?

Here is the code in question:

records = []
for result in results:
    name = result.contents[0][0:-1]

and here is the code in its entirety, for reproduction purposes:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://skinsalvationsf.com/2012/08/updated-comedogenic-ingredients-list/')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.find_all('td', attrs={'valign':'top'})

records = []
for result in results:
    name = result.contents[0][0:-1]

A sample of the `results` items:

<td valign="top" width="33%">Acetylated Lanolin <sup>5</sup></td>,
<td valign="top" width="33%">Coconut Butter<sup> 8</sup></td>,
...
<td valign="top" width="33%"><sup> </sup></td>

Thanks in advance!!

t.m.adam
pynewbee
  • That error message almost always means that you think you have a list (or other sequence), but you actually have a dict (or other mapping). The error message could be a little nicer, but the point is: it doesn't make any sense to slice a range out of a dict. – abarnert May 03 '18 at 05:02
  • As for how to fix it… well, that depends on what's in `contents[0]`, what you thought was in there, and what you were hoping that `[0:-1]` to select. – abarnert May 03 '18 at 05:05
  • `>>> first_result = results[0] Traceback (most recent call last): File "", line 1, in NameError: name 'results' is not defined` This is what I get :| – Shubham May 03 '18 at 05:09
  • You haven't initialized `results` before using it as `results[0]`. – Shubham May 03 '18 at 05:10
  • Also, it would be much better as a [mcve] if you just took out the `requests` and `bs4` parts and just directly gave us `results = []`. – abarnert May 03 '18 at 05:12
  • @pynewbee Can you clarify the problem here? – Shubham May 03 '18 at 05:12
  • @shubham sorry about that. i updated the code to include results. results = soup.find_all('td', attrs={'valign':'top'}) – pynewbee May 03 '18 at 05:19
  • @abarnert the code has been updated to include the results. results = soup.find_all('td', attrs={'valign':'top'}) thank you! – pynewbee May 03 '18 at 05:22
  • 1
    When I run this, I get a 403 error from the web server, which of course means the soup ends up empty, results is empty, and your error cannot be reproduced. Again, it would be much better if you strip out the `requests` and `bs4` stuff and just give us a sample string for `results` that demonstrates the problem. Please read [mcve] if it's not clear how to narrow your problem down; it really is helpful. – abarnert May 03 '18 at 05:30
  • @abarnert my post is updated to show the sample string for results. thank you! – pynewbee May 03 '18 at 06:38
  • 1
    `results= Acetylated Lanolin 5` is not even valid code. – abarnert May 03 '18 at 06:47

2 Answers


In some of your collected results, `contents` contains no text, only Tag objects, so you get a TypeError when trying to select a slice from the Tag's attributes dictionary.
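The mechanism can be sketched without bs4 at all (this illustration is not from the original answer): indexing a Tag falls through to a lookup in its attribute dictionary, and a dict cannot be indexed with a slice, because slices are unhashable:

```python
# Stand-in for a Tag's attribute dictionary, e.g. <td valign="top" width="33%">
attrs = {'valign': 'top', 'width': '33%'}

try:
    attrs[0:-1]  # dict lookup requires a hashable key; a slice is not hashable
except TypeError as e:
    print(e)  # unhashable type: 'slice'
```

A NavigableString, by contrast, behaves like a plain `str`, so `[0:-1]` works on it.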

You can catch such errors with a try-except block,

for result in results:
    try:
        name = result.contents[0][0:-1]
    except TypeError:
        continue

Or you could use .strings to select only NavigableString contents,

for result in results:
    name = list(result.strings)[0][0:-1]

But it seems like it's only the last item that has no text content, so you could just ignore it.

results = soup.find_all('td', attrs={'valign':'top'})[:-1]

for result in results:
    name = result.contents[0][:-1]
t.m.adam

To understand why you're getting `TypeError: unhashable type: 'slice'`, read t.m.adam's answer. In a nutshell: in the last iteration, the `result` variable points to a `bs4.element.Tag` object instead of a `bs4.element.NavigableString`.

Below is a working solution with a try-except block, since the last 2 `<td>` elements in the list contain no "stripped_strings" and would produce `ValueError: not enough values to unpack (expected 2, got 0)`.

Code: (Python 3.6+ if you want to use f-strings)

from bs4 import BeautifulSoup
import requests

url = 'https://skinsalvationsf.com/2012/08/updated-comedogenic-ingredients-list/'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36'}
html = requests.get(url, headers=headers).text
soup = BeautifulSoup(html, 'html.parser')

tds = soup.find_all('td')
for td in tds:
    try:
        ingredient, rating = td.stripped_strings
    except ValueError:
        pass
    else:
        print(f'{ingredient} -> {rating}')

Output:

Acetylated Lanolin -> 5
Coconut Butter -> 8
...
Xylene -> 7
Octyl Palmitate -> 7

You could also get rid of the entire try-except-else and omit the last 2 <td>:

tds = soup.find_all('td')[:-2]
for td in tds:
    ingredient, rating = td.stripped_strings
    ...

However, the maintainers of the website might decide to add or delete some ingredients, causing the code to miss some ingredients.
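The unpacking failure that the try/except guards against can also be sketched without bs4 (this is an illustration, not part of the original answer; `fake_stripped_strings` is a hypothetical stand-in for `Tag.stripped_strings`):

```python
def fake_stripped_strings(parts):
    # Hypothetical stand-in for Tag.stripped_strings:
    # yields each string stripped of whitespace, skipping empty ones.
    for s in parts:
        s = s.strip()
        if s:
            yield s

# A normal cell unpacks cleanly into two values:
ingredient, rating = fake_stripped_strings(['Coconut Butter', ' 8'])
print(ingredient, rating)  # Coconut Butter 8

# An empty cell like <td><sup> </sup></td> yields nothing:
try:
    ingredient, rating = fake_stripped_strings([' '])
except ValueError as e:
    print(e)  # not enough values to unpack (expected 2, got 0)
```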

radzak