2

I'm trying to scrape the following HTML:

<select id="sizeShoe" name="attributes[&#39;size&#39;]" class="selectFld col-xs-12">
<option value="">Select Size</option>
<option value="025">2.5</option>
<option value="035">3.5</option>
<option value="040">4</option>
<option value="045">4.5</option>
<option value="050">5</option>
<option value="055">5.5</option>
<option value="060">6</option>
<option value="065">6.5</option>
<option value="070">7</option>
<option value="075">7.5</option>
<option value="080">8</option>
<option value="085" selected="selected">8.5</option>
<option value="090">9</option>
                        </select>

I need to create a dictionary that maps each displayed size to its value attribute, like this:

argument = {"2.5": "025", "3.5": "035", "4": "040", ...}

My attempt:

soup = BeautifulSoup(response.text, "lxml")
soup.prettify()

argument = {}
sizeShoe = soup.find("select", attrs={'id' : 'sizeShoe'})
for a in sizeShoe:
   valor = sizeShoe.get("value")

But the result of valor is None.

How can I scrape the data and save it as a dictionary? And is there a library faster than BeautifulSoup?

MendelG

3 Answers

1

Is there a faster library than BeautifulSoup?

Check out Scrapy. See "Difference between BeautifulSoup and Scrapy crawler?"


Try the following code to scrape the data to a dictionary:

from bs4 import BeautifulSoup, NavigableString

html = '''YOUR ABOVE CODE SNIPPET'''

soup = BeautifulSoup(html, 'lxml')

shoe_size = soup.select_one('#sizeShoe')

# Check that 'tag' is not an instance of 'NavigableString'
# Check that the value of 'value' is not an empty string

argument = {
    tag.text: tag['value']
    for tag in shoe_size
    if not isinstance(tag, NavigableString) and tag['value']
}

print(argument)

Output:

{'2.5': '025', '3.5': '035', '4': '040', '4.5': '045', '5': '050', '5.5': '055', '6': '060', '6.5': '065', '7': '070', '7.5': '075', '8': '080', '8.5': '085', '9': '090'}
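
For context, the None in the question appears because the loop reads "value" from the <select> tag itself, which has no such attribute; the attribute lives on each <option>. The following is only a minimal sketch of that point, assuming a shortened copy of the question's markup and the built-in html.parser (both are choices made here for illustration, not part of the original answer):

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's markup (assumption: the real page has the same structure)
html = """
<select id="sizeShoe" name="attributes['size']" class="selectFld col-xs-12">
<option value="">Select Size</option>
<option value="025">2.5</option>
<option value="035">3.5</option>
<option value="040">4</option>
</select>
"""

soup = BeautifulSoup(html, "html.parser")
size_shoe = soup.find("select", attrs={"id": "sizeShoe"})

# The <select> tag carries no "value" attribute, so .get() returns None here
select_value = size_shoe.get("value")

argument = {}
# The CSS selector picks every <option> nested inside the element with id "sizeShoe"
for option in soup.select("#sizeShoe option"):
    value = option.get("value")
    if value:  # skip the empty "Select Size" placeholder
        argument[option.get_text(strip=True)] = value

print(select_value)
print(argument)
```

Using .get("value") rather than option["value"] avoids a KeyError if some option happens to lack the attribute.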
MendelG
0

Here is the code:

from bs4 import BeautifulSoup

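# html_data is assumed to hold the HTML from the question (e.g. response.text)
result_dict = {}
soup = BeautifulSoup(html_data, 'html.parser')
for option in soup.find_all('option'):
    # Skip the placeholder option, whose value is an empty string
    if option['value'] != '':
        result_dict[option.text] = option['value']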

Contents of result_dict:

{'2.5': '025', '3.5': '035', '4': '040', '4.5': '045', '5': '050', '5.5': '055', '6': '060', '6.5': '065', '7': '070', '7.5': '075', '8': '080', '8.5': '085', '9': '090'}

Anjaly Vijayan
-1

Use soup.find_all() to collect every option tag; soup.find() returns only the first match. In my opinion, bs4 is the best option available.
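
One way to read this suggestion is sketched below, assuming find() is still used to locate the single <select> and find_all() is then used for its <option> children. The helper name options_to_dict and the sample markup are hypothetical, introduced only for this illustration:

```python
from bs4 import BeautifulSoup


def options_to_dict(html, select_id):
    """Map each option's visible text to its value for the given <select> id.

    Hypothetical helper for illustration; returns an empty dict if the
    <select> is not found.
    """
    soup = BeautifulSoup(html, "html.parser")
    select = soup.find("select", id=select_id)  # first (and only) matching tag
    if select is None:
        return {}
    return {
        option.get_text(strip=True): option["value"]
        for option in select.find_all("option")  # every <option> inside it
        if option.get("value")  # drop options with a missing or empty value
    }


# Shortened sample markup (assumption: mirrors the structure in the question)
sample = """
<select id="sizeShoe">
<option value="">Select Size</option>
<option value="080">8</option>
<option value="085" selected="selected">8.5</option>
</select>
"""

print(options_to_dict(sample, "sizeShoe"))
```

The None check matters in practice: if the page layout changes and the id disappears, find() returns None and a direct .find_all() call on it would raise an AttributeError.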