
I've browsed previous questions for an hour and tried various solutions, but I can't get this to work. I've extracted the results I want from a website; now I just need to mine these divs for the specific information I want.

The results are isolated like so:

items=soup.findAll(id=re.compile("itembase"))

For each item, I want to extract, for example, the latitude and longitude (data-lat and data-lon) from this piece of HTML:

<div id="itembase29" class="result-item -result unselected clearfix even"
data-part="fl_base" data-lat="51.9006" data-lon="-8.51008" data-number="29"
is-local="true" data-customer="32060963" data-addrid="1"
data-id="4b00fae498e3cc370133e8a14fd75160">
<div class="arrow">
</div>

How do I do that? Thanks.

Jeremy
eamon1234

1 Answer

  1. Pass your HTML into BeautifulSoup.

    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, "html.parser")
    
  2. Find the div.

    div = soup.div
    
  3. Get the attributes you're looking for from the div.

    lat, lon = div.attrs['data-lat'], div.attrs['data-lon']
    
  4. Print.

    >>> print(lat, lon)
    51.9006 -8.51008
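
Put together, the four steps above can be sketched as one short script. This is a minimal sketch, assuming the bs4 package and Python 3; the html string is the snippet from the question, trimmed to the relevant attributes and closed so that it parses on its own.

```python
from bs4 import BeautifulSoup

# Sample markup based on the question, trimmed to the attributes of interest.
html = '''
<div id="itembase29" class="result-item" data-lat="51.9006" data-lon="-8.51008" data-number="29">
  <div class="arrow"></div>
</div>
'''

# Step 1: parse the HTML.
soup = BeautifulSoup(html, "html.parser")

# Step 2: soup.div returns the first <div> in the document.
div = soup.div

# Step 3: .attrs is a plain dict of the tag's attributes (values are strings).
lat, lon = div.attrs["data-lat"], div.attrs["data-lon"]

# Step 4: print the two values.
print(lat, lon)
```

Note that the values come back as strings, so wrap them in float() if you need numbers.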
    

I left .attrs in there for clarity, but more generally you can access the attributes of any element as if it were a dictionary, so you don't really need .attrs at all: div['data-lon'] works too. This doesn't work on a list of divs, though; you need to iterate over the list:

for div in divs:
    print(div['data-lon'], div['data-lat'])

Or use a list comprehension:

[(div['data-lon'], div['data-lat']) for div in divs]
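
Applied to the question's setup, the same idea might look like the sketch below. It assumes bs4 and Python 3. The elements returned by find_all are already tag objects, so they can be indexed directly without parsing each one again. The second and third divs in the sample markup are made up for illustration, and the coords name is hypothetical. Using .get() instead of square brackets returns None for a missing attribute rather than raising KeyError.

```python
import re
from bs4 import BeautifulSoup

# Illustrative markup: the first div uses values from the question,
# the other two are invented to show several items and a missing attribute.
html = '''
<div id="itembase29" data-lat="51.9006" data-lon="-8.51008"><div class="arrow"></div></div>
<div id="itembase30" data-lat="51.8969" data-lon="-8.4863"><div class="arrow"></div></div>
<div id="itembase31"><div class="arrow"></div></div>
'''

soup = BeautifulSoup(html, "html.parser")

# Same selection as in the question: every element whose id contains "itembase".
items = soup.find_all(id=re.compile("itembase"))

coords = []
for item in items:
    # .get() returns None when the attribute is absent.
    lat, lon = item.get("data-lat"), item.get("data-lon")
    if lat is None or lon is None:
        continue  # skip items without coordinates
    coords.append((float(lat), float(lon)))

print(coords)
```

Whether to skip incomplete items or raise an error is a design choice; skipping is shown here only because scraped pages often contain partial entries.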
kreativitea
  • Thanks. This now works: for item in items: soup = BeautifulSoup(str(item)) div = soup.div print div['data-lon'],div['data-lat'] – eamon1234 Nov 13 '12 at 18:35