How to selectively scrape html with repeated class IDs

Question

I am new to Python and have searched Stack Overflow in vain for an answer that I can understand. Thanks in advance for any help or advice you can give.

I am trying to scrape information on price and location from a housing sales website, i.e. the information inside the elements with the 'field-content' class.

The problem is that the page has lots of 'field-content' elements, and the primitive code I am trying pulls out and prints a seemingly random selection of them.

Here's what I am trying to scrape:

<div class="view-content">
<div class="views-row views-row-1 views-row-odd views-row-first views-row-last">
        <div class="views-field views-field-field-summary">        
<div class="field-content">
Land for sale in Prestatyn, Flintshire. Three acres of land with outline planning permission for three large, 4 bedroomed detached houses.
</div> 
 </div>  
         <div class="views-field views-field-field-price">    
<span class="views-label views-label-field-price">PRICE: </span>   
 <span class="field-content">£297,500</span>  
</div>

Here is my basic attempt at getting it to give me back the price. I haven't got very far, and things like scraping more than just the price and saving it to a ScraperWiki table are a long way off yet!

#!/usr/bin/env python

from lxml import html
import requests

page = requests.get('http://www.plotfinder.net/plot/plot-jaslin')
tree = html.fromstring(page.content)

Type1 = tree.xpath('//span[@class="views-label views-label-field-price"]/text()')
price = tree.xpath('//span[@class="field-content"]/text()')

print 'Type1: ', Type1
print 'price: ', price

score 0 · Answer 1 · answered Dec 04 '15 at 16:22

You can try this:

from lxml import html
import requests

page = requests.get('http://www.plotfinder.net/plot/plot-jaslin')
tree = html.fromstring(page.content)

Type1 = tree.xpath('//span[contains(@class,"field-price")]/text()')
price = tree.xpath('//span[contains(@class,"field-price")]/following-sibling::span[contains(@class,"field-content")][1]/text()')


print 'Type1: ', Type1
print 'price: ', price

Hope you get the result you want.
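
To see why anchoring on the label helps, here is a minimal offline sketch that runs against a trimmed copy of the markup from the question instead of the live page. It is written for Python 3, and the second field (the "REF" label and its "ABC123" value) is a made-up addition that is not on the real page; it is only there to reproduce the repeated 'field-content' class.

```python
from lxml import html

# Trimmed copy of the markup from the question. The "views-field-field-ref"
# block is hypothetical: it is added only so that the 'field-content' class
# appears on more than one span, as it does on the real page.
SAMPLE = """
<div class="view-content">
  <div class="views-row views-row-1">
    <div class="views-field views-field-field-summary">
      <div class="field-content">Land for sale in Prestatyn, Flintshire.</div>
    </div>
    <div class="views-field views-field-field-price">
      <span class="views-label views-label-field-price">PRICE: </span>
      <span class="field-content">£297,500</span>
    </div>
    <div class="views-field views-field-field-ref">
      <span class="views-label views-label-field-ref">REF: </span>
      <span class="field-content">ABC123</span>
    </div>
  </div>
</div>
"""

tree = html.fromstring(SAMPLE)

# Broad query: matches every span whose class is exactly "field-content",
# so values from unrelated fields come back mixed together.
all_fields = tree.xpath('//span[@class="field-content"]/text()')

# Narrow query: find the price label first, then take only the first
# "field-content" span that follows it inside the same parent.
price = tree.xpath(
    '//span[contains(@class, "views-label-field-price")]'
    '/following-sibling::span[contains(@class, "field-content")][1]/text()'
)

print('all fields:', all_fields)
print('price:', price)
```

The broad query returns both span values, while the narrow one returns only the price. The same pattern should work for other fields by swapping the label class, assuming each value sits next to its label in the same way.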
