XPath via lxml in Python has been making me run in circles. I can't get it to extract text from an HTML table despite having what I believe to be the correct XPath. I'm using Chrome to inspect and extract the XPath, then using it in my code.

Here is the HTML table taken directly from the page:

<div id="vehicle-detail-model-specs-container">
<table id="vehicle-detail-model-specs" class="table table-striped vdp-feature-table">
    <!-- Price -->
    <tr>
                <td><strong>Price:</strong></td>
                    <td>
                            <strong id="vehicle-detail-price" itemprop="price">$ 2,210.00</strong>            </td>
            </tr>
                    <!-- VIN -->
    <tr><td><strong>VIN</strong></td><td>&nbsp;*0343</td></tr>

    <!-- MILEAGE -->
    <tr><td><strong>Mileage</strong></td><td>0&nbsp;mi</td></tr>
</table>

I'm trying to extract the Mileage. The XPath I'm using is:

//*[@id="vehicle-detail-model-specs"]/tbody/tr[3]/td[2]

And the Python code that I'm using is:

import requests
from lxml import html

page = requests.get(URL)  # URL is the address of the vehicle page
tree = html.fromstring(page.content)

mileage = tree.xpath('//*[@id="vehicle-detail-model-specs"]/tbody/tr[3]/td[2]')
print(mileage)  # prints []

Note: I've tried adding /text() to the end and I still get nothing back, just an empty list [].

What am I doing wrong and why am I not able to extract the table value from the above examples?

K997
  • Just out of curiosity, have you tried omitting the `tbody` from the XPath? – Amber Dec 31 '17 at 05:04
  • See https://stackoverflow.com/a/18241030/407651 – mzjn Dec 31 '17 at 08:05
  • Possible duplicate of [Why does my XPath query (scraping HTML tables) only work in Firebug, but not the application I'm developing?](https://stackoverflow.com/questions/18241029/why-does-my-xpath-query-scraping-html-tables-only-work-in-firebug-but-not-the) – Andersson Dec 31 '17 at 08:17
  • @amber - thank you, that + user2969402 solution below did the trick! And yes, the suggested duplicate provided lots of info as well. Thank you all! – K997 Dec 31 '17 at 13:27

1 Answer

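As Amber has pointed out, you should omit the tbody step. Your XPath includes tbody, but the HTML source of your table has no <tbody> tag. Chrome inserts one when it builds the DOM, so an XPath copied from the inspector contains it. lxml parses the raw source, where the <tr> elements are direct children of <table>, so the tbody step matches nothing.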
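
A minimal sketch of the difference, assuming a trimmed-down copy of the table from the question (the `SNIPPET` string is illustrative and stands in for the downloaded page):

```python
from lxml import html

# Trimmed-down copy of the question's table (an assumption for illustration).
# Note that the source contains no <tbody> element.
SNIPPET = """
<div id="vehicle-detail-model-specs-container">
<table id="vehicle-detail-model-specs">
  <tr><td><strong>Price:</strong></td><td><strong>$ 2,210.00</strong></td></tr>
  <tr><td><strong>VIN</strong></td><td>&nbsp;*0343</td></tr>
  <tr><td><strong>Mileage</strong></td><td>0&nbsp;mi</td></tr>
</table>
</div>
"""

tree = html.fromstring(SNIPPET)

# XPath as copied from Chrome, including the tbody step.
with_tbody = tree.xpath('//*[@id="vehicle-detail-model-specs"]/tbody/tr[3]/td[2]')

# Same XPath with the tbody step removed.
without_tbody = tree.xpath('//*[@id="vehicle-detail-model-specs"]/tr[3]/td[2]')

print(len(with_tbody))     # number of matches with tbody
print(len(without_tbody))  # number of matches without tbody
```

The first query should come back empty and the second should find the mileage cell, which matches the empty list seen in the question.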

Using the html you posted, I am able to extract the mileage value with the following xpath:

tree.xpath('//*[@id="vehicle-detail-model-specs"]/tr[3]/td[2]')[0].text_content()
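
If the row order on the page might change, one alternative is to select the row by its label instead of its position. This is only a sketch under the same assumption as before: `SNIPPET` is a hypothetical stand-in for the real page, and the label text "Mileage" is taken from the HTML in the question.

```python
from lxml import html

# Hypothetical stand-in for the downloaded page, based on the question's HTML.
SNIPPET = """
<div id="vehicle-detail-model-specs-container">
<table id="vehicle-detail-model-specs">
  <tr><td><strong>VIN</strong></td><td>&nbsp;*0343</td></tr>
  <tr><td><strong>Mileage</strong></td><td>0&nbsp;mi</td></tr>
</table>
</div>
"""

tree = html.fromstring(SNIPPET)

# Find the row whose first cell reads "Mileage", then take its second cell.
# Using //tr matches the rows whether or not a <tbody> is present.
cells = tree.xpath(
    '//table[@id="vehicle-detail-model-specs"]'
    '//tr[td[1][normalize-space()="Mileage"]]/td[2]'
)

# &nbsp; is parsed as the non-breaking space U+00A0; swap it for a plain space.
mileage = cells[0].text_content().replace(u'\xa0', u' ').strip() if cells else None
print(mileage)
```

Because the query uses `//tr` rather than `/tbody/tr` or `/tr`, it does not depend on whether the parser sees a `<tbody>` element.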
user2969402