
I'm running into an issue when trying to get the parent node of a tr element whilst iterating through them all.

Here's a basic table that I'm working with.

    <table border=1>
      <tbody>
        <tr>
          <td>
            <p>Some text</p>
          </td>
          <td>
            <p>Some more text</p>
          </td>
        </tr>
        <tr>
          <td>
            <p> Some more text</p>
          </td>
          <td>
            <p> Some more text</p>
          </td>
        </tr>
        <tr>
          <td>
            <p> Some more text</p>
          </td>
          <td>
            <p> Some more text</p>
          </td>
        </tr>
      </tbody>
    </table>

And here's my Python script to get the parent node using lxml:

import lxml.html

htm = lxml.html.parse('plaintable.htm')
tr = htm.xpath('//tr')
for x in tr:
    tbody = tr.getparent()
    if tbody.index(tr) == 1:
        print ('Success!')
print ('Finished')

I'm getting this error when I run the script: AttributeError: 'list' object has no attribute 'getparent'

I'm quite new to Python so it could be something simple I'm messing up. I read through the lxml documents and I couldn't find an answer.

Any help would be great!

Chad

1 Answer

tr is a list of XPath matches, not a single element, which is why it has no getparent() method. The loop variable x is the individual tr element, so call getparent() on it instead:

tr = htm.xpath('//tr')
for x in tr:
    tbody = x.getparent()
    # ...
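
To see the fix end to end, here is a minimal sketch. It assumes an inline HTML string parsed with lxml.html.fromstring instead of the plaintable.htm file, and the names HTML, rows and positions are made up for illustration:

```python
import lxml.html

# Inline stand-in for the file from the question (an assumption for this sketch).
HTML = """
<table>
  <tbody>
    <tr><td>a</td></tr>
    <tr><td>b</td></tr>
    <tr><td>c</td></tr>
  </tbody>
</table>
"""

root = lxml.html.fromstring(HTML)
rows = root.xpath('//tr')      # a list of <tr> elements

positions = []
for row in rows:
    parent = row.getparent()   # call getparent() on the element, not on the list
    positions.append((parent.tag, parent.index(row)))

print(positions)
```

Each entry pairs the parent's tag with the row's zero-based position under that parent, so the check from the question (index == 1) matches the second row.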

That said, if there is a single table and tbody element, there is little point in fetching the same parent on every iteration. Why not locate it beforehand:

tbody = htm.xpath("//tbody")[0]
for x in tbody.xpath(".//tr"):
    # ...
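
Once the tbody is in hand, the row position can also come from enumerate rather than from getparent() plus index(). This is only a sketch of that idea, again on an inline string; HTML, position and second_rows are illustrative names, not part of the original script:

```python
import lxml.html

# Inline stand-in for the parsed document (an assumption for this sketch).
HTML = (
    "<table><tbody>"
    "<tr><td>first</td></tr>"
    "<tr><td>second</td></tr>"
    "<tr><td>third</td></tr>"
    "</tbody></table>"
)

root = lxml.html.fromstring(HTML)
tbody = root.xpath('//tbody')[0]   # locate the parent once, up front

second_rows = []
# './tr' selects only the direct <tr> children of this tbody.
for position, row in enumerate(tbody.xpath('./tr')):
    if position == 1:              # same check as tbody.index(x) == 1
        second_rows.append(row.text_content())

print(second_rows)
```

This only lines up with tbody.index(x) when every child of the tbody is a tr, which holds for the table in the question.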

"I need to find the first tr in every table to build it properly"

For this, I would iterate over all table elements and take the first tr in each:

tables = htm.xpath("//table")
for table in tables:
    first_tr = table.xpath(".//tr")[0]
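
One caveat: indexing with [0] raises IndexError if a table has no rows. A hedged variant, assuming that skipping empty tables is acceptable, uses find(), which returns None when nothing matches. The HTML string and the first_cells name below are made up for the example:

```python
import lxml.html

# Three tables, the last one empty (illustrative data, not from the question).
HTML = """
<div>
  <table><tbody><tr><td>t1 r1</td></tr><tr><td>t1 r2</td></tr></tbody></table>
  <table><tbody><tr><td>t2 r1</td></tr></tbody></table>
  <table></table>
</div>
"""

root = lxml.html.fromstring(HTML)

first_cells = []
for table in root.xpath('//table'):
    first_tr = table.find('.//tr')   # first matching descendant, or None
    if first_tr is None:
        continue                     # table without rows: nothing to build from
    first_cells.append(first_tr.text_content().strip())

print(first_cells)
```

With 36 tables in the real document, the same loop would yield one first row per non-empty table.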
alecxe
  • Ah perfect! I was misunderstanding how to call the index in the for loop. I'm used to JavaScript so was having a tough time. In a nutshell, I'm building an XML file based on the tables in the HTML document. I'm currently working with 36 different tables. I need to find the first tr in every table to build it properly – Chad Jul 09 '16 at 21:25
  • @Chad got it, I've also updated the answer with sample code for the "I need to find the first tr in every table to build it properly" part. Thanks! – alecxe Jul 09 '16 at 21:27
  • Awesome, that'll be tremendously helpful with creating the parent nodes. I really appreciate your help! – Chad Jul 09 '16 at 21:30