I'm new to lxml (and no more familiar with BeautifulSoup, sadly) and am trying to do something very simple: select elements by their inline style.

<td style="padding: 20px">blah blah </td>

I just want to select all tds where style="padding: 20px", but I can't figure out how. All the examples I've found only show how to select every td, such as:

for col in page.cssselect('td'):

but that doesn't help me much.

ropa

3 Answers

Well, there's a better way: XPath.

import lxml.html
data = """<td style="padding: 20px">blah blah </td>
<td style="padding: 21px">bow bow</td>
<td style="padding: 20px">buh buh</td>
"""
doc = lxml.html.document_fromstring(data)
for col in doc.xpath("//td[@style='padding: 20px']"):
    print(col.text)

That is neater, and also faster than looping over every td and comparing the attribute in Python.
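One caveat: the `@style='padding: 20px'` predicate is an exact string comparison, so a td whose style carries any other declaration will not match. Here is a minimal sketch of a looser variant using XPath's `contains()` function. The sample markup (the table wrapper, the extra `color` declaration and the unstyled td) is made up for illustration, and substring matching can over-match on unusual style strings, so treat it as a starting point rather than a robust solution.

```python
import lxml.html

# Illustrative markup (not from the question): one td has an extra
# declaration in its style, and one has no style attribute at all.
data = """<table><tr>
<td style="padding: 20px">blah blah </td>
<td style="padding: 21px">bow bow</td>
<td style="color: red; padding: 20px">buh buh</td>
<td>no style</td>
</tr></table>"""

doc = lxml.html.document_fromstring(data)

# Exact match: the whole attribute value must equal the string.
exact = [td.text.strip() for td in doc.xpath("//td[@style='padding: 20px']")]

# Substring match: the attribute value only has to contain the string.
# A td without a style attribute yields an empty string and is skipped.
loose = [td.text.strip()
         for td in doc.xpath("//td[contains(@style, 'padding: 20px')]")]

print(exact)  # ['blah blah']
print(loose)  # ['blah blah', 'buh buh']
```
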

nosklo

If you prefer to use CSS selectors:

import lxml.html
data = """<td style="padding: 20px">blah blah </td>
<td style="padding: 21px">bow bow</td>
<td style="padding: 20px">buh buh</td>
"""
doc = lxml.html.document_fromstring(data)
for td in doc.cssselect('td[style="padding: 20px"]'):
    print(td.text)

Ruslan Spivak

Note that both Ruslan Spivak and nosklo have given better answers.


import lxml.html
data = """<td style="padding: 20px">blah blah </td>
<td style="padding: 21px">bow bow</td>
<td style="padding: 20px">buh buh</td>
"""
doc = lxml.html.document_fromstring(data)
for col in doc.cssselect('td'):
    style = col.get('style')  # None (rather than a KeyError) if the td has no style attribute
    if style == 'padding: 20px':
        print(col.text.strip())

prints

blah blah
buh buh

and manages to skip bow bow.
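If the style strings in the real page are not byte-for-byte identical (for example `padding:20px` without the space, or extra declarations), the plain string comparison above will miss them. Below is a rough sketch of one way around that with the same loop approach: split the inline style into a dict and compare the single property. `parse_style` is a hypothetical helper written for this example, not part of lxml, and its naive split on `;` would mishandle values that themselves contain semicolons.

```python
import lxml.html


def parse_style(style):
    """Split an inline style string into a {property: value} dict.

    Hypothetical helper for illustration; accepts None for a missing attribute.
    """
    declarations = {}
    for part in (style or "").split(";"):
        if ":" in part:
            name, value = part.split(":", 1)
            declarations[name.strip().lower()] = value.strip()
    return declarations


# Illustrative markup (not from the question) with inconsistent formatting.
data = """<table><tr>
<td style="padding:20px">blah blah </td>
<td style="padding: 21px">bow bow</td>
<td style="color: red; PADDING: 20px;">buh buh</td>
<td>no style</td>
</tr></table>"""

doc = lxml.html.document_fromstring(data)

# Keep only the tds whose parsed style has padding equal to 20px.
matches = [
    td.text.strip()
    for td in doc.iter("td")
    if parse_style(td.get("style")).get("padding") == "20px"
]
print(matches)  # ['blah blah', 'buh buh']
```
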

unutbu
  • Thanks! Now all I need is for lxml to actually install on a windows machine, and I'm golden! – ropa Apr 12 '10 at 02:45
  • Why `document_fromstring` not just `fromstring`? – nn0p Aug 25 '15 at 04:26
  • @nn0p: `document_fromstring` returns an `HtmlElement` which begins with `<html>`, `fromstring` returns an `HtmlElement` which begins with `<div>`. In this case it does not matter.
    – unutbu Aug 25 '15 at 10:31