1

How can I parse an HTML table using pyquery? (See the HTML source of the table at http://pastie.org/pastes/8556919)

Desired result:

{
    "category_1": {"cat1_el1_label": "cat1_el1_value"},
    "category_2": {"cat2_el1_label": "cat2_el1_value"},
    "category_3": {"cat3_el1_label": "cat3_el1_value"}
}

Thank you very much.

user1667957
  • What have you done? What have you tried? Also, BeautifulSoup is a great library for this as well: http://www.crummy.com/software/BeautifulSoup/ – Mike McMahon Dec 16 '13 at 23:51

2 Answers

4

Simple way:

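from collections import defaultdict

from pyquery import PyQuery

# `html` is assumed to hold the table markup as a string.
doc = PyQuery(html)
values = defaultdict(dict)
title = None  # no category header row seen yet
for tr in doc('tr').items():
    if tr('th.title'):
        # Header row: remember the current category name.
        title = tr('th.title').text()
    elif title is not None:
        # Data row: pair each label cell with its value cell.
        items = zip(tr('.properties_label').items(),
                    tr('.properties_value').items())
        values[title].update({k.text(): v.text() for k, v in items})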

Result:

defaultdict(<type 'dict'>, {'Category_3': {'cat3_el1_label': 'cat3_el1_value'},
                            'Category_2': {'cat2_el1_label': 'cat2_el1_value'},
                            'Category_1': {'cat1_el1_label': 'cat1_el1_value'}})
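
The question asks for a plain nested mapping, while the loop above yields a defaultdict. A minimal sketch of one way to convert it, assuming `values` has the shape shown in the result. The literal data below is a hand-written stand-in rather than output parsed from the original page, and `to_plain_dict` is a hypothetical helper name:

```python
import json
from collections import defaultdict

# Stand-in for the `values` mapping built by the loop above (assumed shape).
values = defaultdict(dict)
values['Category_1']['cat1_el1_label'] = 'cat1_el1_value'
values['Category_2']['cat2_el1_label'] = 'cat2_el1_value'
values['Category_3']['cat3_el1_label'] = 'cat3_el1_value'


def to_plain_dict(mapping):
    """Copy a dict-of-dicts into ordinary dicts (hypothetical helper)."""
    return {category: dict(pairs) for category, pairs in mapping.items()}


plain = to_plain_dict(values)

# Serialize to JSON; sort_keys keeps the category order stable.
as_json = json.dumps(plain, indent=2, sort_keys=True)
print(as_json)
```

The conversion is optional: a defaultdict already behaves like a dict for lookups and for `json.dumps`, so this mostly matters when the repr or an exact type check is important.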
gawel
0

Something like this, though I'm not sure how I feel about pyquery (try BeautifulSoup):

>>> from pyquery import PyQuery as pq
>>> d = pq(html)  # `html` is assumed to hold the table markup
>>> p = d(".properties_label span")
>>> for x in p:
...     print(x.text)
...
cat1_el1_label
cat2_el1_label
cat3_el1_label

>>> p = d(".properties_value")
>>> for x in p:
...     print(x.text)
...
cat1_el1_value
cat2_el1_value
cat3_el1_value
frodopwns