I'm following the example code found here. The author has some documentation where he lists the steps he used to write the program. When I run the whole program together it runs perfectly, but when I follow his steps one at a time I get an AttributeError.

Here's my code:

import pdfquery

pdf = pdfquery.PDFQuery("Aberdeen_2015_1735t.pdf")
pdf.load()
pdf.tree.write("test3.xml", pretty_print=True, encoding="utf-8")

sept = pdf.pq('LTPage[pageid=\'1\'] LTTextLineHorizontal:contains("SEPTEMBER")')
print(sept.text())

x = float(sept.get('x0'))
y = float(sept.get('y0'))
cells = pdf.extract( [
     ('with_parent','LTPage[pageid=\'1\']'),
     ('cells', 'LTTextLineHorizontal:in_bbox("%s,%s,%s,%s")' % (x, y, x+600, y+20))
])

Everything runs fine until it reaches sept.get, which fails with "'PyQuery' object has no attribute 'get'". Does anyone know why this error doesn't occur when the program is run as a whole, but does occur when this piece of the code is run on its own?

otteheng

1 Answer

According to the PyQuery API reference, a PyQuery object indeed doesn't have a get member. The code example must be obsolete.

According to https://pypi.python.org/pypi/pdfquery, attributes are retrieved with .attr:

x = float(sept.attr('x0'))

Judging by the history of pyquery's README.rst, get was never documented and only worked due to some side effect (some delegation to a dict, perhaps).
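If it helps, here is a minimal sketch of reading the coordinates through .attr with a guard for the case where the selector matched nothing. It assumes that .attr returns None when the attribute is absent or the selection is empty. The attr_as_float helper and the FakeSelection class are hypothetical names of mine, not part of pdfquery or pyquery; FakeSelection only stands in for the real query result so the snippet runs without a PDF.

```python
def attr_as_float(selection, name):
    """Read attribute `name` via .attr() and convert it to float."""
    raw = selection.attr(name)
    if raw is None:
        # Assumption: None means the attribute is missing or nothing matched.
        raise ValueError("no %r attribute; did the selector match anything?" % name)
    return float(raw)


class FakeSelection:
    """Hypothetical stand-in for a PyQuery result, only for this demo."""

    def __init__(self, attrs):
        self._attrs = attrs

    def attr(self, name):
        return self._attrs.get(name)


# With pdfquery, `sept` would be the result of pdf.pq(...) instead.
sept = FakeSelection({"x0": "72.0", "y0": "700.5"})
x = attr_as_float(sept, "x0")
y = attr_as_float(sept, "y0")

# Same bounding-box string that the question passes to in_bbox.
bbox = "%s,%s,%s,%s" % (x, y, x + 600, y + 20)
print(bbox)
```

In the question's code, the two float(sept.get(...)) lines would become attr_as_float(sept, 'x0') and attr_as_float(sept, 'y0').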

ivan_pozdeev
  • Do you know when the API was last updated? I ran the example code last week and it worked, so unless they dropped .get within the past week I'm still at a loss as to why it doesn't work. – otteheng Feb 29 '16 at 20:43
  • @otteheng then you're in a better position than me: only you know which version you ran last week. I cannot find anything relevant in git history. – ivan_pozdeev Mar 01 '16 at 15:13
  • You were right about `get`. I changed it to `attr` and it worked. Must be something to do with `dict`. Thanks! – otteheng Mar 01 '16 at 18:58