Questions tagged [pyquery]

pyquery is a jQuery-like library for Python that lets you run jQuery-style queries on XML documents.

PyQuery uses lxml for fast XML and HTML manipulation.

It allows you to make jQuery-style CSS-selector queries on XML/HTML documents. The API is intended to match jQuery's API whenever possible, though it has been made more Pythonic where appropriate.

It can be used for many purposes. The main idea is to use it for templating with pure http templates that you modify using pyquery. It can also be used for web scraping or for theming applications with Deliverance.

97 questions
4
votes
1 answer

pyquery (lxml) not finding a tag in a well-structured XML document?

I have an XML file that looks like this. The relevant bit is this: Vander Wal JS, Gang CH, Griffing GT, Gadde KM. Escitalopram for treatment of night eating syndrome: a 12-week, randomized, placebo-controlled trial. J Clin…
Richard
  • 62,943
  • 126
  • 334
  • 542
4
votes
2 answers

How do I access the first item (or xth item) in a PyQuery query?

I have a query for one of my tests that returns 2 results. Specifically, the 3rd level of an outline found using query = html("ul ol ul"). How do I select the first or second unordered list? query[0] decays to a…
Roman A. Taycher
  • 18,619
  • 19
  • 86
  • 141
3
votes
2 answers

Straight LXML or PyQuery

Does anyone have experience scraping with straight lxml vs. PyQuery? I just came across the latter recently and was intrigued. I haven't been able to find many comments about the library just yet, so I'm curious as to how robust it is. I'm…
Ben
  • 15,010
  • 11
  • 58
  • 90
3
votes
1 answer

Can't extract the result as expected when using requests_html

I can't extract the correct result using requests_html: >>> from requests_html import HTMLSession >>> session = HTMLSession() >>> r = session.get('https://www.amazon.com/dp/B07569DYGN') >>>…
Lordran
  • 649
  • 8
  • 15
3
votes
1 answer

How to get element by text with pyquery?

I'm writing a spider, and I want to know which link means "next page", so I need to get the element by the value = "next page" and then get the link. The input doesn't contain just one tag; it's the whole HTML source code, and I want to get the specific…
Hanson
  • 99
  • 8
3
votes
1 answer

Find tag name of pyquery object

for l in d.items('nl,de,en'): if l.tag()=='nl': dothis() How can I find the tag associated with a pyquery object? The method tag() in the example above doesn't exist...
user104100
  • 41
  • 1
  • 6
3
votes
1 answer

AttributeError: 'XPathExpr' object has no attribute 'add_post_condition'

I'm trying to install pyquery on Windows and I get the following error when I try to do selects like this: d('p:first'). Everything else seems to be working. Any idea what I am missing? This issue happens only on my Windows machine; on my Mac it works…
daniels
  • 18,416
  • 31
  • 103
  • 173
2
votes
1 answer

make_links_absolute() results in broken absolute URLs

I need to convert relative URLs from an HTML page to absolute ones. I'm using pyquery for parsing. For instance, this page http://govp.info/o-gorode/gorozhane has relative URLs in the source code, like
DemX86
  • 422
  • 1
  • 5
  • 10
2
votes
1 answer

Asynchronous request crawling using Python

I want to crawl the link: http://data.eastmoney.com/hsgt/index.html However, I found that the XHR documents contain no data, only an EventStream, so how can I crawl the complete information on the page? For example, I want to crawl -94.67亿元 on the page. My…
Wei Zhang
  • 47
  • 4
2
votes
2 answers

how to use pyquery to modify a node attribute in python

I want to use pyquery to do this. For example: html='
arya starkahahah
' a=PyQuery(html) I want to modify the html to
arya starkahahah
In other words, I just need …
alwx
  • 179
  • 2
  • 9
2
votes
2 answers

How could I use PyQuery traversal correctly?

There is a file called "name.txt". Its content is below:
Michael
  • 31
  • 5
2
votes
1 answer

Fail to scrape images with pyspider and phantomjs

Now I wish to scrape all the images of the items (iphone) in this web page. First I extract all the image links, and then send a request to each src one by one and download them to the folder '/phone/'. Here is my code: from…
u3728666
  • 99
  • 2
  • 9
2
votes
2 answers

PDFQuery: get Page number where element is located

This is the first time I have used PDFQuery to scrape PDFs. What I need to do is get the prices from a price list with several pages. I want to give the product code to PDFQuery, and it should find the code and return the price next to it. The problem…
aampudia
  • 1,581
  • 1
  • 11
  • 14
2
votes
1 answer

Stop pyquery inserting spaces where there aren't any in source HTML?

I am trying to get some text from an element, using pyquery 1.2. There are no spaces in the displayed text, but pyquery is inserting spaces. Here is my code: from pyquery import PyQuery as pq html = '
Richard
  • 62,943
  • 126
  • 334
  • 542
2
votes
1 answer

PyQuery get text node

I'm using PyQuery to process this HTML:
Personality: Strengths
Text

Personality: Weaknesses
Text

Now…
wong2
  • 34,358
  • 48
  • 134
  • 179