Questions tagged [lxml.html]

lxml.html is a dedicated Python package for dealing with HTML. It is based on lxml's HTML parser, but provides a special Element API for HTML elements, as well as a number of utilities for common HTML processing tasks.

159 questions

1 answer

How to extract paragraph text in Python using lxml from an HTML file?

I am trying to extract the paragraph, but I get [] instead of the paragraph. How can I extract the paragraph?
Selector_1 = "div.bloco-imovel-texto p"
tree.cssselect(Selector_1)

python html lxml.html

asked Jan 31 '19 at 00:17

sargupta
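
An empty list means the selector matched nothing in the tree that was parsed. Here is a minimal sketch of the same lookup written as XPath, which avoids the separate `cssselect` package. It assumes a hypothetical HTML snippet, since the real page markup is not shown in the question, and the helper name `extract_paragraphs` is also made up for illustration.

```python
import lxml.html

# Hypothetical markup; the real page is not shown in the question.
SAMPLE_HTML = """
<html><body>
  <div class="bloco-imovel-texto"><p>First paragraph.</p><p>Second <b>paragraph</b>.</p></div>
</body></html>
"""


def extract_paragraphs(html):
    """Return the text of every <p> inside div.bloco-imovel-texto."""
    tree = lxml.html.fromstring(html)
    # XPath equivalent of the CSS selector "div.bloco-imovel-texto p".
    paragraphs = tree.xpath(
        "//div[contains(concat(' ', normalize-space(@class), ' '), "
        "' bloco-imovel-texto ')]//p"
    )
    # text_content() also collects text from nested tags such as <b>.
    return [p.text_content().strip() for p in paragraphs]


paragraph_texts = extract_paragraphs(SAMPLE_HTML)
print(paragraph_texts)
```

If the same expression still returns an empty list on the real page, it may be worth checking whether the downloaded HTML actually contains that `div`.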

0 answers

Cut an XML tree at a specific depth

I have XML files like this one:

This file…

python python-3.x depth xml.etree lxml.html

asked Nov 26 '18 at 11:49

dada

1 answer

How to extract href from an element using lxml CSSSelector?

def extract_page_data(html):
    tree = lxml.html.fromstring(html)
    item_sel = CSSSelector('.my-item')
    text_sel = CSSSelector('.my-text-content')
    time_sel = CSSSelector('.time')
    author_sel = CSSSelector('.author-text')
    a_tag = CSSSelector('.a')
    for…

python-3.x beautifulsoup lxml lxml.html

asked Feb 27 '18 at 17:57

elrich bachman
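
In CSS, `.a` matches elements whose class is `a`, while a bare `a` matches anchor tags. The sketch below shows one way to read `href` values with `Element.get`. It assumes hypothetical markup, because the question's HTML is not shown, and it uses XPath rather than `CSSSelector`.

```python
import lxml.html

# Hypothetical markup; class names other than "my-item" and
# "author-text" are invented for this example.
SAMPLE_HTML = """
<html><body>
  <div class="my-item">
    <a class="author-text" href="/users/1">Author</a>
    <a href="/posts/42">Post</a>
    <a name="no-link">Anchor without href</a>
  </div>
</body></html>
"""

tree = lxml.html.fromstring(SAMPLE_HTML)

# Select only anchors that actually carry an href attribute,
# then read the attribute value from each element.
hrefs = [a.get("href") for a in tree.xpath("//a[@href]")]
print(hrefs)
```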

3 answers

KeyError in Python: KeyError: 'value'

I am trying to get the hidden elements in the Twitter login page. I followed a procedure which simply gets the hidden elements in that page. But the problem is that when I try to get the value of those elements, I get a KeyError. The code is: import…

python python-requests lxml.html

asked Jan 25 '18 at 16:45

Akhil Reddy
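
Indexing `element.attrib["value"]` raises `KeyError` when an element has no `value` attribute, whereas `element.get("value")` returns `None` or a supplied default. The sketch below assumes a hypothetical form, because the question's code is truncated. The field names are invented for illustration.

```python
import lxml.html

# Hypothetical login form; field names are invented for this example.
SAMPLE_HTML = """
<html><body>
  <form>
    <input type="hidden" name="authenticity_token" value="abc123">
    <input type="hidden" name="no_value_field">
    <input type="text" name="username">
  </form>
</body></html>
"""

doc = lxml.html.fromstring(SAMPLE_HTML)

hidden_fields = {}
for field in doc.xpath('//input[@type="hidden"]'):
    # field.attrib["value"] would raise KeyError for the second input,
    # so fall back to an empty string when the attribute is missing.
    hidden_fields[field.get("name")] = field.get("value", "")

print(hidden_fields)
```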

1 answer

Convert element to CSS selector in Python

I'm trying to convert the following element: @[width="300"], which I convert to XPath as //*[@width="300"], to a CSS selector. Because with lxml if I run: selector = "@[width="300"]" tree =…

python python-3.x xpath css-selectors lxml.html

asked Dec 03 '17 at 15:45

J0ker98

1 answer

How to get data by selecting a value from a drop-down option without using selenium

I need to fetch all URLs from this page - http://www.questdiagnostics.com/testcenter/BUSearch.action?submitValue=BUSearch&keyword=Toxoplasma+Abs+IgG+%2F+IgM whenever I select a value from a drop-down and click the Go button. I selected a value…

python-2.7 xpath web-scraping python-requests lxml.html

asked Nov 01 '17 at 16:08

Mounika K

3 answers

Unable to remove spaces between scraped text

I've written a script in Python to scrape some text out of some HTML elements. The script can parse it now. However, the problem is that the results look weird, with a bunch of spaces between them. How can I fix it? Any help will be highly appreciated. This…

python python-3.x web-scraping lxml.html

asked Oct 18 '17 at 11:10

SIM

1 answer

How to get concatenated child text nodes in lxml

This is the HTML sample:

First text part

xpath lxml lxml.html

asked May 08 '17 at 13:59

Andersson

1 answer

Can't get value inside tag in lxml

I am using lxml to scrape data from a website. The HTML code snippet is

html xpath web-scraping lxml lxml.html

asked Apr 20 '17 at 17:10

Aditya Shekhawat

1 answer

Select and modify xpath nodes after specific text

I use this code to get all names:
def parse_authors(self, root):
    author_nodes = root.xpath('//a[@class="booklink"][contains(@href,"/author/")]/text()')
    if author_nodes:
        return [unicode(author) for author in author_nodes]
But I…

python xpath lxml lxml.html calibre

asked Dec 28 '16 at 23:44

wrangly

1 answer

How to check if an element exists in lxml XPath?

I use lxml XPath to parse an HTML page in Python 3. As a sample, I have code that finds an HTML element: version_android = doc.xpath("//div[@itemprop='operatingSystems']//text()") Further, I have an insert MySQL query: insert = ("insert into tracks…

python python-3.x lxml lxml.html

asked Dec 21 '16 at 16:38

Huligan
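
For node-set expressions, `xpath()` returns a list, so an empty list signals that nothing matched. The sketch below wraps that check in a small helper. The helper name `first_text`, the sample markup, and the `price` lookup are assumptions made for illustration.

```python
import lxml.html

# Hypothetical markup built around the itemprop used in the question.
SAMPLE_HTML = """
<html><body>
  <div itemprop="operatingSystems"> Android 4.1 and up </div>
</body></html>
"""


def first_text(doc, xpath_expr, default=None):
    """Return the first non-blank text result, or default if none exists."""
    for text in doc.xpath(xpath_expr):
        stripped = text.strip()
        if stripped:
            return stripped
    return default


doc = lxml.html.fromstring(SAMPLE_HTML)

version_android = first_text(doc, "//div[@itemprop='operatingSystems']//text()")
missing_value = first_text(doc, "//div[@itemprop='price']//text()")

print(version_android, missing_value)
```

Returning a default keeps later code, such as a database insert, from failing on an index into an empty list.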

1 answer

lxml removes double slash iframe

I'm using lxml to sanitize HTML data, but in some cases lxml also removes valid tags. It removes iframe tags that have a valid host but start with double slashes (//). Code example: >>> cleaner =…

lxml sanitization html-sanitizing lxml.html

asked Nov 18 '16 at 21:49

user3164429

1 answer

Why does lxml.html sometimes swallow/remove whitespace instead of preserving it?

Given the following code, one might reasonably expect almost the exact same string of HTML that was fed into lxml to be spat back out. from lxml import html HTML_TEST_STRING =…

lxml libxml2 lxml.html

asked Mar 15 '16 at 06:23

naki

1 answer

Proper XPath to roll up text of children

I'm parsing a page that has structure like this:

content a

content b

# returns content a content b
And I'm using the following XPath to get the content: "//pre[@class='asdf']/text()" It works well,…

python xpath lxml lxml.html

asked Jan 29 '16 at 05:30

tedder42
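
In XPath, `/text()` selects only the text nodes that are direct children of an element, while `string()` concatenates the text of all descendants. The sketch below contrasts the two. It assumes a hypothetical structure for the `pre` block, since the original markup did not survive in the excerpt.

```python
import lxml.html

# Hypothetical structure; the original markup is not shown in the excerpt.
SAMPLE_HTML = """
<html><body>
<pre class="asdf"><b>content a</b> <i>content b</i></pre>
</body></html>
"""

doc = lxml.html.fromstring(SAMPLE_HTML)

# Only text nodes directly under <pre>: here just the space between tags.
direct_text = doc.xpath("//pre[@class='asdf']/text()")

# string() rolls up the text of the element and all of its descendants.
rolled_up = doc.xpath("string(//pre[@class='asdf'])")

# lxml.html elements offer text_content() for the same purpose.
via_method = doc.xpath("//pre[@class='asdf']")[0].text_content()

print(direct_text, rolled_up, via_method)
```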

1 answer

Python parsing html with lxml: get text of tag while specific sign causes problems

I'm parsing real-world HTML files with lxml. This means I want to extract information from tags and I have no control over the style. The problem I'm having lies within the data.

Notes …

python html lxml lxml.html

asked Nov 18 '15 at 17:48

IssnKissn
