Questions tagged [lxml.html]

lxml.html is a dedicated Python package for dealing with HTML. It is based on lxml's HTML parser, but provides a special Element API for HTML elements, as well as a number of utilities for common HTML processing tasks.
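
As a rough illustration of the Element API mentioned in the tag description, here is a minimal sketch. The sample markup and variable names are hypothetical and not taken from any question on this page.

```python
from lxml import html

# Hypothetical sample markup, used only for illustration.
doc = html.fromstring(
    '<div><p class="intro">Hello <a href="/docs">docs</a></p></div>'
)

paragraph = doc.xpath('//p')[0]

# Attribute access on an HTML element.
css_class = paragraph.get('class')

# text_content() gathers the text of the element and all its descendants.
full_text = paragraph.text_content()

# iter() walks matching descendants in document order.
hrefs = [link.get('href') for link in doc.iter('a')]

print(css_class, full_text, hrefs)
```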

159 questions

0 answers

Python, robobrowser, answer authentication-challenge after login

I'm really new to Python programming. I'm working on automating a web browser. I started with Selenium, but found it to be really slow for what I need. I'm working on code that can log in to a webpage, fill out a few text boxes, and click on…

asked Oct 26 '17 at 06:19

shiny

1 answer

Python: lxml xpath to extract content

The code below extracts the PE from the Reuters link below. However, my method is not robust: the webpage for another stock has two fewer lines, which shifts the data. How can I counter this problem? I would like to point straight to the…

python-2.7 lxml lxml.html

asked Sep 07 '16 at 14:12

vindex

1 answer

Python lxml iterating through tr elements

I'm running into an issue when trying to get the parent node of a tr element whilst iterating through them all. Here's a basic table that I'm working with.

Some text

…

python python-3.x lxml lxml.html

asked Jul 09 '16 at 20:36

Chad
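
For the question above about reaching the parent of each tr while iterating, one possible approach is `getparent()`, which every lxml element provides. This is only a sketch: the table markup below is a hypothetical stand-in, since the asker's table is not shown in the excerpt.

```python
from lxml import html

# Hypothetical table; the original markup is not shown in the excerpt.
table_html = """
<table id="t1">
  <tr><td>Some text</td></tr>
  <tr><td>More text</td></tr>
</table>
"""

doc = html.fromstring(table_html)

rows = list(doc.iter('tr'))
for row in rows:
    parent = row.getparent()  # the element that directly contains this tr
    cells = [td.text for td in row.iter('td')]
    print(parent.tag, cells)
```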

2 answers

How to grab all raw html within a certain XPath from a local file in Python

I am trying to grab the raw html from a bunch of local html files. I had some help from this post in getting the raw file to read in: "Get all text inside a tag lxml". But the code I have currently produces the entire file instead of a subset. Right…

python lxml lxml.html

asked Jul 06 '16 at 19:49

Paul Loach

2 answers

Attempting to get the text from a certain part of a website using lxml.html

I have some current Python code that is supposed to get the HTML from a certain part of a website, using the xpath of where the HTML tag is located. def wordorigins(word): pageopen =…

python html lxml lxml.html

asked May 06 '16 at 05:28

eccentricayman

1 answer

HTML parsing with lxml, python, .tail being broken up by <br> tags

I have a website that I am trying to scrape (while not really understanding html) but I have done a ton of reading and made some progress. It's a messy site but the important part looks like this:

DESCRIPTOR1: " important…

python html-parsing lxml lxml.html

asked May 03 '16 at 02:17

Sardar Monfils
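
In lxml, text that follows an element is stored in that element's `.tail`, so text split by line-break tags ends up spread over several tails. The sketch below collects them, assuming a hypothetical layout in which a `<b>` label is followed by text interrupted by `<br>` tags; the real page's markup is truncated in the excerpt and may differ.

```python
from lxml import html

# Hypothetical markup; the real page's structure is not shown in full.
snippet = html.fromstring(
    '<div><b>DESCRIPTOR1:</b> first line<br>second line<br>third line</div>'
)

label = snippet.xpath('.//b')[0]

# The text right after the label is the label's tail; each <br> then
# carries the following chunk of text in its own tail.
pieces = [label.tail or '']
for br in snippet.iter('br'):
    pieces.append(br.tail or '')

text = ' '.join(p.strip() for p in pieces if p.strip())
print(text)
```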

1 answer

HTML parsing with lxml - how to keep empty content in resulting list?

I am using lxml to parse an html file: from lxml import html tree = html.parse(myfile) data = tree.xpath('//p/text()') I have 300 <p>text</p> tags in my html file, but len(data) is only 250 because sometimes I'll have <p></p> in my html. I want…

python html parsing lxml lxml.html

asked Feb 05 '16 at 16:36

user1566200
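
For the question above about keeping empty content, a possible approach is to select the p elements themselves and read `.text` from each, because `text()` in XPath returns only text nodes that exist. The markup below is a small hypothetical stand-in for the asker's file.

```python
from lxml import html

# Hypothetical stand-in for the file described in the question.
doc = html.fromstring('<div><p>text</p><p></p><p>more</p></div>')

# text() yields only existing text nodes, so the empty <p> is skipped.
texts_only = doc.xpath('//p/text()')

# Selecting the elements keeps one entry per <p>; .text is None when empty.
per_element = [p.text or '' for p in doc.xpath('//p')]

print(len(texts_only), len(per_element))
```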

2 answers

This xPath is giving no results, any reason why?

import requests from lxml import html page = requests.get(url="http://www.cia.gov/library/publications/the-world-factbook/geos/ch.html") tree = html.fromstring(page.content) bordering =…

python xpath web-scraping python-requests lxml.html

asked Jan 10 '16 at 19:14

Parkerjdude

2 answers

Python LXML.HTML Xpath Return Empty List

Problem: The date_list is an empty list. Should not be empty because list length should equal list length of oct and filing_type_list. What I have done: searched for typos. tried different companies (example is of REXAHN PHARMACEUTICALS,…

python xml xpath lxml lxml.html

asked Dec 05 '15 at 06:37

SAH

0 answers

Python lxml xpath gives different results on two different unix distros

When I run this xpath expression //tr[42]/td//span/./following-sibling::a[1]/@href on two different systems, I get two different results. On Ubuntu 14.04.2 LTS I get ["javascript:__doPostBack('datagrid_results$_ctl44$_ctl1','')"] On rehel fedora…

python ubuntu lxml fedora lxml.html

asked Nov 18 '15 at 20:55

Fuchida

1 answer

Web scraping a text() in python

I am having trouble with a web scraping function. The XPath for the two things I am trying to get are /html/body/div/table[2]/tbody/tr[5]/td[1]/div[1]/ul/li[1]/text() /html/body/div/table[2]/tbody/tr[5]/td[1]/div[1]/ul/li[1]/a The html is

python html xpath web-scraping lxml.html

asked Sep 18 '15 at 14:42

lost
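
XPaths copied from a browser often contain `tbody`, as the ones quoted above do. Browsers add that element to the DOM even when the source HTML omits it, whereas lxml's parser keeps the table as written. Whether that is the cause in this question cannot be told from the excerpt; the sketch below only demonstrates the effect on hypothetical markup.

```python
from lxml import html

# Hypothetical table written without <tbody>, as many pages are.
doc = html.fromstring(
    '<table><tr><td><ul><li>item <a href="/x">link</a></li></ul></td></tr></table>'
)

# A browser-style path that assumes a <tbody> element.
with_tbody = doc.xpath('//table/tbody/tr/td')

# The same path without <tbody>, matching the source as written.
without_tbody = doc.xpath('//table/tr/td')

print(len(with_tbody), len(without_tbody))
```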

1 answer

How to parse a htmlpage with lxml with <br> screwing up?

I want to parse the following piece of html from Nasa's website with lxml in python:

Launch Date:1981-09-24
Launch Vehicle: Delta
Launch Site: Cape…

python html html-parsing lxml lxml.html

asked Jul 20 '15 at 12:51

Frank

1 answer

lxml.html ignoring body class attributes

I am using lxml.html for parsing html content, but I don't understand why lxml is dropping "body" tag attributes. I tried both lxml.html.parse and lxml.html.document_fromstring as suggested here, but it is still not working. Example html…

iframe html-parsing lxml lxml.html

asked May 09 '15 at 18:09

Karan

3 answers

How to get textarea value with lxml python

With this Python code I can get the whole html source: import mechanize import lxml.html import StringIO br = mechanize.Browser() br.set_handle_robots(False) br.addheaders = [("User-agent","Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13)…

python lxml lxml.html

asked Apr 08 '15 at 10:24

Dark Cyber
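
For the textarea question above, lxml.html gives textarea elements a `value` property that returns their content. The sketch below parses an inline string instead of fetching a page; the form markup and the field name `comment` are hypothetical.

```python
from lxml import html

# Hypothetical form; in practice the string would be the fetched page source.
doc = html.fromstring(
    '<form><textarea name="comment">hello world</textarea></form>'
)

area = doc.xpath('//textarea[@name="comment"]')[0]

# .value returns the content of the textarea element.
value = area.value
print(value)
```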

1 answer

Using lxml to Validate HTML

I am trying to use lxml to validate a piece of HTML but it complains that the fragment is invalid even though it should be valid: img = """""" parser =…

html validation lxml lxml.html

asked Mar 27 '15 at 03:57

Alex Rothberg
