Questions tagged [lxml.html]


lxml.html is a dedicated Python package for working with HTML. It is built on lxml's HTML parser, but provides a special Element API for HTML elements, as well as a number of utilities for common HTML processing tasks.
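A minimal sketch of a few of those HTML-specific helpers, run on a made-up fragment: `find_class()` and `text_content()` come from the HtmlElement API, and `make_links_absolute()` and `iterlinks()` are two of the link utilities.

```python
import lxml.html

# A small, made-up HTML fragment used only for illustration.
doc = lxml.html.fromstring(
    '<div><p class="intro">Hello <b>world</b></p><a href="/docs">Docs</a></div>'
)

# HtmlElement helpers: look up elements by CSS class and flatten nested text.
intro = doc.find_class("intro")[0]
print(intro.text_content())  # Hello world

# Link utilities: resolve relative links against a base URL, then list them.
doc.make_links_absolute("https://example.com/")
links = [link for _element, _attr, link, _pos in doc.iterlinks()]
print(links)  # ['https://example.com/docs']
```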

159 questions
1 vote · 1 answer

Can I access the subchild of a parent in XPath?

As the title states, I am parsing some HTML from http://chem.sis.nlm.nih.gov/chemidplus/name/acetone and want to extract data such as the Acetone under MeSH Heading, following on from my similar post How to set up XPath query for HTML…
TimTom
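A sketch of the two usual ways to reach a "subchild" in XPath: climb to the parent with `..` and descend again, or move sideways with `following-sibling`. The markup is hypothetical; the real ChemIDplus page structure is not shown in the question.

```python
import lxml.html

# Hypothetical "heading + value" markup; the real page may differ.
page = lxml.html.fromstring(
    '<div class="section">'
    '<h3>MeSH Heading</h3>'
    '<div class="ds"><span>Acetone</span></div>'
    '</div>'
)

# Option 1: find the heading, step up to its parent with "..",
# then walk back down into the child that holds the value.
via_parent = page.xpath('//h3[normalize-space()="MeSH Heading"]/../div/span/text()')

# Option 2: skip the parent and go sideways to the next sibling <div>.
via_sibling = page.xpath(
    '//h3[normalize-space()="MeSH Heading"]/following-sibling::div[1]/span/text()'
)

print(via_parent, via_sibling)  # ['Acetone'] ['Acetone']
```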
1 vote · 1 answer

Scraping new ESPN site using xpath [Python]

I am trying to scrape the new ESPN NBA scoreboard. Here is a simple script which should return the start times for all games on 4/5/15:

    import requests
    import lxml.html
    from lxml.cssselect import CSSSelector
    doc = …
jdesilvio
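The excerpt is cut off, so the actual failure is unknown. One frequent cause of a selector that "should" match but does not is a `class` attribute holding several tokens. (If a page fills its scoreboard with JavaScript, the downloaded HTML may not contain the times at all.) The sketch below uses made-up class names and contrasts an exact `@class` test with token-style matching and lxml.html's own `find_class()` helper.

```python
import lxml.html

# Made-up scoreboard markup; the real ESPN class names are not known here.
doc = lxml.html.fromstring(
    '<div>'
    '<div class="game"><span class="time start">7:00 PM</span></div>'
    '<div class="game"><span class="time">9:30 PM</span></div>'
    '</div>'
)

# An exact @class comparison misses elements that carry extra classes.
exact = doc.xpath('//span[@class="time"]/text()')

# Token-style match, which is what a CSS selector such as "span.time" means.
token = doc.xpath(
    '//span[contains(concat(" ", normalize-space(@class), " "), " time ")]/text()'
)

# lxml.html's find_class() helper also matches on class tokens.
via_helper = [el.text for el in doc.find_class("time")]

print(exact)  # ['9:30 PM']
print(token)  # ['7:00 PM', '9:30 PM']
```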
1 vote · 1 answer

Parsing xpath with python

I'm trying to parse a web page that contains this:
foosion
1 vote · 2 answers

Get value using lxml

I have the following html:

Aspect Ratio:

2.35 : 1
I want to get the value "2.35 : 1" from the content. However, when I try using lxml, it returns an empty string (I am able to get the 'Aspect…
David542
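The tags around "Aspect Ratio:" were lost from the excerpt, so the markup below is an assumption: a label element followed by bare text inside the same `<div>`. Under that assumption, the value is not the `<div>`'s `.text` but the label's `.tail`, which is a common reason for getting an empty result.

```python
import lxml.html

# Assumed markup: the value is plain text sitting after the <h4> label,
# inside the same <div>. The real tags were lost from the excerpt.
block = lxml.html.fromstring(
    '<div class="txt-block"><h4 class="inline">Aspect Ratio:</h4> 2.35 : 1</div>'
)

label = block.xpath('//h4[normalize-space()="Aspect Ratio:"]')[0]

# The div has no text before its first child, so block.text is None.
print(block.text)  # None

# In lxml, text that follows an element is stored on that element's .tail.
print(label.tail.strip())  # 2.35 : 1

# Pure-XPath alternative: the first text node after the label, whitespace-trimmed.
value = block.xpath(
    'normalize-space(//h4[normalize-space()="Aspect Ratio:"]/following-sibling::text()[1])'
)
print(value)  # 2.35 : 1
```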
1 vote · 1 answer

Issue with parsing html with lxml by xpath

I am trying to parse data from a Google interactive website. It is rendered in JS, so I use Qt to load the site before parsing it. I believe I have the site loaded and rendered properly, but for some reason I am getting an empty list returned to me…
metersk
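When an XPath query returns an empty list on a JavaScript-driven page, one thing worth checking is whether the string handed to lxml really contains the rendered nodes. lxml only parses the HTML it receives and never runs scripts. The sketch uses two hypothetical documents, one "raw" and one "rendered", with the same query.

```python
import lxml.html

XPATH = '//div[@id="chart"]/span[@class="value"]/text()'

# Hypothetical raw HTML as a plain HTTP client would see it: the chart
# container is empty and a script is expected to fill it in later.
raw_html = '<html><body><div id="chart"></div><script>drawChart();</script></body></html>'

# Hypothetical HTML after a browser engine has run the script.
rendered_html = (
    '<html><body><div id="chart">'
    '<span class="value">42</span><span class="value">17</span>'
    '</div></body></html>'
)


def extract(html):
    """Parse a full HTML document and apply the same XPath query."""
    return lxml.html.fromstring(html).xpath(XPATH)


print(extract(raw_html))       # []
print(extract(rendered_html))  # ['42', '17']
```

Printing or saving the exact string passed to `fromstring()` is a quick way to tell whether the problem is the query or the input.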
1 vote · 1 answer

How to use lxml to find all the src tags and replace them

I want to use lxml to get the src contents and replace them with a space, but the body is still not replaced. Please help me. Thank you.

    import re
    import lxml.html
    #the content of source.log is a webpage source code I got by scrapy
    with open("source.log",…
user2492364
1 vote · 2 answers

Search for special HTML characters in text of lxml.html elements

Given an (un)ordered list, I have to check whether special HTML arrows are being used (and replace them with LaTeX arrows). lxml.html is a requirement. I was tinkering around but couldn't get past the following:

    import lxml.html
    my_string = "I…
yang5
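When lxml.html parses the markup, HTML entities such as `&rarr;` are already decoded into Unicode characters (here U+2192). So the search has to look for the character, not the entity text. The arrow set and the LaTeX commands below are an illustrative choice, and `latexify_list_items` is a hypothetical helper name.

```python
import lxml.html

# Unicode characters produced by common HTML arrow entities, mapped to LaTeX.
# The choice of arrows and LaTeX commands here is illustrative.
ARROWS = {
    "\u2192": r"\rightarrow",      # &rarr;
    "\u2190": r"\leftarrow",       # &larr;
    "\u2194": r"\leftrightarrow",  # &harr;
    "\u21d2": r"\Rightarrow",      # &rArr;
}


def latexify_list_items(html):
    """Return (has_arrow, converted_text) for every <li> in the fragment."""
    root = lxml.html.fromstring(html)
    results = []
    for li in root.iter("li"):
        # Entities were decoded by the parser, so search for the characters.
        text = li.text_content()
        has_arrow = any(char in text for char in ARROWS)
        for char, latex in ARROWS.items():
            text = text.replace(char, latex)
        results.append((has_arrow, text))
    return results


items = latexify_list_items("<ul><li>A &rarr; B</li><li>C &larr; D</li><li>plain</li></ul>")
print(items)
```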
1 vote · 1 answer

Difference between a/img/..//text() and a//text()

I'm working with Scrapy and lxml trees to sort out HTML trees. I noticed that there is a difference between these two XPath expressions. I was under the impression that they were interchangeable. Could someone please explain to me the…
NST
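The two expressions differ because `a/img/..` is a round trip: it goes down to an `<img>` child and back up to its parent. So only `<a>` elements that actually contain an `<img>` survive. A small made-up document shows the difference.

```python
import lxml.html

doc = lxml.html.fromstring(
    '<div>'
    '<a href="/one"><img src="one.png">with image</a>'
    '<a href="/two">no image</a>'
    '</div>'
)

# a/img/.. : go down to an <img> child, then back up to its parent <a>.
# Only <a> elements that contain an <img> are kept.
with_img = doc.xpath('a/img/..//text()')

# a//text() : every text node under every <a>, image or not.
all_links = doc.xpath('a//text()')

print(with_img)   # ['with image']
print(all_links)  # ['with image', 'no image']
```

In effect, `a/img/..` behaves like the predicate form `a[img]`.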
1 vote · 1 answer

Parsing forum posts using lxml/python

When I use the code below, it splits one div into fifteen items in the array. I want this one post to be a single item in the array. It probably happens because of tags, but I am not sure how to solve it.

    from lxml import html
    import…
Simon
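A `text()` query returns one string per text node, so any child tag inside the post starts a new item. The excerpt lost the tag name; the sketch assumes `<br>` tags. Collecting the element first and joining its text with `itertext()` gives one item per post.

```python
import lxml.html

# Assumption: the post body is broken up by <br> tags (the tag name was
# lost from the question excerpt, so this is a guess).
doc = lxml.html.fromstring(
    '<div><div class="post">line one<br>line two<br>line three</div></div>'
)

# text() returns one string per text node, so each <br> starts a new item.
pieces = doc.xpath('//div[@class="post"]/text()')

# Select the post elements instead, then join all of their text.
posts = ["\n".join(post.itertext()) for post in doc.xpath('//div[@class="post"]')]

print(pieces)  # ['line one', 'line two', 'line three']
print(posts)   # ['line one\nline two\nline three']
```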
1 vote · 1 answer

How to find text in a specific tag with lxml and Python?

Assuming the HTML source is as follows:

some other content here
this is another one title

text paragraph 1 here

text paragraph 2 here

text paragraph n here
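The tags in this excerpt were stripped, so the sketch assumes the title is an `<h2>` and the paragraphs are `<p>` siblings. The idea is to locate the heading by its text and then take the paragraphs that follow it.

```python
import lxml.html

# Assumed markup: the excerpt lost its tags, so <p>/<h2> here are a guess.
doc = lxml.html.fromstring(
    '<div>'
    '<p>some other content here</p>'
    '<h2>this is another one title</h2>'
    '<p>text paragraph 1 here</p>'
    '<p>text paragraph 2 here</p>'
    '</div>'
)

# Locate the heading by its text, then take the <p> siblings that follow it.
paragraphs = doc.xpath(
    '//h2[normalize-space()="this is another one title"]/following-sibling::p/text()'
)
print(paragraphs)  # ['text paragraph 1 here', 'text paragraph 2 here']
```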

1 vote · 2 answers

Extract text() and get attributes from it

I select an HTML tag with XPath, using some conditions, and now I get its value with text(). Is there any way to get attributes from this value? (text()) Value from text(): document.write("hello"); Now I'll get the whole line…
user3507915
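A `text()` result is a string, not an element, so it has no attributes of its own. In lxml, however, these results are "smart strings" with a `getparent()` method that leads back to the owning element. The `<script>` markup below is a made-up example built from the excerpt's `document.write("hello");`.

```python
import lxml.html

doc = lxml.html.fromstring(
    '<div><script type="text/javascript">document.write("hello");</script></div>'
)

# text() results are "smart strings": ordinary str values that also
# remember which element they came from.
code = doc.xpath('//script/text()')[0]
script = code.getparent()

print(code)                # document.write("hello");
print(script.tag)          # script
print(script.get("type"))  # text/javascript

# Alternative without smart strings: select the element itself, filtered by
# its text, and read attributes from it directly.
element = doc.xpath('//script[contains(text(), "document.write")]')[0]
print(element.get("type"))  # text/javascript
```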
1 vote · 2 answers

Python: Scrape Data from Web after Inputting Info

Could anyone help me revise this Python program to correctly submit information to the "Date Range" query, and then extract the "Close" return data? I am scraping data from the following…
The Novice
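lxml.html can inspect and fill in an HTML form before it is submitted with an HTTP client. The sketch below uses a hypothetical form. The field names `startdate`/`enddate` and the action URL are made up, not taken from the page in the question. It fills the fields through `form.fields` and reads back the pairs a browser would send with `form_values()`.

```python
import lxml.html

# Hypothetical form; field names and action URL are made up for this sketch.
page = lxml.html.fromstring(
    '<form action="/history" method="get">'
    '<input type="text" name="startdate" value="">'
    '<input type="text" name="enddate" value="">'
    '<input type="submit" value="Get Prices">'
    '</form>'
)

form = page.forms[0]
form.fields["startdate"] = "2015-01-01"
form.fields["enddate"] = "2015-03-31"

# form_values() returns the (name, value) pairs a browser would submit;
# the unnamed submit button is left out.
values = form.form_values()
print(form.method, form.action, values)

# These pairs could then be sent with an HTTP client, e.g. (not run here):
# requests.get(urljoin(page_url, form.action), params=values)
```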
1 vote · 1 answer

Python and lxml.html get_element_by_id output questions

I'm currently trying to get data from an html file. It appears that the code I'm using works, but not as I expect. I can get some items but not all and I'm wondering if it has to do with the size of the file I'm attempting to read. I'm currently…
pri0ritize
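The excerpt does not show the file, so this is only one possible explanation for "some items but not all". `get_element_by_id()` returns just the first element with a given id. If the file repeats an id (invalid HTML, but common), the later elements are never returned. An XPath query on `@id` returns all of them. The markup is hypothetical.

```python
import lxml.html

# Hypothetical file content with a repeated id.
doc = lxml.html.fromstring(
    '<div>'
    '<span id="price">10</span>'
    '<span id="price">20</span>'
    '</div>'
)

# get_element_by_id() returns only the first match in document order.
first = doc.get_element_by_id("price")

# An XPath query returns every element with that id.
all_matches = doc.xpath('//*[@id="price"]')

print(first.text)                       # 10
print([el.text for el in all_matches])  # ['10', '20']

# With a default, a missing id returns the default instead of raising KeyError.
print(doc.get_element_by_id("missing", None))  # None
```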
1 vote · 1 answer

Parsing Yelp using lxml - ignore html tag

I am trying to run the code below to extract Yelp reviews:

    from lxml import html
    import requests
    import csv
    page = requests.get('http://www.yelp.com/biz/guisados-los-angeles')
    review = tree.xpath('//p[@itemprop="description"]/text()')

Now,…
Arun