Highest Voted 'lxml.html' Questions

2

votes

1 answer

Scraping IMDb Review Page with lxml and requests package

I want to extract the user reviews of a particular movie with the help of lxml. Before that, I need to find out the number of reviews. An example review page is Interstellar. I found the XPath where the user reviews are located with the help of Firebug:…

python lxml lxml.html

asked Mar 05 '15 at 08:46

GokuShanth

203
3
12

2

votes

3 answers

Output of lxml in Python 2.7

This might be a completely foolish question, but Google has been no help. First, of course, importing the libraries I need: from lxml import html from lxml import etree import requests Simple enough. Now to run and parse some code. The link in this…

python python-2.7 lxml lxml.html

asked Jan 09 '15 at 02:38

Ruhpun

25
3

2

votes

3 answers

Python - Requests: Correctly Using Params?

Before I begin, may I just say, I am very new to general communication with the web in code. With that said, could anyone assist me in getting these parameters, 'a': stMonth, 'b': stDate, 'c': stYear, 'd': enMonth, …

python html request lxml lxml.html

asked Dec 30 '14 at 08:15

The Novice

124
9
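A minimal sketch of passing query parameters through the params argument of requests, for readers landing on this entry. The endpoint URL and the values below are hypothetical placeholders; only the keys 'a' to 'd' come from the excerpt. The request is prepared but never sent, so the resulting URL can be inspected offline.

```python
import requests

# Hypothetical values standing in for the variables named in the question.
st_month, st_date, st_year, en_month = 0, 1, 2014, 11

# requests URL-encodes this dict and appends it to the URL as a query string.
params = {"a": st_month, "b": st_date, "c": st_year, "d": en_month}

# Placeholder endpoint (an assumption, not the asker's real URL).
# Preparing the request builds the final URL without any network traffic.
prepared = requests.Request(
    "GET", "https://example.com/table.csv", params=params
).prepare()

print(prepared.url)
```

Calling requests.get(url, params=params) performs the same encoding and then sends the request.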

2

votes

1 answer

Getting parent tag id with lxml

I am trying to scrape a dummy site and get the parent tag of the one that I am searching for. Here's the structure of the code I am searching for:

Here's my Python…

python xpath web-scraping lxml lxml.html

asked Dec 16 '14 at 20:48

user2157179

238
2
4
19
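A short sketch of two ways to reach a parent element's id with lxml. The markup is hypothetical, because the structure quoted in the question was not preserved in this listing; the class name "target" and the ids are assumptions made for illustration.

```python
from lxml import html

# Hypothetical markup; the question's real structure is not shown above.
doc = html.fromstring(
    '<div id="container"><ul id="menu"><li class="target">item</li></ul></div>'
)

target = doc.xpath('//li[@class="target"]')[0]

# Option 1: step up in Python with getparent(), then read the attribute.
parent_id = target.getparent().get("id")

# Option 2: step up inside the XPath expression itself with "..".
parent_id_xpath = doc.xpath('//li[@class="target"]/../@id')[0]

print(parent_id, parent_id_xpath)
```

Both approaches return the same value; the XPath form avoids a second Python call when only the attribute is needed.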

2

votes

1 answer

Duplicates when extracting data from html table using lxml.html.xpath()

I am trying to extract data from this table at Espn cricinfo. Each row has the following format (Data replaced by headers): Player Name (Country) Score …

python xpath lxml.html

asked Aug 19 '14 at 20:58

Padraig Scott

43
3

2

votes

1 answer

Removing img tag in lxml

I have this code: from lxml.html import fromstring, tostring html = "

Here is some text

" doc = fromstring(html) img = doc.find('.//img') doc.remove(img) print tostring(doc) And the output is:

Why does…

python html html-parsing lxml lxml.html

asked Jul 10 '14 at 02:48

rmacqueen

971
2
8
22

2

votes

2 answers

Traversing back to parent with lxml.html.xpath

How can we traverse back to the parent in XPath? I am crawling IMDb to obtain the genres of films, using elem = hxs.xpath('//*[@id="titleStoryLine"]/div/h4[text()="Genres:"]') Now, the genres are listed as anchor links, which are siblings to this tag.…

python lxml lxml.html

asked Mar 04 '14 at 16:45

Amrith Krishna

2,768
3
31
65
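A minimal sketch of the two XPath steps the excerpt asks about: ".." for the parent and the following-sibling axis for sibling anchors. The HTML fragment is hypothetical and only modeled on the XPath in the excerpt, not on IMDb's actual markup, and lxml.html is used here in place of the hxs selector from the question.

```python
from lxml import html

# Hypothetical fragment shaped to match the XPath quoted in the question.
doc = html.fromstring(
    '<div id="titleStoryLine"><div>'
    '<h4>Genres:</h4>'
    '<a href="/genre/a">Adventure</a>'
    '<a href="/genre/d">Drama</a>'
    '</div></div>'
)

h4 = doc.xpath('//*[@id="titleStoryLine"]/div/h4[text()="Genres:"]')[0]

# The anchors are siblings of the <h4>, so no trip to the parent is needed.
genres = h4.xpath('following-sibling::a/text()')

# ".." selects the parent when that is genuinely what is wanted.
parent_tag = h4.xpath('..')[0].tag

print(genres, parent_tag)
```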

2

votes

1 answer

python - parse html form with lxml.html with xpath syntax

Here is the form. The same exact form appears twice in the source.

2

votes

1 answer

Python - lxml library 'clean' method erasing only half of empty node

I'm using the lxml library in Python to clean html pages from potentially harmful code/parts I don't want. I noticed a strange behavior in the function: when given an empty node, it removes the closing tag but not the opening one. For…

python lxml html-sanitizing lxml.html

asked May 24 '13 at 12:57

Robin

9,415
3
34
45

1

vote

1 answer

Best XPath practices for extracting data from a field that varies in format

I was using Python 3.8, XPath and Scrapy where things just seemed to work. I took my XPath expressions for granted. Now I'm just using Python 3.8, XPath and lxml.html and things are much less forgiving. For example, using this URL and this…

python xpath lxml.html

asked Apr 30 '21 at 00:31

spacedog

446
3
13

1

vote

2 answers

Why does python requests.get() retrieve different image src compared to browsing the site

As the title suggests: calling the requests.get() method gives me a different image src link as opposed to when browsing the site manually. I'm trying to scrape a site for products and want to store the images but the src I get from the site is for a…

html python-3.x python-requests src lxml.html

asked Feb 03 '21 at 08:35

Marco Fernandes

326
1
4
13

1

vote

1 answer

How to get text from HTML element by using lxml.html

I've been trying to get a full text hosted inside a

element from the web page https://www.list-org.com/company/11665809. The element should contain a sub-string "Арбитраж". And it does, because my code for div in…

python html lxml lxml.html

asked May 10 '20 at 09:33

Sergey Solod

695
7
15

1

vote

2 answers

Scraping a nested and unstructured table in python (lxml)

The website I'm scraping (using lxml) is working just fine with everything except a table, in which all the tr's, td's and heading th's are nested & mixed and form an unstructured HTML table.

Serial No. …

python web-scraping lxml lxml.html

asked Sep 10 '19 at 04:03

Mukul Kumar Jha

1,062
7
19

1

vote

2 answers

Python scraping: trouble extracting a value

I'm trying to extract values from the table in this site: https://www.geonames.org/search.html?q=&country=IT In my example I want to extract the name 'Rome' and I used this code: import requests import lxml.html html =…

python-3.x xpath python-requests lxml.html

asked May 05 '19 at 21:41

gergiu

11
1

1

vote

1 answer

Compare string result from path & requests

I am scraping the HTML code from the URL defined, mainly focussing on the tag, to extract the results of it. Then, compare if string "example" exists in the script, if yes, print something and flag =1. I am not able to compare the results extracted…

python web-scraping tree lxml lxml.html

asked Feb 12 '19 at 05:42

tehais

33
6


Questions tagged [lxml.html]