Questions tagged [html-content-extraction]

Techniques for predicting/detecting certain article text and extracting it from a particular document.

Also referred to as web scraping (web harvesting or web data extraction), this is a software technique for extracting information from websites. Usually, such programs simulate human exploration of the World Wide Web by either implementing low-level Hypertext Transfer Protocol (HTTP) or embedding a full-fledged web browser, such as Internet Explorer or Mozilla Firefox.
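
As a rough illustration of the extraction step only (no HTTP fetching or browser automation), here is a minimal Python sketch that pulls paragraph text out of an HTML string using the standard library's `html.parser`. The names `ParagraphExtractor`, `extract_paragraphs`, and `SAMPLE_HTML` are made up for this example, and treating every `<p>` element as article text is an assumption: real pages usually need a stronger heuristic to separate the article from navigation and sidebars.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text found inside <p> elements and ignores everything else."""

    def __init__(self):
        super().__init__()
        self._depth = 0        # number of <p> elements currently open
        self._chunks = []      # text pieces of the paragraph being read
        self.paragraphs = []   # finished paragraphs, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Join the pieces and collapse runs of whitespace.
                text = " ".join("".join(self._chunks).split())
                if text:
                    self.paragraphs.append(text)
                self._chunks = []

    def handle_data(self, data):
        # Keep text only while inside a paragraph.
        if self._depth > 0:
            self._chunks.append(data)


def extract_paragraphs(html):
    """Return the text of each <p> element in the given HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# Hypothetical input: a sidebar link followed by two article paragraphs.
SAMPLE_HTML = """
<html><body>
  <div id="sidebar"><a href="/about">About</a></div>
  <p>First paragraph of the <b>article</b>.</p>
  <p>Second   paragraph.</p>
</body></html>
"""

if __name__ == "__main__":
    for paragraph in extract_paragraphs(SAMPLE_HTML):
        print(paragraph)
```

This sketch assumes every `<p>` is explicitly closed; markup with unclosed paragraphs, or pages whose content is rendered by JavaScript, would need a more forgiving parser or an embedded browser.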

211 questions
1
vote
1 answer

extracting links with a specific class with Selenium in Python

I am trying to extract links from an infinite scroll website. This is my code for scrolling down the page: driver = webdriver.Chrome('C:\\Program Files …
1
vote
1 answer

Inserting value from vba into html list

I have been able to manipulate data in a list using vba's .value method. But when trying to update a particular value it fills the value in as blank and won't let me change it. I'm trying to push a date into this field but the date is taken from a…
1
vote
2 answers

Django-haystack search static content

My Django 1.10 app provides search functionality using Haystack + Elasticsearch. It works great for model data, but I need to make it work for static content too (basically HTML files). I was thinking of scraping the content from the HTML…
1
vote
2 answers

Map RSS entries to HTML body w. non-exact search

How would you solve this problem? You're scraping HTML of blogs. Some of the HTML of a blog is blog posts, some of it is formatting, sidebars, etc. You want to be able to tell what text in the HTML belongs to which post (i.e. a permalink) if any. I…
ʞɔıu
  • 47,148
  • 35
  • 106
  • 149
1
vote
1 answer

Extract Data from HTML using PHP

Here is what I am looking for: I have a link which displays some data in HTML format: http://www.118.com/people-search.mvc...0&pageNumber=1 Data comes in the format below:
Bird John 56 Leathwaite Road London…
David
  • 11
  • 1
  • 1
  • 2
1
vote
6 answers

jQuery: getting/parsing content from different sites

I'd like to do the following: grab news from several sites, parse their content using jQuery selectors and show them on one page. How could this be done with jQuery? Thanks.
Fuxi
  • 7,611
  • 25
  • 93
  • 139
1
vote
1 answer

How to navigate a website and extract data with Python

I am not much of a programmer. Just learning. I want to extract (public) electoral data from my country's electoral Authority using Python. This is for academic purposes but I also want to develop my programming skills. All of the data I store will…
1
vote
1 answer

Identify if html contains template language code while extracting from URL

I am trying to fetch plain text content from any provided URL which has some text data. While testing the feature on one of the URLs I found some template language code present in the source. {{if user.username || user.id}} …
SanketR
  • 1,182
  • 14
  • 35
1
vote
1 answer

php warning: illegal string offset ‘name’ in GetUrl.php on line 855

I got an error in my Apache log: php warning: illegal string offset ‘name’ in GetUrl.php on line 855 Here is the page code: function find_header_by_name($header_name) { if (!$this->headers_received) { $this->GetUri->errors[] =…
1
vote
1 answer

how to extract html code for website using iframe and silverlight

I need to load a specific webpage from a site that has multiple images on the site. I need to extract these images but I can't do this manually because the names of each image have no pattern and there will be hundreds of sites. I have a silverlight…
randomalbumtitle
  • 151
  • 3
  • 15
1
vote
2 answers

Extracting Links in Perl using TreeBuilder

I'm working on a script to extract a bunch of information into one HTML file. I'm having some difficulty extracting ONLY a specific set of links from the page in question, however. Here is a rough structure of the site. There are some other headings…
1
vote
1 answer

Get element content from a variable containing html

How do I use the DOM parser to extract the content of an HTML element stored in a variable? More exactly: I have a form where the user inputs HTML in a text area. I want to extract the content of the first paragraph. I know there are many tutorials on this,…
John
  • 404
  • 4
  • 12
1
vote
0 answers

What is the best regular expression or other simple ways to extract an article content from a webpage in HTML or PHP source?

There are many scripts that extract articles from HTML pages. If using a regular expression to get only the main article from an HTML or PHP page source, what is the best regular expression to use? Also, what is the simplest and the…
john3825
  • 11
  • 2
1
vote
2 answers

Extract specific part of URL from string

I need to extract only parts of a URL with PHP, but I am struggling to set the point where the extraction should stop. I used a regex to extract the entire URL from a longer string like this: $regex =…
Charles Ingalls
  • 4,521
  • 5
  • 25
  • 33
1
vote
3 answers

Unable to show Json html content data in textview in android

Right now I am trying to display images and text from HTML content in a TextView in Android. I am getting the HTML content from JSON, but with the help of the code below I am only able to show the available text, like the image below and…
Manick
  • 817
  • 2
  • 15
  • 24