Questions tagged [html-content-extraction]

Techniques for predicting/detecting certain article text and extracting it from a particular document.

HTML content extraction is also referred to as web scraping (web harvesting or web data extraction), a software technique for extracting information from websites. Such programs usually simulate human exploration of the World Wide Web, either by implementing low-level Hypertext Transfer Protocol (HTTP) requests or by embedding a full-fledged web browser, such as Internet Explorer or Mozilla Firefox.
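
As a rough illustration of the extraction step only (not the fetching step), here is a minimal Python sketch that pulls the visible text out of an HTML string using the standard library's `html.parser`. The class name `TextExtractor`, the helper `extract_text`, and the sample markup are assumptions made up for this example; real article extraction usually needs heuristics well beyond this.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect visible text, skipping <script>/<style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text that is outside script/style blocks.
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up sample document for illustration.
sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b></p>"
    "<script>var x = 1;</script></body></html>"
)
print(extract_text(sample))  # prints: Title Hello world
```

The sketch works on a string already in memory, so it says nothing about how the page is downloaded or whether JavaScript-rendered content is available.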

211 questions

3 answers

How to extract values from HTML using RegEx?

Given the following HTML:

OAK RIDGE, N.J., March 16, 2011 /PRNewswire/ -- Lakeland Bancorp, Inc. (Nasdaq:

regex html-content-extraction text-extraction

asked Mar 16 '11 at 15:22

Paul Fryer

1 answer

How to get the links from all the embedded videos on a webpage?

Let me explain. Given a certain webpage, I want to count the embedded videos and get their links. I'm not asking for the code itself, but for some pointers on how to achieve that.

javascript object embed html-content-extraction

asked Jan 24 '11 at 18:12

Gustavo

1 answer

Generic Article Extraction from web pages

I am going to begin my work on article extraction. The task is to extract the hotel reviews that are posted on different web pages (e.g. 1.…

java extract html-content-extraction

asked Nov 11 '10 at 08:58

LGAP

2 answers

XQuery extract between two tags

I am currently working on extracting data from HTML. I would like to extract the text between two

tags.

XYZ:

asdfghjk

sdsdsd

…

xml xquery html-content-extraction

asked Jun 25 '10 at 13:16

Technocrat

1 answer

Extraction of main content of an article (JavaScript)

I'm writing a program that reads a general HTML "article" page (Wikipedia, NY Times, Yahoo News, etc.). From that page I want to strip away all of the "noise" (ads, header bars, anything that isn't part of the article content). To think about it…

javascript algorithm extract html-content-extraction

asked May 29 '15 at 03:24

Damon Williams

1 answer

parsing HTML in swift

Can anyone help me out with this one: I have an HTTP page formatted this way:

Person0
- Person1

html ios parsing swift html-content-extraction

asked Dec 14 '14 at 14:35

Clemen Gronver
  
0 answers

Loop for Extracting Detailed HTML tables from multiple webpages into Excel

I would like to extract info from each page on the http://www.adac.de/infotestrat/autodatenbank/suchergebnis.aspx when I go into details for each auto (after clicking "Suchen" (eng. Search)). E.g. first row…

javascript vba extract imacros html-content-extraction

asked May 20 '14 at 12:47

Vlada Pleshcheva

3 answers

php : parse html : extract script tags from body and inject before ?

I don't care what the library is, but I need a way to extract <.script.> elements from the <.body.> of a page (as string). I then want to insert the extracted <.script.>s just before <./body.>. Ideally, I'd like to extract the <.script.>s into 2…

php dom html-content-extraction

asked May 02 '14 at 13:20

theclueless1

2 answers

Any ideas about the jQuery equivalent of the READABILITY code? (Or: building the best heuristic to find the main text using jQuery)

http://lab.arc90.com/experiments/readability/ is a very handy tool for viewing cluttered newspaper, journal and blog pages in a very readable manner. It does this by using some heuristics to find the relevant main text of a web page. Its source…

jquery html-content-extraction heuristics

asked Dec 22 '09 at 15:45

Emre Sevinç

3 answers

Reading source code from a webpage in java

I am trying to read source code from a webpage. My java code is import java.net.*; import java.io.*; import java.util.*; import javax.swing.JOptionPane; class Testing{ public static void Connect() throws Exception{ URL url = new…

java html-content-extraction

asked Oct 10 '13 at 10:25

Ahmad Ali

1 answer

extract information from a website using Qt?

I'd like to extract the information in the b tag => 123456789. This is the HTML source:

…

c++ html qt html-content-extraction

asked Sep 08 '13 at 20:07

NPLS

4 answers

How to extract data from a raw HTML file?

Is there a way to extract desired data from a raw html which has been written unsemantically with no IDs and classes? I mean, suppose there is a saved html file of a webpage (profile) and I want to extract the data like (say) 'hobbies'. Is it…

php html parsing html-content-extraction

asked Nov 30 '09 at 17:13

apnerve

5 answers

PHP - how to get main HTML content like Reader Mode in Firefox

In the Android Firefox app and in Safari on iPad, we can read only the main content using "Reader Mode". read more... How can I recognize only the main content in HTML with PHP? I need to detect the main news like Firefox or Safari do, using PHP. For example, I get news from…

php file-get-contents html-content-extraction

asked Jul 18 '13 at 20:29

Milad Ghiravani

3 answers

Scraping from wsj.com or finance.yahoo.com

I want to display on a WordPress page the total volume of shares traded on the NYSE over the last 2 weeks that it has been open. What is the best way to go about doing this?

php screen-scraping html-content-extraction

asked Nov 14 '09 at 00:04

pg.

3 answers

How to programmatically extract information from a web page, using Linux command line?

I need to extract the exchange rate of USD to another currency (say, EUR) for a long list of historical dates. The www.xe.com website gives the historical lookup tool, and using a detailed URL, one can get the rate table for a specific date, w/o…

html linux extract html-content-extraction

asked Feb 27 '13 at 06:06

ysap
