Questions tagged [html-content-extraction]

Techniques for predicting/detecting certain article text and extracting it from a particular document.

Also known as web scraping (web harvesting or web data extraction), this is a software technique for extracting information from websites. Such programs usually simulate human exploration of the World Wide Web, either by implementing low-level Hypertext Transfer Protocol (HTTP) requests or by embedding a full web browser, such as Internet Explorer or Mozilla Firefox.
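As a rough illustration of the extraction step described above, here is a minimal sketch that pulls the visible text out of an HTML string using only Python's standard `html.parser` module. The class name `TextExtractor`, the helper `extract_text`, and the sample markup are hypothetical names chosen for this example; fetching the page over HTTP and detecting the main article body are left out.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []      # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the text of an HTML string with tags, scripts, and styles removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.text()


# Hypothetical sample document, used only to demonstrate the helper.
sample = (
    "<html><head><title>Demo</title><style>p {color: red}</style></head>"
    "<body><h1>Heading</h1><p>First <b>bold</b> paragraph.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(extract_text(sample))  # prints: Demo Heading First bold paragraph.
```

This only strips markup; it does not decide which part of the page is the article, which is what dedicated content-extraction libraries attempt to do.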

211 questions

4 answers

How can I extract HTML content efficiently with Perl?

I am writing a crawler in Perl, which has to extract contents of web pages that reside on the same server. I am currently using the HTML::Extract module to do the job, but I found the module a bit slow, so I looked into its source code and found out…

html perl html-content-extraction

asked Sep 11 '09 at 08:53

Alvin

10,308
8
37
49

1 answer

iOS - Converting HTML to Normal text

In my application, I'm receiving an HTML file from a news server. After receiving it, I want to remove the tags, images, URL anchors, etc. and just show the text in a text view. There's a website which functions similarly to what I'm looking for…

html ios text html-content-extraction

asked Jan 19 '12 at 13:35

Satyam

15,493
31
131
244

1 answer

How to take data/text off a website and show it in my Android app via a text view, list view, etc.?

I'm trying to create an application where a page has its text created automatically by reading it off a website. I understand an array and string formatting would be used. I am OK with Android programming, but not an expert lol. I had tried using a set…

android data-binding html-content-extraction textreader

asked Dec 19 '11 at 16:08

Nazul Khan

5 answers

How to collect data from a website

Preface: I have broad, college-level knowledge of a handful of languages (C++, VB, C#, Java, many web languages), so go with whichever you like. I want to make an Android app that compares numbers, but in order to do that I need a database. I'm a one…

database web html-content-extraction

asked Dec 18 '11 at 04:54

Mr. MonoChrome

1,383
3
17
39

1 answer

Non-MSHTML C# parsing of HTML and JavaScript

I'm looking for a way to parse an HTML document with embedded JavaScript. I know that this can be done with MSHTML and code DOM, but in this case it is not an option. I need the program to also be able to run on Mono. Any suggestions?

c# javascript html parsing html-content-extraction

asked Aug 17 '11 at 10:52

Arsen Zahray

24,367
48
131
224

1 answer

.NET 5 HttpClient cannot GET html content page - http 500

I'm trying to use HttpClient to get the HTML content of a page. To try the method, I tested it with the Google URL, and it works: I receive the content of the HTML page. But with the URL I actually want, I can't get any content. Each time I get a return code…

c# httpclient .net-5 html-content-extraction

asked Aug 12 '21 at 10:55

Kujima

1 answer

Extract all Images from HTML whose width or height higher than a specified value - Regex

I'm trying to make a small link-sharing function with Classic ASP, like the ones on LinkedIn or Facebook. What I need to do is get the HTML of a remote URL and extract all the images whose width is greater than 50px, for example. I can crawl and take the HTML and…

regex asp-classic html-content-extraction

asked Jun 13 '11 at 22:58

Burak F. Kilicaslan

2 answers

To bypass referral check

Is there any way to bypass the referral check that some sites apply to prevent their data from being extracted? For example, if you follow this link, you will get an Access Denied error. However, if you just go to this link, it takes you to the home page and…

html-content-extraction referrals

asked Jun 02 '11 at 21:17

Prashant Singh

3,725
12
62
106

1 answer

Extract file from http response in Azure logic app

I have an Azure function (http triggered) which returns a CSV file in response. I am calling this function from a logic app using http request action (since I need to pass authentication details) and getting the http response with the CSV in body.…

azure azure-functions httpresponse azure-logic-apps html-content-extraction

asked Mar 19 '19 at 11:14

sbanik

3 answers

Screen-scraping for PDF links to download

I'm learning C# through creating a small program, and couldn't find a similar post (apologies if this answer is posted somewhere else). How might I go about screen-scraping a website for links to PDFs (which I can then download to a specified…

c# pdf screen-scraping html-parsing html-content-extraction

asked Mar 11 '11 at 22:58

superwillis

4 answers

Extract the main part of a page in Java

Hello, I have the Wikipedia page of a public figure, and I want to use Java to extract the HTML source of the main part of that page. Do you have any ideas?

java html html-content-extraction

asked Mar 09 '11 at 18:38

user651584

3 answers

How to get the value of a row extracted using jQuery

I have a table and I'm retrieving each table row by doing this: $(function(){ $('table tr').click(function(){ var $row = $(this).html(); alert($row); }); }); This gets me the current row like…

jquery html-table html-content-extraction

asked Feb 21 '11 at 17:28

Tsundoku

9,104
29
93
127

1 answer

HTML XPath: Extracting text mixed in with multiple level and complex tags?

Related earlier questions: "HTML XPath: Extracting text mixed in with multiple tags?" and "HTML XPath: Selectively avoiding tags when extracting text". (Sorry for my poor English.) I'm a beginner at writing web crawlers, and I'm trying to extract main content from…

html xpath scrapy html-content-extraction

asked Mar 01 '17 at 02:41

Poplar Giant

1 answer

Reading web page source code in Java differs from the original web page source code

I am trying to implement a program that reads a web page's source code, saves it to a text file, and then does some operations on it. The problem is that when I read the web page's source code, there is a difference between the original web page source code and the output of…

java html html-content-extraction web-content

asked Jan 19 '17 at 10:34

Oghli

2,200
1
15
37

1 answer

Best visible content extractor available

So my application needs the visible content from a given URL: just the text part, with no HTML and no header or footer data. As of now I am using BeautifulSoup and boilerpipe for this. But in some rare cases I am not getting enough data or the…

web-scraping web-crawler screen-scraping html-content-extraction

asked Jan 02 '17 at 10:12

najeeb
