Highest Voted 'boilerpipe' Questions

1

vote

0 answers

What is the best regular expression or other simple way to extract article content from a webpage's HTML or PHP source?

Many scripts extract articles from HTML pages. If using a regular expression to get only the main article from an HTML or PHP page source, what is the best regular expression to use? Also, what is the simplest and the…

asked Mar 06 '15 at 11:13

john3825

11
2

1

vote

0 answers

Boilerpipe used in Android Causes Error: Conversion to Dalvik format failed

I put 'boilerpipe-1.2.0-android.jar' (https://code.google.com/p/boilerpipe/issues/detail?id=57), 'nekohtml-1.9.13.jar', and 'xerces-2.9.1.jar' into the libs folder of my Android project, but this caused a "Conversion to Dalvik format failed" error. So, I did all…

java android compiler-errors boilerpipe

asked Mar 25 '14 at 02:16

user2731985

77
5

1

vote

1 answer

Using boilerpipe on Android application

I'm trying to use boilerpipe in an Android application. I have included the libraries boilerpipe-1.2.0, nekohtml-1.9.13, and xerces-2.9.1 in the libs folder. When running the application with Eclipse I get the following error: Conversion to Dalvik…

java android eclipse boilerpipe

asked Feb 28 '14 at 18:53

Enry_h2o

31
1
1

1

vote

1 answer

How can I resolve "No JAVA_HOME Environment Variable set. Trying to guess it..."?

I am trying to install the Python library boilerpipe with pip install boilerpipe, but I get the error "No JAVA_HOME Environment Variable set. Trying to guess it...", even though I have already set the Java path. What can I do about this?

python-3.x jupyter-notebook boilerpipe

asked Feb 24 '14 at 05:52

Prush

517
1
6
21

1

vote

0 answers

Custom output from boilerpipe - Transform into two newlines

I'm using boilerpipe to extract text from websites: ArticleSentencesExtractor.getInstance().getText(inputHTMLStream). I don't see any customization possibilities. I would like to separate sentence elements with two newlines. Is that possible -…

java boilerpipe

asked Oct 03 '13 at 17:46

LukeSolar

3,795
4
32
39

1

vote

2 answers

Using boilerpipe in Android

Boilerpipe is a library that basically extracts the main content from a webpage. For news websites, it is especially hard to extract the content as the formatting differs from site to site. So I've tried to integrate the boilerpipe library -…

java android textview stack-trace boilerpipe

asked Sep 30 '13 at 07:29

Auge

65
1
4
11

1

vote

0 answers

Determining type of a string

I'm looking for some way to determine the type of a string from any article website such as this one. Types would be title, author, date, article itself. I use BeautifulSoup and Boilerpipe to scrape the relevant content: from boilerpipe.extract…

python beautifulsoup boilerpipe

asked Mar 10 '13 at 04:06

Paul Chen

95
1
1
6

1

vote

2 answers

How can I get HTML output from NBoilerPipe?

NBoilerPipe is a Mono port of the BoilerPipe Java library. I've managed to get this working in .NET 4 without too much trouble (a few library references needed fixing/etc). However, searching through the code, I cannot find any 'hooks' for HTML…

.net html mono boilerpipe

asked Dec 11 '12 at 17:49

winwaed

7,645
6
36
81

1

vote

1 answer

Extract HTML article text with inline CSS

I want to extract text from crawled html web pages. I am using the excellent open source Boilerpipe library to do just that. However, with Boilerpipe I am getting only the raw text. In addition to the raw text, I need to capture the text with…

java extract boilerpipe

asked Jun 10 '12 at 02:40

cosmos

2,414
4
23
25

0

votes

2 answers

Trouble installing Boilerpipe

This is the third time I've installed it. I had it working on Windows, and up until a few days ago on Linux. I've done all I can do and I don't understand how to run this Java program. The source code is a folder with a lib, src, some jars and a…

java classpath javac src boilerpipe

asked Oct 31 '11 at 03:28

user723220

817
3
12
20

0

votes

1 answer

Unsupported browser agent when crawling TripAdvisor with boilerpipe

I'm programming a generic web crawler that gets the main content from a given webpage (it has to crawl different pages). I've tried to achieve this with different tools, among them: HtmlUnit: returned too much scrap when crawling. Essence: failed…

java web-crawler htmlunit boilerpipe

asked Jan 14 '22 at 12:49

Santiago Luca

25
8

0

votes

0 answers

Exception when getting HTML from URL

I'm trying to get HTML from a URL so I can strip it down using Boilerpipe. However, I keep on getting an exception. I am using the NewsAPI to get my URLs. Here is the relevant code snippet: foreach (var article in articlesResponse.Articles) { …

c# boilerpipe

asked May 28 '20 at 14:56

esb5415

25
5

0

votes

0 answers

Tomcat Application throws java.lang.ClassNotFoundException for Jar in WEB-INF/lib

I'm trying to add Boilerpipe to do web scraping with my Tomcat project, but when I do so I tend to run into a problem. I add the jar as well as the necessary resources (nekohtml-1.9.13.jar and xerces-2.9.1.jar) to my WEB-INF/lib folder and as an…

java tomcat boilerpipe

asked Apr 09 '20 at 06:57

bluesquare

76
1
8

0

votes

1 answer

What is the issue with this attempt to install boilerpipe3 for Python?

There are three venues (PCs or servers) where I wish to install boilerpipe3 for Python. Each venue is running Windows 10 and Python 3, and has almost the same environment set up. I have managed to install boilerpipe3 (via pip install) in two…

python-3.x setuptools python-wheel jpype boilerpipe

asked Jun 03 '19 at 10:46

agftrading

784
3
8
21

0

votes

2 answers

Why can't I pip install a Python 3 package?

I am new to Python 3, using 64-bit Windows 10. When trying to install a package, I get the long error message pasted below. What should I do? (base) C:\Users\xxx>pip install boilerpipe-py3 Collecting boilerpipe-py3 Using cached…

python python-3.x pip boilerpipe

asked May 22 '19 at 19:17

user1774127
