Questions tagged [boilerpipe]

The boilerpipe library for Java provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.

77 questions
0
votes
1 answer

Ignore SSL verification for boilerpipe python wrapper web extractor?

I'm attempting to extract data from numerous sites that don't have valid SSL certificates. I'm using the boilerpipe Python wrapper to extract the text without HTML and write it to a text file. I understand how to remove the SSL certificate requirement…
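
A minimal sketch of one way to approach the question above, assuming the page is downloaded with Python's standard library instead of by the wrapper itself. It builds an SSL context with verification switched off and fetches the HTML with it. The helper names `unverified_context` and `fetch_html` are hypothetical, and handing the markup to the wrapper through an `html=` argument is an assumption about that wrapper, so that step appears only as a comment. Skipping verification removes protection against tampering, so this suits only sites you already trust.

```python
import ssl
import urllib.request


def unverified_context():
    """Return an SSL context that skips certificate and hostname checks."""
    ctx = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch_html(url, timeout=10):
    """Download a page without verifying its certificate (hypothetical helper)."""
    with urllib.request.urlopen(
        url, context=unverified_context(), timeout=timeout
    ) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Assumed usage with the wrapper (not verified here):
# html = fetch_html("https://example.invalid/article")
# text = Extractor(extractor="ArticleExtractor", html=html).getText()
```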
0
votes
1 answer

Python 3 Unicode not found

I'm aware that unicode was changed to str in Python 3, but I keep getting the same issue no matter how I write this code. Can anyone tell me why? I'm using boilerpipe for a specific set of web crawls: for urls in allUrls: fileW = open('article('+…
Adrian Coutsoftides
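
In Python 3 the `unicode` builtin no longer exists and `str` takes its place, so Python 2 code that calls `unicode(...)` raises `NameError`. The sketch below shows one way to normalise extracted text before writing it to a file. The helper names `to_text` and `save_text` are hypothetical, and the truncated loop in the question is not reproduced.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a Python 3 str, decoding bytes if necessary."""
    if isinstance(value, bytes):
        return value.decode(encoding, errors="replace")
    return str(value)


def save_text(path, value):
    """Write text with an explicit encoding instead of calling unicode()."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(to_text(value))
```

Opening the file with `encoding="utf-8"` lets Python handle the conversion, so no manual `.encode()` call is needed.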
0
votes
2 answers

Can't read the same InputStream twice

This is my code: // getFile() method returns the input stream of a local or online file InputStream fileStream = getFile(source); // Convert an InputStream to an InputSource org.xml.sax.InputSource fileSource = new…
Salvatore
0
votes
1 answer

using boilerpipe with pyspark

I am using boilerpipe to get text out of html. However there is some issue that I have not been able to resolve. I have a list of 50k elements. I am creating an rdd of 1000 elements and then processing them and saving the resultant rdd in hdfs. The…
Ravi Ranjan
0
votes
1 answer

Gem install not finding existing gem

When running gem install I get the following: gregoryostermayr@gregors test $ gem install jruby-boilerpipe ERROR: Could not find a valid gem 'jruby-boilerpipe' (>= 0) in any repository ERROR: Possible alternatives: boilerpipe, jruby-coercion,…
Gregory Ostermayr
0
votes
1 answer

Android Studio: java.lang.NoClassDefFoundError from boilerpipe

I am trying to use boilerpipe to extract article text the way the Pocket app does. The app compiles properly but throws a runtime exception: java.lang.RuntimeException: An error occurred while executing doInBackground() at…
user3125971
0
votes
1 answer

Android Studio: Build error after adding boilerpipe library

I am trying to use boilerpipe for parsing text. I copied boilerpipe-1.2.0.jar, nekohtml-1.9.13.jar and xerces-2.9.1.jar to the lib folder and added them as libraries. But when I try to run the project I get a huge error. Here is its end…
0
votes
1 answer

Extract article's headline from HTML(using Boilerpipe)

Boilerpipe can extract just the article's text from a webpage, cleaning up all the HTML mess. However, how could I extract the article's headline? There is a way to just use the page's title, but it is sometimes incorrect and contains unneeded…
Gintas_
0
votes
2 answers

Python boilerpipe installation issue

I am trying to install Python Boilerpipe on my Ubuntu 14. It fails with the following error: Traceback (most recent call last): File "setup.py", line 27, in download_jars(datapath=DATAPATH) File "setup.py", line 21, in…
najeeb
0
votes
2 answers

Boilerpipe dependency not found

According to https://github.com/Netbreeze-GmbH/boilerpipe the Maven dependency for boilerpipe is groupId de.l3s.boilerpipe, artifactId boilerpipe-core, version 1.2.2. But this…
blue-sky
0
votes
0 answers

Error instantiating servlet class in IntelliJ IDEA

I'm using IntelliJ to develop a web application that uses the boilerpipe jar to create an application like this, but I'm getting an error instantiating the servlet class. This is my servlet code: import…
ashif-ismail
0
votes
0 answers

Boilerpipe import error urllib2

I successfully installed JPype and Boilerpipe Python wrapper. My JAVA_HOME path is correct (as far as I know). I created a python file with the following code: from boilerpipe.extract import Extractor extractor =…
Daniel
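
The excerpt above is cut off before the traceback, but the title points at `urllib2`, a module that exists only in Python 2. In Python 3 its main functions live in `urllib.request`. A minimal sketch of a compatibility import follows, assuming the failing code only needs `urlopen` and `Request`. Whether the wrapper's own source can be patched this way is an assumption.

```python
try:
    import urllib2  # Python 2 module name
except ImportError:
    # Python 3: urlopen and Request moved to urllib.request.
    import urllib.request as urllib2


def build_request(url, user_agent="Mozilla/5.0"):
    """Create a request object that works under either Python version."""
    return urllib2.Request(url, headers={"User-Agent": user_agent})
```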
0
votes
1 answer

pip install boilerpipe failed with tarfile.ReadError: empty file

I'm trying to install boilerpipe through pip, but it fails. Here is the log: Complete output from command python setup.py egg_info: Traceback (most recent call last): File "", line 20, in File…
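
The error in the title, `tarfile.ReadError: empty file`, is what Python raises when it is asked to open a zero-byte archive. One plausible cause, which the truncated log does not confirm, is that the archive download during setup produced an empty file. The sketch below reproduces the error with a temporary empty file and shows a check that could run before extraction. The helper name `is_usable_archive` is hypothetical.

```python
import os
import tarfile
import tempfile


def is_usable_archive(path):
    """True if path exists, is non-empty, and opens as a tar archive."""
    return (
        os.path.isfile(path)
        and os.path.getsize(path) > 0
        and tarfile.is_tarfile(path)
    )


# A zero-byte file stands in for a failed download.
with tempfile.NamedTemporaryFile(suffix=".tar.gz", delete=False) as tmp:
    empty_path = tmp.name

try:
    empty_ok = is_usable_archive(empty_path)
    try:
        tarfile.open(empty_path, "r:gz").close()
        opened = True
    except tarfile.ReadError:
        opened = False  # an empty file cannot be read as an archive
finally:
    os.remove(empty_path)
```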
0
votes
0 answers

Boilerpipe import error in python

I have successfully installed Boilerpipe and JPype in Python but get an error while importing boilerpipe: >>> import boilerpipe Traceback (most recent call last): File "", line 1, in import boilerpipe File…
Jayant Jaiswal
0
votes
1 answer

How to solve ConnectException error when using Boilerpipe?

I want to use Boilerpipe to extract text from news pages on several websites. The problem is that every time I try, I get a ConnectException error. I just used the example syntax from the boilerpipe quickstart guide: URL url = new…
Malik