Questions tagged [apache-spark-mllib]

MLlib is a machine learning library for Apache Spark

MLlib is a low-level, RDD-based machine learning library for Apache Spark.

2241 questions
1 vote • 2 answers

Sparkling Water: can't make use of the support for Spark ML pipelines

According to this blog by the Sparkling Water team, you are now able to use the Spark ML pipeline components to build a DL model in the latest versions. I tried adding the latest versions in my build.sbt: "org.apache.spark" % "spark-mllib_2.10" %…
void • 2,403 • 6 • 28 • 53
1 vote • 1 answer

Spark IDFModel on numbers

I'd like to run a TF-IDF model on data where the "document" contents are numeric identifiers (instead of text). So I don't want to hash them, just use the numeric values instead. Any simple way to produce the…
kecso • 2,387 • 2 • 18 • 29
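
One way to approach this (a minimal sketch, not from the question, using spark.ml): keep the identifiers as strings and let CountVectorizer assign each distinct identifier its own index, then apply IDF. The column names and sample values below are assumptions.

```scala
import org.apache.spark.ml.feature.{CountVectorizer, IDF}
import org.apache.spark.sql.SparkSession

// Sketch: treat each numeric identifier as an opaque token so CountVectorizer
// assigns it a stable index -- no hashing involved.
val spark = SparkSession.builder.appName("tfidf-on-ids").getOrCreate()
import spark.implicits._

// Hypothetical data: each "document" is a bag of numeric identifiers kept as strings.
val docs = Seq(
  (0, Seq("101", "205", "101")),
  (1, Seq("205", "307"))
).toDF("id", "terms")

val cvModel = new CountVectorizer().setInputCol("terms").setOutputCol("tf").fit(docs)
val tf = cvModel.transform(docs)

val idfModel = new IDF().setInputCol("tf").setOutputCol("tfidf").fit(tf)
idfModel.transform(tf).select("id", "tfidf").show(false)
```

Because CountVectorizer builds an explicit vocabulary, cvModel.vocabulary maps each vector index back to the original identifier, which a hashed representation would not allow.
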
1 vote • 2 answers

Distributed Word2Vec model training using Apache Spark 2.0.0 and MLlib

I have been experimenting with Spark and MLlib to train a Word2Vec model, but I don't seem to be getting the performance benefits of distributed machine learning on large datasets. My understanding is that if I have w workers, then, if I create an…
Kabutops • 111 • 6
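
A hedged note on the knob that usually matters here: MLlib's Word2Vec only parallelises the training itself across the number of partitions set with setNumPartitions; repartitioning the input corpus alone does not spread the work. The parameter values below are illustrative, not recommendations.

```scala
import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}
import org.apache.spark.rdd.RDD

// Sketch: training parallelism is controlled by setNumPartitions, not by how
// the input RDD happens to be partitioned (higher values trade some accuracy).
def trainDistributed(corpus: RDD[Seq[String]], workers: Int): Word2VecModel =
  new Word2Vec()
    .setVectorSize(100)        // illustrative
    .setNumIterations(1)
    .setNumPartitions(workers)
    .fit(corpus)
```
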
1 vote • 1 answer

Spark Clustering: How to get a similarity measure of the elements within the same cluster?

I have clustered some data using Spark and now I want to get a similarity score between a specific entry I am interested in and the other elements in the same cluster my entry is in. Are there any Spark algorithms or methods for this? I've read of…
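
MLlib's KMeans has no built-in "similarity within a cluster" call, so one possible approach (a sketch under the assumption of a KMeansModel and squared Euclidean distance as the similarity proxy; all names are made up) is to rank the query point's cluster mates by distance:

```scala
import org.apache.spark.mllib.clustering.KMeansModel
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// Sketch: keep only the members of the query point's cluster and sort them by
// squared Euclidean distance to the query (smaller = more similar).
def similarInCluster(model: KMeansModel, data: RDD[Vector], query: Vector): RDD[(Vector, Double)] = {
  val clusterId = model.predict(query)
  data.filter(v => model.predict(v) == clusterId)
    .map(v => (v, Vectors.sqdist(query, v)))
    .sortBy(_._2)
}
```
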
1 vote • 4 answers

Last 2/3 tasks take huge time compared to all other tasks in Spark

I am trying to do sentiment analysis of comments. The program runs successfully on Spark, but the problem I am facing is that out of 70 partitions, 68 partitions give results in around 20% of the time taken by the last 2 partitions. I have…
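
A common cause of a couple of straggler tasks is key skew. A hedged sketch (assuming the slow stage is a key-based aggregation of counts; the names and salt size are illustrative) of salting hot keys so the work spreads more evenly:

```scala
import org.apache.spark.rdd.RDD
import scala.util.Random

// Sketch: pre-aggregate per (key, salt) so a handful of very hot keys no longer
// pin the whole stage to the last couple of tasks.
def saltedCounts(pairs: RDD[(String, Long)], saltBuckets: Int = 8): RDD[(String, Long)] =
  pairs.map { case (k, v) => ((k, Random.nextInt(saltBuckets)), v) }
    .reduceByKey(_ + _)                 // partial sums per (key, salt)
    .map { case ((k, _), v) => (k, v) }
    .reduceByKey(_ + _)                 // final sum per key
```
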
1 vote • 1 answer

Why does Spark MLlib HashingTF output only 1D Vectors?

So I have this big dataframe with the format: dataframe: org.apache.spark.sql.DataFrame = [id: string, data: string]. Data is a very big set of words/identifiers. It also contains unnecessary symbols like ["{ etc. which I need to clean up. My…
Mnemosyne • 1,162 • 4 • 13 • 45
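
A frequent reason for apparently one-dimensional output is feeding HashingTF an un-split string column. A sketch (the column names "id" and "data" come from the excerpt; everything else is assumed) of tokenising before hashing:

```scala
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.DataFrame

// Sketch: HashingTF hashes each element of an array-of-strings column, so the raw
// "data" string has to be split into tokens first; an un-split string contributes
// a single term per row, which makes every vector look one-dimensional.
def featurize(df: DataFrame): DataFrame = {  // df assumed to be [id: string, data: string]
  val words = new Tokenizer().setInputCol("data").setOutputCol("words").transform(df)
  new HashingTF()
    .setInputCol("words")
    .setOutputCol("features")
    .setNumFeatures(1 << 18)   // illustrative feature-space size
    .transform(words)
}
```
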
1 vote • 1 answer

Spark MLlib classification using Scala

I am new to the Spark infrastructure, so this question may be silly. I use MLlib for text classification. I have a set of sentences with labels which I feed to a MultinomialNaiveBayes classifier for training. I found an example for that. My input…
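
For context, a minimal sketch (not the asker's code) of training a multinomial Naive Bayes model on hashed term frequencies with the RDD-based API; the label encoding, feature-space size, and smoothing value are assumptions:

```scala
import org.apache.spark.mllib.classification.{NaiveBayes, NaiveBayesModel}
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Sketch: labelled sentences -> hashed term frequencies -> multinomial Naive Bayes.
def trainNB(labelled: RDD[(Double, Seq[String])]): NaiveBayesModel = {
  val tf = new HashingTF(numFeatures = 1 << 16)   // illustrative size
  val points = labelled.map { case (label, words) => LabeledPoint(label, tf.transform(words)) }
  NaiveBayes.train(points, lambda = 1.0, modelType = "multinomial")
}
```
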
1 vote • 1 answer

Scalable invocation of a Spark MLlib 1.6 predictive model with a single data record

I have a predictive model (Logistic Regression) built in Spark 1.6 that has been saved to disk for later reuse with new data records. I want to invoke it from multiple clients, with each client passing in a single data record. It seems that using…
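
One hedged approach for single-record scoring with the RDD-based API: load the persisted model once and call predict(Vector) per request, which runs locally and builds no RDD per call. The class name and model path below are made up.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.mllib.linalg.Vectors

// Sketch: load the saved model once, then score single records with predict(Vector).
class SingleRecordScorer(sc: SparkContext) {
  private val model = LogisticRegressionModel.load(sc, "hdfs:///models/lr-1.6")  // path is hypothetical
  def score(features: Array[Double]): Double = model.predict(Vectors.dense(features))
}
```
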
1 vote • 2 answers

How to get confidence scores from Spark MLlib logistic regression in Java

UPDATE: I tried using the following way to generate confidence scores, but it's giving me an exception. I use the below code snippet: double point = BLAS.dot(logisticregressionmodel.weights(), datavector); double confScore = 1.0 / (1.0 +…
ArinCool • 1,720 • 1 • 13 • 24
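
A hedged alternative to recomputing the sigmoid by hand (shown in Scala; the same methods are callable from Java): clear the model's threshold so predict() returns the raw class-1 probability rather than a 0/1 label.

```scala
import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.mllib.linalg.Vector

// Sketch: with the threshold cleared, predict() returns the class-1 probability
// (sigmoid of weights . features + intercept) instead of a hard 0/1 prediction.
def confidence(model: LogisticRegressionModel, features: Vector): Double = {
  model.clearThreshold()
  model.predict(features)
}
```
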
1 vote • 2 answers

Text Classification using Spark ML

I have a free text description based on which I need to perform a classification. For example, the description can be that of an incident. Based on the description of the incident, I need to predict the risk associated with the event. For example: "A…
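
A minimal spark.ml pipeline sketch for this kind of text-to-class problem; the column names "description" and "label" and the hyperparameters are assumptions, not from the question:

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.DataFrame

// Sketch: free-text "description" column -> tokens -> hashed features -> classifier.
// The frame is assumed to also carry a numeric "label" column encoding the risk class.
def fitRiskModel(train: DataFrame): PipelineModel = {
  val tokenizer = new Tokenizer().setInputCol("description").setOutputCol("words")
  val tf = new HashingTF().setInputCol("words").setOutputCol("features")
  val lr = new LogisticRegression().setMaxIter(20)
  new Pipeline().setStages(Array(tokenizer, tf, lr)).fit(train)
}
```
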
1 vote • 1 answer

Spark import of the mllib package member

I want to import CholeskyDecomposition from Spark. I do it in the following way. First of all, I modified my sbt file and added an additional dependency: "org.apache.spark" %% "spark-mllib" % "1.3.0" Then I do the import in my Scala code: import…
Guforu • 3,835 • 8 • 33 • 52
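
If the Spark-internal decomposition turns out not to be accessible from user code, a hedged fallback is Breeze, which spark-mllib already pulls in; the sketch below assumes a small local symmetric positive-definite matrix and is not the Spark API from the question:

```scala
import breeze.linalg.{DenseMatrix, cholesky}

// Sketch: plain Cholesky factorisation of a small local matrix via Breeze.
val a = DenseMatrix((4.0, 2.0), (2.0, 3.0))   // symmetric positive-definite example
val l = cholesky(a)                           // lower-triangular L with A = L * L.t
```
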
1 vote • 1 answer

Spark MLlib ALS input and output domains

I'm using Spark MLlib ALS and trying to use the trainImplicit() interface to feed it the number of an item purchased by a user as the implicit preference. I don't know how to validate my model though. My input is in the domain [1, inf), but the…
Aaron McMillin • 2,532 • 27 • 42
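
For context, a sketch of the trainImplicit call (hyperparameters are illustrative, not recommendations): with implicit feedback the "rating" is treated as a confidence-weighted preference, so predictions come back as preference-like scores rather than in the original purchase-count domain, which is why ranking-style validation is usually used instead of raw error.

```scala
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}
import org.apache.spark.rdd.RDD

// Sketch: purchase counts go in as implicit-feedback "ratings"; predictions come
// back as preference scores, not counts.
def fitImplicit(purchases: RDD[Rating]): MatrixFactorizationModel =
  ALS.trainImplicit(purchases, 10, 10, 0.01, 40.0)   // rank, iterations, lambda, alpha
```
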
1 vote • 0 answers

Spark: Can I save models generated using ml package in spark 1.5.1?

I would like to save a model created by Spark's ml package as a Spark model (.parquet) or PMML. The model.save method is applicable only to Spark 1.6 or later versions. Is there some way I can save my models using Spark 1.5.1?
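
One hedged workaround on 1.5.x is to train the RDD-based mllib equivalent, which already supports save() and, for several model types, PMML export; the model choice, path, and data below are assumptions.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Sketch: persist an RDD-based mllib model on Spark 1.5.x. Path is hypothetical.
def trainAndPersist(sc: SparkContext, data: RDD[LabeledPoint]): Unit = {
  val model = new LogisticRegressionWithLBFGS().run(data)
  model.save(sc, "hdfs:///models/lr-1.5")   // Parquet-backed, reloadable later
  println(model.toPMML())                   // PMML document as a string
}
```
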
1 vote • 1 answer

Incremental training of an SVM or any classifier in Spark

Is there any way to perform incremental training of any classifier in Spark / MLlib? What I want is to retrain an existing model with a new data set. New data can come at any time and I would like to add it to the already trained classifier.
babz • 41 • 2
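
MLlib has no true online SVM, but a hedged approximation is to warm-start a fresh run from the previous model's weights whenever a new batch arrives; the iteration count and names below are illustrative.

```scala
import org.apache.spark.mllib.classification.{SVMModel, SVMWithSGD}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Sketch: continue optimisation from the existing model's weights on the new batch
// instead of retraining from scratch.
def updateWithNewBatch(old: SVMModel, newBatch: RDD[LabeledPoint]): SVMModel = {
  val svm = new SVMWithSGD()
  svm.optimizer.setNumIterations(50)   // illustrative
  svm.run(newBatch, old.weights)       // warm start from the existing weights
}
```
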
1 vote • 1 answer

Spark - MLlib computeSVD executing on driver

I am running RowMatrix.computeSVD using Scala. In the UI it appears that only one stage, the "treeAggregate", runs on the cluster, and after that the application master UI shows nothing while the application continues to execute the…
Francois Saab • 77 • 1 • 9
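
This matches how computeSVD behaves for a modest number of columns: the Gramian is formed with one distributed treeAggregate and the eigendecomposition then runs on the driver, which is why nothing further shows up in the cluster UI. A sketch (names assumed) that keeps the driver-side work small by requesting few singular values and skipping U:

```scala
import org.apache.spark.mllib.linalg.{Matrix, SingularValueDecomposition}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Sketch: only the top-k singular values/right singular vectors; computeU = false
// avoids the extra distributed pass that builds U.
def topSingularValues(mat: RowMatrix, k: Int): SingularValueDecomposition[RowMatrix, Matrix] =
  mat.computeSVD(k, computeU = false)
```
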