Questions tagged [apache-spark-mllib]

MLlib is a low-level, RDD-based machine learning library for Apache Spark.

2241 questions
1 vote, 1 answer

Best range of parameters in grid search?

I would like to run a naive implementation of grid search with MLlib, but I am a bit confused about choosing the 'best' range of parameters. Obviously, I do not want to waste too many resources on a combination of parameters that will probably not…
user706838
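A common heuristic for the grid-search question above (not taken from the question itself) is to search scale-type parameters such as a regularization constant on a logarithmic grid, then zoom in around the best coarse value. The plain-Python sketch below assumes a hypothetical `evaluate(params)` callback that would train an MLlib model with those parameters and return a validation error such as RMSE; `log_grid`, `refine`, `grid_search` and `toy_error` are illustrative names, not MLlib APIs.

```python
import itertools
import math


def log_grid(low_exp, high_exp, num):
    """Return `num` values evenly spaced on a log10 scale from 10**low_exp to 10**high_exp."""
    if num == 1:
        return [10.0 ** low_exp]
    step = (high_exp - low_exp) / (num - 1)
    return [10.0 ** (low_exp + i * step) for i in range(num)]


def refine(best, factor=10.0, num=5):
    """Build a finer log-scale grid spanning best/factor .. best*factor."""
    center = math.log10(best)
    spread = math.log10(factor)
    return log_grid(center - spread, center + spread, num)


def grid_search(evaluate, grid):
    """Try every combination in `grid` (name -> candidate values).

    `evaluate` is a hypothetical callback returning an error to minimise
    (e.g. validation RMSE). Returns (best_params, best_error).
    """
    names = sorted(grid)
    best_params, best_error = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        error = evaluate(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


# Toy stand-in for "train + validate" (hypothetical): minimum at regParam=0.01, numIterations=100.
def toy_error(params):
    return (math.log10(params["regParam"]) + 2) ** 2 + (params["numIterations"] - 100) ** 2 / 1e4


# Coarse pass: one candidate per decade for regParam, a short explicit list for numIterations.
coarse = {"regParam": log_grid(-4, 0, 5), "numIterations": [50, 100, 200]}
best, err = grid_search(toy_error, coarse)

# Fine pass: zoom in around the best regParam, keeping the best numIterations fixed.
fine = {"regParam": refine(best["regParam"], factor=3.0), "numIterations": [best["numIterations"]]}
best_fine, err_fine = grid_search(toy_error, fine)
print(best, best_fine)
```

Integer-valued parameters such as the number of iterations or the ALS rank can simply be listed explicitly, as done for `numIterations` here, while the coarse-to-fine pass spends most evaluations near the region that already looked promising.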
1 vote, 0 answers

Iterate logistic regression code over different training data sets in PySpark

I want to iterate the following sample logistic regression code over different training data sets, each stored in a separate file, using PySpark: from pyspark.mllib.classification import LogisticRegressionWithLBFGS, LogisticRegressionModel from…
1 vote, 0 answers

Spark - Reload saved Featurization Pipeline vs instantiate new Pipeline with same stages

I would like to check whether I'm missing any important points here. My pipeline is only for featurization. I understand that once a pipeline that includes an Estimator is fitted, saving the pipeline will persist the params the Estimator has…
brent
1 vote, 1 answer

Spark ML: failing to load model using MatrixFactorizationModel

I am trying to implement a recommender system using Spark collaborative filtering. First I prepare the model and save it to disk: MatrixFactorizationModel model = trainModel(inputDataRdd); model.save(jsc.sc(), "/op/tc/model/"); When I load the model using…
Rahul Sharma
1 vote, 1 answer

Reloaded Spark model does not seem to work

I am training and saving a model from a CSV file. Everything is okay for this first step. After saving the model, I am trying to load and use the saved model with new data, but it does not work. What is the problem? Training Java File SparkConf sconf = new…
kkurt
1 vote, 0 answers

Spark MLlib error on trainImplicit

Need help! I am using Spark MLlib, ALS.trainImplicit. When I am doing a grid search, in most cases the code works normally, but for certain parameters it stops and shows an error message, like: ...... Rank 40, reg 1.0, alpha 2.0, the RMSE =…
1 vote, 1 answer

How do I install the MLlib Apache Spark library into a Java Eclipse project?

I want to implement some machine learning algorithms using the Spark MLlib library for my Java project. I have tried several tutorials without success. I am used to using Eclipse and was surprised that it was so difficult to set up. My assumption…
A.Dumas
1 vote, 0 answers

Spark - Polynomial Expansion vector size exceeded

I'm using Spark to run LinearRegression. Since my data cannot be fit well by a linear model, I added some higher-order polynomial features to get a better result. This works fine! Instead of modifying the data myself, I wanted to use the…
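The excerpt does not show the actual error, but one plausible cause worth checking is how fast a polynomial expansion grows. Expanding n features into every monomial of total degree 1 to d (constant term dropped) yields C(n + d, d) - 1 outputs, which is consistent with the 2-feature, degree-3 example in the Spark ML `PolynomialExpansion` documentation producing 9 values; treating this as the cause of the specific error is an assumption. The plain-Python sketch below counts and enumerates those monomials; `poly_expansion_size` and `expand` are hypothetical helpers, and their output order differs from Spark's.

```python
from itertools import combinations_with_replacement
from math import comb, prod

INT_MAX = 2**31 - 1  # largest JVM Int, the index type of Spark vectors


def poly_expansion_size(num_features, degree):
    """Number of monomials of total degree 1..degree in num_features variables."""
    return comb(num_features + degree, degree) - 1


def expand(x, degree):
    """Enumerate every monomial of degree 1..degree of vector x (order differs from Spark's)."""
    out = []
    for k in range(1, degree + 1):
        # Each multiset of k feature indices is one monomial, e.g. (0, 0, 1) -> x0 * x0 * x1.
        for idx in combinations_with_replacement(range(len(x)), k):
            out.append(prod(x[i] for i in idx))
    return out


# Show how quickly the output dimension grows with the input size and degree.
for n in (2, 10, 100, 1000):
    for d in (2, 3, 4):
        size = poly_expansion_size(n, d)
        print(f"n={n:5d} degree={d}: {size:>15,d} outputs, fits in Int: {size <= INT_MAX}")
```

With 1,000 input features, degree 3 already gives 167,668,500 outputs per row and degree 4 exceeds the 32-bit index range, so lowering the degree or expanding only a few selected columns keeps the vectors manageable.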
1 vote, 2 answers

Spark MLlib K-Means Clustering

I have some geographical points defined by latitude, longitude and score, and I want to use the MLlib K-Means algorithm to cluster them. Is that possible with MLlib K-Means and, if so, how can I pass the parameters or features to the algorithm…
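In MLlib's RDD API, K-Means clusters an RDD of numeric vectors (in PySpark, `KMeans.train(rdd, k, maxIterations=...)` from `pyspark.mllib.clustering`), so the "features" are simply the coordinates of each vector. The plain-Python sketch below shows the assignment/update loop (Lloyd's algorithm) on `[latitude, longitude, weighted score]` vectors. The `score_weight` scaling, the sample points and the fixed initial centers are illustrative assumptions (MLlib chooses initial centers itself, by default with k-means||), and Euclidean distance on raw degrees is only a rough proxy for geographic distance.

```python
def make_features(lat, lon, score, score_weight=1.0):
    """Build one feature vector; score_weight (hypothetical) controls how much the score matters."""
    return [lat, lon, score * score_weight]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest(point, centers):
    """Index of the center nearest to point."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def kmeans(points, initial_centers, max_iterations=20):
    """Lloyd's algorithm: alternate assignment and mean-update steps until centers stop moving."""
    centers = [list(c) for c in initial_centers]
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            clusters[closest(p, centers)].append(p)
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                new_centers.append([sum(m[j] for m in members) / len(members) for j in range(len(center))])
            else:  # keep an empty cluster's center where it was
                new_centers.append(center)
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Hypothetical sample data: (latitude, longitude, score).
raw = [(48.85, 2.35, 0.9), (48.86, 2.34, 0.8), (40.71, -74.0, 0.1), (40.72, -74.01, 0.2)]
points = [make_features(lat, lon, s, score_weight=0.5) for lat, lon, s in raw]
centers = kmeans(points, initial_centers=[points[0], points[2]])
labels = [closest(p, centers) for p in points]
print(centers, labels)
```

With MLlib the same `points` lists would be wrapped as vectors in an RDD and passed to `KMeans.train`; the returned model's `predict` plays the role of `closest` here.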
1 vote, 3 answers

SparkR from Rstudio - gives Error in invokeJava(isStatic = TRUE, className, methodName, ...) :

I am using RStudio. After creating a session, if I try to create a DataFrame from R data, it gives an error. Sys.setenv(SPARK_HOME = "E:/spark-2.0.0-bin-hadoop2.7/spark-2.0.0-bin-hadoop2.7") Sys.setenv(HADOOP_HOME =…
Sudhakar Chavan
1 vote, 1 answer

Predicting the next event from averaging sequences

I am pretty new to ML, so I am having some difficulty understanding how I could use Spark's machine learning libraries with time series data that reflect a sequence of events. I have a table that contains this info: StepN#, element_id,…
1 vote, 1 answer

Apache Spark MLlib with DataFrame API gives java.net.URISyntaxException when createDataFrame() or read().csv(...)

In a standalone application (running on Java 8 and Windows 10, with spark-xxx_2.11:2.0.0 as JAR dependencies) the following code gives an error: /* this: */ Dataset logData = spark_session.createDataFrame(Arrays.asList( new LabeledPoint(1.0,…
1 vote, 1 answer

Spark Streaming - Classification of tweets' stream from Kafka

I am new to Spark and I absolutely need some help classifying tweets from a Kafka stream. Below I explain the steps I have done so far, as well as the point where I'm stuck. I hope some of you can help me out with…
1 vote, 1 answer

Spark streaming multiple KMeans with mapWithState

Hi, I'm planning a deployment where Spark could do the heavy lifting of processing incoming data from Kafka, applying StreamingKMeans for outlier detection. However, data from the Kafka topic arrives from various sources, defining…
Peterdeka
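One way to frame the multi-source question is as keyed state: keep a separate set of centers and weights per source and update only that source's model with each micro-batch, which is the role `mapWithState` would play in Spark Streaming. The plain-Python sketch below illustrates the idea with a decayed running-mean update modelled on the decay-factor rule described for MLlib's `StreamingKMeans`; the exact bookkeeping here is this sketch's assumption rather than a copy of Spark's code, and `update_model`, `update_all`, the source names and the initial models are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest(point, centers):
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def update_model(model, batch, decay=1.0):
    """One micro-batch update of a (centers, weights) model.

    Each center moves to (c * n * decay + sum_of_new_points) / (n * decay + m),
    where n is its old weight and m the number of new points assigned to it.
    decay=1.0 keeps all history; smaller values forget old batches faster.
    """
    centers, weights = model
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for p in batch:
        i = closest(p, centers)
        counts[i] += 1
        for j, v in enumerate(p):
            sums[i][j] += v
    new_centers, new_weights = [], []
    for c, n, s, m in zip(centers, weights, sums, counts):
        discounted = n * decay
        total = discounted + m
        if m == 0:  # no new points: center stays, only its weight decays
            new_centers.append(list(c))
        else:
            new_centers.append([(cj * discounted + sj) / total for cj, sj in zip(c, s)])
        new_weights.append(total)
    return new_centers, new_weights


def update_all(models, keyed_batch, decay=1.0):
    """Group (source, point) pairs by source and update only that source's model."""
    by_source = {}
    for source, point in keyed_batch:
        by_source.setdefault(source, []).append(point)
    for source, points in by_source.items():
        if source in models:  # unknown sources are skipped in this sketch
            models[source] = update_model(models[source], points, decay)
    return models


# Hypothetical per-source models: (centers, weights).
models = {
    "sensor-a": ([[0.0], [10.0]], [1.0, 1.0]),
    "sensor-b": ([[100.0]], [1.0]),
}
update_all(models, [("sensor-a", [2.0]), ("sensor-a", [12.0]), ("sensor-b", [110.0])], decay=1.0)
print(models)
```

Keeping one small model per key means a burst from one source cannot drag another source's centers, and the distance from a new point to its own source's nearest center can then serve as an outlier score.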
1 vote, 0 answers

How do I normalize org.apache.spark.mllib.linalg.Vectors?

It is pretty easy to normalize vectors in Scala (scala.collection.immutable.Vector) using map: val w = Vector(3.0, 4.0, 5.0) /** L1 norm: **/ val w_normalized = w.map { _ / w.sum } But you can't perform the same thing with…
fremorie
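For the normalization question, MLlib ships a `Normalizer` transformer (`org.apache.spark.mllib.feature.Normalizer` in Scala, `pyspark.mllib.feature.Normalizer` in Python) that rescales each vector to unit L^p norm. Its L1 norm is the sum of absolute values, which equals the plain `w.sum` in the question only when every entry is non-negative. The plain-Python sketch below shows the arithmetic involved; `lp_norm` and `normalize` are illustrative helpers, not the library code, and returning all-zero vectors unchanged is this sketch's own choice.

```python
import math


def lp_norm(values, p=2.0):
    """L^p norm: (sum |v|^p)^(1/p); p = math.inf gives the max-abs norm."""
    if p == math.inf:
        return max(abs(v) for v in values)
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def normalize(values, p=2.0):
    """Scale values to unit L^p norm; an all-zero vector is returned unchanged."""
    norm = lp_norm(values, p)
    if norm == 0.0:
        return list(values)
    return [v / norm for v in values]


print(normalize([3.0, 4.0, 5.0], p=1.0))  # L1: each entry divided by 12
print(normalize([3.0, 4.0], p=2.0))       # L2: each entry divided by 5
```

In PySpark the library version would be applied as `Normalizer(p=1.0).transform(vector)`, which also accepts an RDD of vectors.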