Questions tagged [apache-spark-mllib]

MLlib is the low-level, RDD-based machine learning library for Apache Spark.

2241 questions
1 vote · 1 answer

K-Means on time series data with Apache Spark

I have a data pipeline system where all events are stored in Apache Kafka. There is an event processing layer, which consumes and transforms that data (time series) and then stores the resulting data set into Apache Cassandra. Now I want to use…
1 vote · 0 answers

What is the maximum number of columns supported by an Apache Spark DataFrame?

Spark version: 1.5.2 with YARN 2.7.1.2.3.0.0-2557. I'm running into a problem while exploring data through spark-shell: I'm trying to create a really wide DataFrame with 3000 columns. Code as below: val valueFunctionUDF = udf((valMap:…
EdwinGuo • 1,765 • 2 • 21 • 27
1 vote · 1 answer

Non-integer ids in Spark MLlib ALS

I'd like to use: val ratings = data.map(_.split(',') match { case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toFloat) }) val model = ALS.train(ratings, rank, numIterations, alpha) However, the user data I get…
ZMath_lin • 523 • 2 • 6 • 14
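MLlib's ALS `Rating` requires `Int` user and item ids, so non-integer ids have to be indexed before training. A minimal pure-Scala sketch of that indexing step (the ids here are hypothetical; on a real RDD you would use `distinct().zipWithIndex()`, or `StringIndexer` in the newer `spark.ml` API):

```scala
// Hypothetical string user ids parsed from the CSV.
val userIds = Seq("u-01", "u-02", "u-01", "u-03")

// distinct + zipWithIndex assigns each id a stable Int code.
val userIndex: Map[String, Int] = userIds.distinct.zipWithIndex.toMap

// userIndex("u-02") can now be passed as the Int user field of Rating(...).
```

The same dictionary (kept, e.g., as a broadcast variable) lets you translate predictions back to the original string ids afterwards.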
1 vote · 2 answers

How to print a Map[String, Array[Float]] in Scala?

I am using the word2vec function from Spark's MLlib library. I want to print the word vectors that I get as output from the getVectors function. My code looks like this: import org.apache.spark._ import org.apache.spark.rdd._ import…
Aditi • 820 • 11 • 27
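`Array[Float]` inherits Java's default `toString`, so printing the map directly yields references like `[F@4f2b3c`. A sketch that formats each vector with `mkString` (the words and values are made up; `getVectors` on a trained Word2VecModel returns the same `Map[String, Array[Float]]` shape):

```scala
val vectors: Map[String, Array[Float]] = Map(
  "spark" -> Array(0.1f, 0.2f),
  "mllib" -> Array(0.3f, 0.4f)
)

// mkString renders the array contents instead of the object reference.
vectors.foreach { case (word, vec) =>
  println(s"$word -> ${vec.mkString("[", ", ", "]")}")
}
```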
1 vote · 0 answers

Training error when using PySpark ALS

I run Spark on a virtual machine and use the ALS library to train my data. rawRatings = sc.textFile('data/ratings.csv').map(lambda x: x.replace('\t', ',')) parsedRatings = rawRatings.map(lambda x: x.split(',')).map(lambda x: Rating(int(x[0]),…
1 vote · 0 answers

Spark Correlation Coefficient

I have a specific application in which I am trying to verify the strong positive relationship between many of the time series that I am reading. I should elaborate more: I have a lot of actors which are distributed, and they generate…
1 vote · 0 answers

How to deal with categoricalFeaturesInfo?

How do I deal with categoricalFeaturesInfo in RandomForest? I created a list of variables like this: alllist = listdouble + listint + listcategorielfeatures But when I create the LabeledPoint I lose this order. How can I keep the type of my variables, like…
malouke • 529 • 2 • 5 • 6
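RandomForest's `categoricalFeaturesInfo` is a `Map[Int, Int]` from feature index (position in the assembled feature vector) to number of categories, so the key step is pinning down where each categorical column lands after concatenation. A sketch under the question's own layout, doubles then ints then categorical features (the counts and arities are hypothetical):

```scala
val numDouble = 4               // length of listdouble (hypothetical)
val numInt = 2                  // length of listint (hypothetical)
val categoryArities = Seq(3, 5) // categories per categorical feature

// Categorical features occupy the last positions of the assembled vector.
val categoricalFeaturesInfo: Map[Int, Int] =
  categoryArities.zipWithIndex.map { case (arity, i) =>
    (numDouble + numInt + i) -> arity
  }.toMap
// e.g. feature 6 has 3 categories and feature 7 has 5.
```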
1 vote · 2 answers

Why does ALS.trainImplicit give better predictions for explicit ratings?

Edit: I tried a standalone Spark application (instead of PredictionIO) and my observations are the same. So this is not a PredictionIO issue, but still confusing. I am using PredictionIO 0.9.6 and the Recommendation template for collaborative…
1 vote · 1 answer

How to keep record information when working with MLlib

I'm working on a classification problem in which I have to use the MLlib library. The classification algorithms in MLlib (say, Logistic Regression) require an RDD[LabeledPoint]. A LabeledPoint has only two fields, a label and a feature vector. When…
HHH • 6,085 • 20 • 92 • 164
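Since LabeledPoint has no room for extra fields, a common pattern is to keep them alongside: build pairs of (id, LabeledPoint), train on the values only, and join the predictions back to the ids afterwards. A pure-Scala sketch of the keying step (`Record` and its fields are hypothetical stand-ins; on Spark the same shape would be an RDD of pairs):

```scala
case class Record(id: String, label: Double, features: Array[Double])

val records = Seq(
  Record("r1", 1.0, Array(0.5, 0.2)),
  Record("r2", 0.0, Array(0.1, 0.9))
)

// Keep the id as the key; pass only (label, features) to the learner,
// then join model predictions back by id to recover the full records.
val keyed: Map[String, (Double, Array[Double])] =
  records.map(r => r.id -> (r.label, r.features)).toMap
```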
1 vote · 0 answers

Best practice for mapping a String to a unique Integer in distributed mode

I have a dataset with 40K entries; each entry looks like the following: product/productId: B00004CK40 review/userId: A39IIHQF18YGZA review/profileName: C. A. M. Salas review/helpfulness: 0/0 review/score: 4.0 review/time: 1175817600…
Jay • 717 • 11 • 37
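`hashCode`-based mappings look tempting but can collide; the collision-free route is to index the distinct values. On an RDD that is `distinct().zipWithUniqueId()` (or `zipWithIndex()` when consecutive ids matter). A pure-collection sketch of the same idea, where the second product id is hypothetical:

```scala
val productIds = Seq("B00004CK40", "B00004CK40", "B000XXXXXX")

// One integer per distinct string, assigned deterministically by position.
val idMap: Map[String, Long] =
  productIds.distinct.zipWithIndex.map { case (p, i) => p -> i.toLong }.toMap
```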
1 vote · 3 answers

What is setNumClasses in LogisticRegressionWithLBFGS (Spark MLlib)?

I couldn't understand the significance of setNumClasses here, and I couldn't find anything about it in the Spark MLlib documentation. new LogisticRegressionWithLBFGS() .setNumClasses(10)
Naresh • 5,073 • 12 • 67 • 124
1 vote · 1 answer

Inverse of a Spark RowMatrix

I am trying to invert a Spark RowMatrix. The function I am using is below. def computeInverse(matrix: RowMatrix): BlockMatrix = { val numCoefficients = matrix.numCols.toInt val svd = matrix.computeSVD(numCoefficients, computeU = true) val…
Debasish • 113 • 1 • 9
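For reference, the SVD route computes the Moore–Penrose pseudo-inverse rather than a true inverse when the matrix is singular or non-square. With the decomposition returned by computeSVD:

```latex
A = U \Sigma V^{\top}
\qquad\Longrightarrow\qquad
A^{+} = V \, \Sigma^{-1} U^{\top}
```

where singular values below some tolerance are treated as zero (their reciprocals dropped) before inverting \(\Sigma\), which keeps the computation numerically stable.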
1 vote · 1 answer

Converting a text-type independent variable to numeric for Spark Naive Bayes

I have a question about Naive Bayes with numeric and non-numeric features. I have 5 independent parameters on which I want to classify data: Male,Suspicion of Alcohol,Weekday,12am-4am,75,30-39 Male,Moving Traffic…
mahendra singh • 384 • 1 • 13
1 vote · 1 answer

Understanding Spark MLlib LDA input format

I am trying to implement LDA using Spark MLlib, but I am having difficulty understanding the input format. I was able to run its sample implementation, taking input from a file that contains only numbers, as shown: 1 2 6 0 2 3 1 1 0 0 3 1 3 0 1 3 0 0…
Amit Kumar • 2,685 • 2 • 37 • 72
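In the RDD-based API, LDA.run expects a corpus of (document id, term-count vector) pairs: each line of numbers in the sample file is one document, giving the count of each vocabulary term in that document. A pure-Scala sketch of parsing one such line, treating the first counts from the question's sample as one hypothetical document:

```scala
// One document: term 0 appears once, term 1 twice, term 2 six times, ...
val line = "1 2 6 0 2 3 1 1 0 0 3"
val termCounts: Array[Double] = line.trim.split(' ').map(_.toDouble)

// In Spark this becomes (docId, Vectors.dense(termCounts)), and the
// RDD of such pairs is the corpus passed to LDA.run.
```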
1 vote · 0 answers

Why can we not define our own folds when using CrossValidator?

I have been using the cross-validation process to train a Naive Bayes model, and I realized that it uses the kFold method to get the random sampling of data used to create the folds. This method returns an Array[(RDD[T], RDD[T])] of tuples, which I…