Questions tagged [apache-spark-mllib]

MLlib is a low-level, RDD-based machine learning library for Apache Spark.

2241 questions
1
vote
1 answer

Spark MLlib collaborative filtering---how to view movie factors?

I am working through this tutorial: https://databricks-training.s3.amazonaws.com/movie-recommendation-with-mllib.html. How would one view the factors associated with each movie? In other words, how do I look at the model that has been trained?
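For the ALS model in that tutorial, the learned per-movie latent vectors are exposed by MLlib's `MatrixFactorizationModel` as `productFeatures`, an RDD of `(movieId, factorArray)` pairs that can be collected for inspection. The lookup itself amounts to a map from ID to factor vector; a Spark-free Python sketch of that final step (the IDs and factor values below are invented for illustration):

```python
# Simulated result of model.productFeatures.collectAsMap():
# each movie ID maps to its learned latent-factor vector.
# (IDs and values are made up for illustration.)
movie_factors = {
    1: [0.12, -0.45, 0.83],
    2: [0.67, 0.05, -0.21],
    3: [-0.30, 0.91, 0.14],
}

def factors_for(movie_id):
    """Return the latent factors for one movie, or None if it was unseen."""
    return movie_factors.get(movie_id)
```

With the real model, `model.productFeatures.lookup(movieId)` does the same per-ID retrieval without collecting everything to the driver.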
1
vote
1 answer

Spark 1.3.1 install failed in MLlib when I run make-distribution.sh in Ubuntu 14.04

Spark 1.3.1 install failed in MLlib when I run make-distribution.sh on Ubuntu 14.04. java -version: java version "1.7.0_80", Java(TM) SE Runtime Environment (build 1.7.0_80-b15), Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode). Scala…
zt1983811
  • 1,011
  • 3
  • 14
  • 34
1
vote
1 answer

spark 1.2.0 mllib kmeans: Out Of Memory Error

I'm new to Spark, and I am using the KMeans algorithm to cluster a data set that is 484 MB in size with 213,104 dimensions. My code is as follows: val k = args(0).toInt val maxIter = args(1).toInt val model = new…
ifloating
  • 23
  • 3
1
vote
1 answer

Spark - Naive Bayes classifier value error

I have the following issue when training a Naive Bayes classifier. I'm getting this error: File "/home/juande/Desktop/spark-1.3.0-bin-hadoop2.4/python/pyspark/mllib/classification.py", line 372, in train return NaiveBayesModel(labels.toArray(),…
user3276768
  • 1,416
  • 3
  • 18
  • 28
1
vote
1 answer

Spark - MLlib linear regression intercept and weight NaN

I have been trying to build a regression model on Spark using some custom data, and the intercept and weights are always NaN. This is my data: data = [LabeledPoint(0.0, [27022.0]), LabeledPoint(1.0, [27077.0]), LabeledPoint(2.0, [27327.0]),…
user3276768
  • 1,416
  • 3
  • 18
  • 28
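A common cause of NaN weights with `LinearRegressionWithSGD` is that raw feature values this large (tens of thousands) make the default SGD step size diverge; standardizing the features first usually fixes it. MLlib's `StandardScaler` does this on RDDs; a minimal Spark-free sketch of the same standardization in plain Python, using the feature values from the question:

```python
import math

def standardize(values):
    """Scale a feature column to zero mean and unit (population) variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(var)
    return [(v - mean) / std for v in values]

# Raw feature values from the question's LabeledPoints
features = [27022.0, 27077.0, 27327.0]
scaled = standardize(features)
```

After scaling, gradient descent steps stay in a numerically sane range, so the learned intercept and weights come out finite.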
1
vote
2 answers

Spark: Read CSV file with headers

I have a CSV file with 90 columns and around 28,000 rows. I want to load it and split it into train (75%) and test (25%) sets. I used the following code: Code: val data = sc.textFile(datadir + "/dados_frontwave_corte_pedra_ferramenta.csv") .map(line…
Mohammad
  • 1,006
  • 2
  • 15
  • 29
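The two pieces this question needs are dropping the header row and a 75/25 split; in Spark the split itself is typically `data.randomSplit(Array(0.75, 0.25))` after filtering out the header. Neither step depends on Spark specifics, so here is a plain-Python sketch of the logic with a seeded RNG so the split is reproducible:

```python
import random

def split_csv(lines, train_fraction=0.75, seed=42):
    """Drop the header line, then randomly split the rows into train/test."""
    header, rows = lines[0], lines[1:]
    rng = random.Random(seed)          # fixed seed for reproducibility
    train, test = [], []
    for row in rows:
        (train if rng.random() < train_fraction else test).append(row)
    return header, train, test

lines = ["col1,col2", "1,a", "2,b", "3,c", "4,d"]
header, train, test = split_csv(lines)
```

On small inputs the realized fractions can deviate noticeably from 75/25; `randomSplit` has the same property, since each row is assigned independently.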
1
vote
2 answers

Spark Categorical Data Encoding

Is there a function in Spark to do categorical data encoding? Ex:

Var1,Var2,Var3
1,2,a
2,3,b
3,2,c

to

var1,var2,var3
1,2,0
2,3,1
3,2,2

with a -> 0, b -> 1, c -> 2
Joel
  • 1,650
  • 2
  • 22
  • 34
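The RDD-based MLlib API of that era had no built-in categorical encoder, but the mapping is easy to build by hand: collect the distinct values of the column and index them (later Spark versions provide `StringIndexer` for DataFrames). A plain-Python sketch of that indexing, reproducing the question's example:

```python
def encode_column(values):
    """Map each distinct string to an integer index, in sorted order."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

# Var3 column from the question's example
var3 = ["a", "b", "c"]
encoded, mapping = encode_column(var3)
# a -> 0, b -> 1, c -> 2, matching the desired output
```

Sorting makes the assignment deterministic; with an RDD, the same mapping would come from `column.distinct().collect()` broadcast to the workers.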
1
vote
2 answers

Categorical Variables in Apache Spark using MLlib

I am relatively new to the world of Apache Spark. I am trying to estimate a large-scale model using LinearRegressionWithSGD(), where I would like to estimate fixed effects and interaction terms without having to create a huge design matrix. I noticed…
1
vote
1 answer

Spark - Prediction.io - scala.MatchError: null

I'm working on a template for prediction.io and I'm running into trouble with Spark. I keep getting a scala.MatchError error: full gist here scala.MatchError: null at org.apache.spark.rdd.PairRDDFunctions.lookup(PairRDDFunctions.scala:831) at…
1
vote
1 answer

spark mllib memory error on svd (single machine)

I have a large data file (around 4 GB) and I am analyzing it using Spark on a single PC. scala> x res29: org.apache.spark.mllib.linalg.distributed.RowMatrix = org.apache.spark.mllib.linalg.distributed.RowMatrix@5a86096a scala> x.numRows res27:…
Donbeo
  • 17,067
  • 37
  • 114
  • 188
1
vote
0 answers

Spark MLlib logs deprecated properties

I followed the training from Databricks. It runs on Azure and has been built with this configuration: build.sbt import AssemblyKeys._ assemblySettings name := "movielens-als" version := "0.1" scalaVersion := "2.11.4" libraryDependencies +=…
erwineberhard
  • 309
  • 4
  • 17
1
vote
1 answer

Runtime error in Scala: NoSuchMethodError

I am trying to use Spark MLlib algorithms in Scala in Eclipse. There are no problems during compilation, but at runtime there is an error saying "NoSuchMethodError". Here is my code: import org.apache.spark.SparkConf import…
Jack Daniel
  • 2,527
  • 3
  • 31
  • 52
1
vote
0 answers

Use of similarity function and RowMatrix in apache spark

I need to compute the similarity between the average vector computed from a RowMatrix and all vectors inside the same RowMatrix. To compute the average vector I am doing this (example in Java): RowMatrix matrix = new RowMatrix(vectorOfUserToItems.rdd()); Vector…
Adrian
  • 71
  • 1
  • 12
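The average vector of a RowMatrix can be read off `computeColumnSummaryStatistics().mean`; the remaining piece is a similarity measure between that mean and each row. A Spark-free sketch in plain Python, using cosine similarity (the choice of cosine is an assumption here, since the question does not name a metric):

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

rows = [[1.0, 0.0], [0.0, 1.0]]
mean = [sum(col) / len(rows) for col in zip(*rows)]  # column-wise average
sims = [cosine(row, mean) for row in rows]
```

On a real RowMatrix, the per-row computation would run as a `map` over `matrix.rows` with the mean vector broadcast to the workers.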
1
vote
1 answer

Use of foreachActive for spark Vector in Java

How do I write simple Java code that iterates over the active elements in a sparse vector? Let's say we have such a Vector: Vector sv = Vectors.sparse(3, new int[] {0, 2}, new double[] {1.0, 3.0}); I was trying with a lambda or Function2 (from three…
Adrian
  • 71
  • 1
  • 12
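`Vector.foreachActive` invokes its function once per explicitly stored (index, value) pair; from Java the Scala function type erases to `Function2<Object, Object, BoxedUnit>`, which is what makes lambdas awkward there. The iteration itself is trivial; a plain-Python sketch over the same sparse layout (parallel index/value arrays) as the question's `Vectors.sparse(3, ...)` example:

```python
def foreach_active(size, indices, values, f):
    """Call f(index, value) once for each explicitly stored entry."""
    for i, v in zip(indices, values):
        f(i, v)

# Mirrors Vectors.sparse(3, new int[] {0, 2}, new double[] {1.0, 3.0})
seen = []
foreach_active(3, [0, 2], [1.0, 3.0], lambda i, v: seen.append((i, v)))
```

Zero entries at the unlisted indices (here index 1) are never visited, which is the point of iterating only the active elements.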
1
vote
0 answers

Requested array size exceeds VM limit in MLlib Random Forest

I'm using MLlib to train a random forest. It works fine up to depth 15, but at depth 20 I get java.lang.OutOfMemoryError: Requested array size exceeds VM limit on the driver, from the collectAsMap operation in DecisionTree.scala, around…