Questions tagged [apache-spark-mllib]

MLlib is a machine learning library for Apache Spark.

MLlib is a low-level, RDD-based machine learning library for Apache Spark.

2241 questions
1
vote
1 answer

How to store the text file on the Master?

I am using a standalone cluster to run the ALS algorithm. The predictions are stored to a text file using saveAsTextFile(path), but the text file ends up on the cluster nodes. I want to store the text file on the Master.
Shishir Anshuman
  • 1,115
  • 7
  • 23
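
A minimal Scala sketch of one workaround, assuming the driver runs on the master machine and the predictions fit in driver memory (the RDD name and output path are hypothetical): saveAsTextFile writes partition files on whichever nodes execute the tasks, so collecting to the driver and writing with plain I/O keeps the file on the master.

    import java.io.PrintWriter

    // `predictions` is the hypothetical RDD of ALS predictions; collect()
    // pulls everything to the driver, so this only suits small outputs.
    val local = predictions.collect()
    val out = new PrintWriter("/home/user/predictions.txt")  // hypothetical path
    local.foreach(r => out.println(r))
    out.close()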
1
vote
1 answer

Gaussian Mixture Model in scala spark 1.5.1 weights are always uniformly distributed

I implemented the default GMM model provided in MLlib for my algorithm. I am repeatedly finding that the resultant weights are always equally weighted no matter how many clusters I initialize. Is there any specific reason why the weights are not being…
Leothorn
  • 1,345
  • 1
  • 23
  • 45
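
A minimal Scala sketch of fitting MLlib's GaussianMixture and inspecting the learned weights, assuming `data` is a hypothetical RDD[Vector] of feature rows; varying the seed and iteration count is one way to check whether initialization is the culprit.

    import org.apache.spark.mllib.clustering.GaussianMixture

    // `data` is a hypothetical RDD[Vector] of feature rows.
    val gmm = new GaussianMixture()
      .setK(4)                // number of clusters
      .setMaxIterations(200)  // more iterations than the default
      .setSeed(42L)           // vary the seed to probe initialization effects
      .run(data)

    gmm.weights.zipWithIndex.foreach { case (w, i) =>
      println(s"cluster $i weight = $w")
    }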
1
vote
1 answer

Issue when writing to file in spark

I'm working with Spark in local mode with the following options: spark-shell --driver-memory 21G --executor-memory 10G --num-executors 4 --driver-java-options "-Dspark.executor.memory=10G" --executor-cores 8. It is a four-node cluster with 32G of RAM…
tourist
  • 4,165
  • 6
  • 25
  • 47
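
A hedged aside: in local mode there are no separate executor processes, so flags like --num-executors and --executor-memory (which target cluster managers such as YARN) are effectively ignored and the driver JVM does all the work. A minimal local-mode invocation under that assumption, carrying over the sizes from the question:

    spark-shell --master local[8] --driver-memory 21G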
1
vote
0 answers

MLLib Classification deployment over http

I want to deploy a classifier I trained using MLlib behind an HTTP service. So I am wondering: if I load the serialized object in my code and send it some data, is it necessary to run a local version of Spark as well? And if so, is there any effect…
ilijaluve
  • 1,050
  • 2
  • 10
  • 24
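
Models saved with MLlib's model.save(sc, path) are loaded back via the companion object's load(sc, path), which does need a SparkContext; a lightweight local[*] context inside the web service is enough, and single-row scoring then happens on the driver. A sketch assuming a logistic-regression classifier and a hypothetical model path:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.classification.LogisticRegressionModel
    import org.apache.spark.mllib.linalg.Vectors

    // A small local context is enough to load and serve the model.
    val sc = new SparkContext(
      new SparkConf().setAppName("model-serving").setMaster("local[*]"))
    val model = LogisticRegressionModel.load(sc, "/models/classifier")  // hypothetical path

    // Scoring a single request runs on the driver, no cluster needed.
    val prediction = model.predict(Vectors.dense(0.1, 2.3, 4.5))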
1
vote
0 answers

Spark - MLlib Obtain loss (cost/error) history from LogisticRegressionWithLBFGS

I am using Apache Spark to perform logistic regression with LBFGS. I am trying to generate learning curves to see whether my model is suffering from high bias or high variance. Andrew Ng discusses the usefulness of learning curves in his lecture on…
Brian
  • 7,098
  • 15
  • 56
  • 73
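
LogisticRegressionWithLBFGS does not expose its loss history, but the underlying optimizer, LBFGS.runLBFGS, returns the per-iteration losses alongside the weights. A Scala sketch under that assumption (`training` is a hypothetical RDD[LabeledPoint]):

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.optimization.{LBFGS, LogisticGradient, SquaredL2Updater}
    import org.apache.spark.mllib.util.MLUtils

    // runLBFGS expects (label, features) pairs, with a bias term appended.
    val data = training.map(lp => (lp.label, MLUtils.appendBias(lp.features))).cache()

    val numFeatures = training.first().features.size
    val initialWeights = Vectors.dense(new Array[Double](numFeatures + 1))

    val (weights, lossHistory) = LBFGS.runLBFGS(
      data,
      new LogisticGradient(),
      new SquaredL2Updater(),
      10,    // numCorrections
      1e-6,  // convergenceTol
      100,   // maxNumIterations
      0.1,   // regParam
      initialWeights)

    // lossHistory is the learning-curve raw material: one loss per iteration.
    lossHistory.zipWithIndex.foreach { case (loss, i) => println(s"iter $i: $loss") }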
1
vote
2 answers

Converting a [(Int, Seq[Double])] RDD to LabeledPoint

I have an RDD in the following format and would like to convert it into a LabeledPoint RDD in order to process it in MLlib: Test: RDD[(Int, Seq[Double])] = Array((1,List(1.0,3.0,8.0),(2,List(3.0,…
ulrich
  • 3,547
  • 5
  • 35
  • 49
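
A minimal Scala sketch of the conversion, treating the Int as the label and the Seq[Double] as the feature vector:

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    // test: RDD[(Int, Seq[Double])], as in the question
    val labeled = test.map { case (label, features) =>
      LabeledPoint(label.toDouble, Vectors.dense(features.toArray))
    }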
1
vote
0 answers

java.lang.OutOfMemoryError when saving a model on the disk using Spark-mllib

I am trying to run LDA on a very small dataset of ~1000 documents. The LDA works fine and I am also able to save the model. If I run the program without lDAModel.save(), I get the following at the end: 16/03/13 14:26:52 INFO SparkUI: Stopped Spark…
Animesh Pandey
  • 5,900
  • 13
  • 64
  • 130
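
A sketch of the save/load pair, with a hypothetical path; an OutOfMemoryError at save time is often driver-side, so raising --driver-memory at launch is a common first thing to try (an assumption about the failure mode, not a diagnosis).

    import org.apache.spark.mllib.clustering.DistributedLDAModel

    // The path is hypothetical; save persists the model for a later load.
    ldaModel.save(sc, "/models/lda")
    val restored = DistributedLDAModel.load(sc, "/models/lda")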
1
vote
1 answer

How to update MLLIB version in PySpark

I have installed the Cloudera VM, and hence it has PySpark with the MLlib library, but the bundled MLlib is too old. I just want to upgrade it to the latest version of MLlib. I have already updated Python from 2.6 to 2.7, but I am unable to find any documentation…
1
vote
1 answer

Multiclass classification with Gradient Boosting Trees in Spark: only supporting binary classification

I am trying to run multi-class classification using Gradient Boosted Trees in Spark MLlib, but it gives the error "only binary classification is supported". The dependent variable has 8 levels. The data has 276 columns and 7000…
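
MLlib's GradientBoostedTrees is indeed binary-only; of the MLlib tree ensembles, RandomForest is the one that supports multiclass, so a common workaround is to swap it in (a different algorithm, named plainly). A Scala sketch assuming `training` is an RDD[LabeledPoint] with labels 0.0 through 7.0:

    import org.apache.spark.mllib.tree.RandomForest

    val model = RandomForest.trainClassifier(
      training,
      numClasses = 8,
      categoricalFeaturesInfo = Map[Int, Int](),  // all features continuous
      numTrees = 100,
      featureSubsetStrategy = "auto",
      impurity = "gini",
      maxDepth = 8,
      maxBins = 32,
      seed = 42)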
1
vote
0 answers

Spark submit - Input_raw python

I want to test my model with input values. So, my script is: #!/usr/bin/env python # -*- coding: utf-8 -*- import csv import sys import os from pyspark.mllib.regression import LabeledPoint import numpy as np from pyspark.mllib.evaluation import…
SirGustave
  • 342
  • 1
  • 2
  • 13
1
vote
1 answer

In the Spark UI, what does it mean when a task has a status of GET RESULT?

I have a Spark job which trains a model using Spark ML's logistic regression. In the Spark UI, under the stage details page for a tree-aggregation stage, I see a few tasks with a status of "GET RESULT". What does this status mean? What causes a task…
1
vote
1 answer

Linking the resulting TFIDF sparse vectors to the original documents in Spark

I am calculating the TFIDF using Spark with Python, with the following code: hashingTF = HashingTF() tf = hashingTF.transform(documents) idf = IDF().fit(tf) tfidf = idf.transform(tf) for k in tfidf.collect(): print(k) I…
K.Ali
  • 283
  • 4
  • 15
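
The question uses PySpark, but the idea is language-agnostic: keep the document id paired with the tokens, transform only the values, and the ids survive the pipeline. A Scala sketch with a hypothetical `docs: RDD[(Long, Seq[String])]`:

    import org.apache.spark.mllib.feature.{HashingTF, IDF}

    // docs: RDD[(Long, Seq[String])] -- id paired with tokenized text.
    val hashingTF = new HashingTF()
    val tf = docs.mapValues(hashingTF.transform(_))  // (id, tf vector)
    tf.cache()
    val idf = new IDF().fit(tf.values)
    val tfidfById = tf.mapValues(idf.transform(_))   // (id, tf-idf vector)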
1
vote
1 answer

Summation of TFIDF sparse vector values for each document in Spark with Python

I calculated the TFIDF for 3 sample text documents using PySpark's HashingTF and IDF, and I got the following SparseVector result: (1048576,[558379],[1.43841036226]) (1048576,[181911,558379,959994], …
K.Ali
  • 283
  • 4
  • 15
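
Each SparseVector stores only its non-zero values, so the per-document sum is just the sum of that values array. A Scala sketch of the same idea (in PySpark, v.values works similarly), assuming `tfidf: RDD[Vector]`:

    import org.apache.spark.mllib.linalg.{SparseVector, Vector}

    val docSums = tfidf.map {
      case sv: SparseVector => sv.values.sum  // sum only the stored entries
      case v: Vector        => v.toArray.sum  // dense fallback
    }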
1
vote
1 answer

Apache Spark MLlib LabeledPoint null label issue

I'm trying to run one of the MLlib algorithms, namely LogisticRegressionWithLBFGS, on my database. This algorithm takes the training set as LabeledPoint. Since LabeledPoint requires a double label ( LabeledPoint(double label, Vector features) ) and my…
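
A minimal Scala sketch of one way to cope with nullable labels: drop the null-label rows before constructing LabeledPoints. The row shape is a hypothetical stand-in for whatever the database returns.

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    // rows: RDD[(java.lang.Double, Array[Double])] -- nullable label column.
    val training = rows.flatMap { case (label, features) =>
      Option(label).map(l => LabeledPoint(l.doubleValue, Vectors.dense(features)))
    }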
1
vote
1 answer

Create a JavaRDD without file in spark

I am totally new to Spark and I want to create a JavaRDD from labeled points programmatically, without reading input from a file. Say I create a few LabeledPoints as follows: LabeledPoint pos = new LabeledPoint(1.0, Vectors.dense(1.0, 0.0, 3.0)); …
user1097675
  • 33
  • 1
  • 2
  • 6
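
The standard way to build an RDD from in-memory objects is parallelize; the question is in Java, where the counterpart is JavaSparkContext#parallelize over a java.util.List. A Scala sketch (the second point's values are made up for illustration):

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    val pos = LabeledPoint(1.0, Vectors.dense(1.0, 0.0, 3.0))
    val neg = LabeledPoint(0.0, Vectors.dense(2.0, 1.0, 1.0))

    // parallelize turns an in-memory collection into an RDD, no file needed.
    val rdd = sc.parallelize(Seq(pos, neg))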