Questions tagged [apache-spark-mllib]

MLlib is a low-level, RDD-based machine learning library for Apache Spark.

2241 questions
1 vote • 0 answers

Debug ArrayIndexOutOfBoundsException in PySpark MLlib

I'm trying to get started with mllib in PySpark, and after having built a dataset I'm trying to run a basic logistic regression. > train.take(4) [LabeledPoint(0.0, (4,[485,909,1715,2023],[1.0,1.0,1.0,1.0])), LabeledPoint(0.0,…
Patrick McCarthy • 2,478 • 2 • 24 • 40
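A common cause of this kind of ArrayIndexOutOfBoundsException is a sparse feature vector whose declared size is not larger than its largest feature index. Below is a minimal Scala sketch of the same setup (the vector size of 2024 and the use of LogisticRegressionWithLBFGS are assumptions, not the asker's code; `sc` is an existing SparkContext):

    import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    // The declared sparse-vector size must be strictly greater than the largest
    // feature index (2023 here), otherwise training can fail at runtime.
    val numFeatures = 2024
    val train = sc.parallelize(Seq(
      LabeledPoint(0.0, Vectors.sparse(numFeatures, Array(485, 909, 1715, 2023), Array(1.0, 1.0, 1.0, 1.0))),
      LabeledPoint(1.0, Vectors.sparse(numFeatures, Array(10, 200), Array(1.0, 1.0)))
    ))
    val model = new LogisticRegressionWithLBFGS().setNumClasses(2).run(train)
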
1 vote • 1 answer

Efficient way to compute row/column sums of an IndexedRowMatrix in Apache Spark

I have a matrix in CoordinateMatrix format in Scala. The matrix is sparse and the entries look like (upon coo_matrix.entries.collect): Array[org.apache.spark.mllib.linalg.distributed.MatrixEntry] = Array( MatrixEntry(0,0,-1.0),…
Kent Carlevi • 133 • 1 • 11
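One straightforward approach, sketched in Scala from the CoordinateMatrix entries mentioned in the excerpt (`coo_matrix` is the asker's matrix; an IndexedRowMatrix can be converted first with toCoordinateMatrix()):

    // Each MatrixEntry carries (i, j, value): summing by i gives row sums,
    // summing by j gives column sums. Only non-zero entries are stored,
    // so the sums are exact for a sparse matrix.
    val rowSums = coo_matrix.entries
      .map(e => (e.i, e.value))
      .reduceByKey(_ + _)          // RDD[(rowIndex, rowSum)]
    val colSums = coo_matrix.entries
      .map(e => (e.j, e.value))
      .reduceByKey(_ + _)          // RDD[(colIndex, colSum)]
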
1 vote • 2 answers

Convert String to Double in Scala / Spark?

I have a JSON data set that contains a price in a string like "USD 5.00". I'd like to convert the numeric portion to a Double for use in an MLlib LabeledPoint, and have managed to split the price string into an array of strings. The below creates a data…
schnee • 1,050 • 2 • 9 • 20
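A minimal Scala sketch of the conversion itself (the variable names are made up; only the "USD 5.00" format comes from the question):

    import scala.util.Try

    val price = "USD 5.00"
    // Take the token after the currency code and parse it as a Double.
    val amount: Double = price.split("\\s+")(1).toDouble                  // 5.0
    // A more defensive variant that yields None instead of throwing on bad input:
    val safeAmount: Option[Double] = Try(price.split("\\s+").last.toDouble).toOption
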
1 vote • 1 answer

CountVectorizerModel error with Apache Spark - Java API

I am working with the sample code from the Apache Spark documentation: https://spark.apache.org/docs/latest/ml-features.html#countvectorizer import java.util.Arrays; import org.apache.spark.SparkConf; import…
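For reference, the CountVectorizer example from the linked page looks roughly like this in Scala (a sketch, not the asker's Java code; `sqlContext` is assumed to exist):

    import org.apache.spark.ml.feature.{CountVectorizer, CountVectorizerModel}

    // A DataFrame with an array<string> column, as in the documentation example.
    val df = sqlContext.createDataFrame(Seq(
      (0, Array("a", "b", "c")),
      (1, Array("a", "b", "b", "c", "a"))
    )).toDF("id", "words")

    // Fit a vocabulary of size 3, keeping terms that appear in at least 2 documents.
    val cvModel: CountVectorizerModel = new CountVectorizer()
      .setInputCol("words")
      .setOutputCol("features")
      .setVocabSize(3)
      .setMinDF(2)
      .fit(df)

    cvModel.transform(df).show()
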
1 vote • 0 answers

Linear regression with Spark: wrong predictions

I am trying to run linear regression with Spark, but it gives me really wrong predictions. The data source: The program: def linear_regression(data): """ Run the linear regression algorithm on the data to perform the prediction """ …
rom • 3,592 • 7 • 41 • 71
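The program in the question is PySpark, but the usual culprits are the same in every API: unscaled features, no intercept, and SGD's default step size. A hedged Scala sketch of those fixes (`data` is assumed to be an RDD[LabeledPoint] with dense features):

    import org.apache.spark.mllib.feature.StandardScaler
    import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

    // Standardize the features (withMean = true requires dense vectors).
    val scaler = new StandardScaler(withMean = true, withStd = true)
      .fit(data.map(_.features))
    val scaled = data.map(p => LabeledPoint(p.label, scaler.transform(p.features))).cache()

    // Fit an intercept and use a smaller step size than the default of 1.0.
    val lr = new LinearRegressionWithSGD()
    lr.setIntercept(true)
    lr.optimizer.setNumIterations(200).setStepSize(0.1)
    val model = lr.run(scaled)
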
1 vote • 3 answers

How to fix the NumberFormatException (NumberFormatException.java:65) thrown when implementing classification code in Apache Spark?

import org.apache.spark.SparkContext import org.apache.spark.SparkContext._ import org.apache.spark.SparkConf import org.apache.spark.rdd.RDD import org.apache.spark.mllib.regression.LabeledPoint import org.apache.spark.mllib.linalg.Vectors import…
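A NumberFormatException during parsing almost always means a header row, an empty field, or a non-numeric token in the input. A defensive-parsing sketch in Scala (the CSV layout and file name are assumptions; `sc` is an existing SparkContext):

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint
    import scala.util.Try

    val raw = sc.textFile("data.csv")                       // assumed input path
    val parsed = raw.flatMap { line =>
      val parts = line.split(",").map(_.trim)
      // Keep only rows where every field parses; bad rows (e.g. the header) are dropped.
      Try(LabeledPoint(parts.head.toDouble, Vectors.dense(parts.tail.map(_.toDouble)))).toOption
    }
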
1 vote • 1 answer

Features with High Cardinality (How to Vectorize them?)

I am trying to run a machine learning problem using scikit-learn on a dataset, and one of the columns (features) has high cardinality: around 300K unique values. How do I vectorize such a feature? Using DictVectorizer would not be a solution as the…
Gayatri • 2,197 • 4 • 23 • 35
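The question is about scikit-learn, but the same problem comes up with MLlib, where feature hashing is a common workaround for a 300K-value categorical column. A Scala sketch (the bucket count is an arbitrary choice, not a recommendation):

    import org.apache.spark.mllib.feature.HashingTF

    // Hash each categorical value into a fixed-size sparse vector instead of
    // materializing a 300K-wide one-hot encoding.
    val hashingTF = new HashingTF(1 << 18)                        // 262,144 buckets
    val vec = hashingTF.transform(Seq("category_value_12345"))    // sparse mllib Vector
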
1 vote • 1 answer

TF-IDF RDDs into readable format using Spark

I am trying to calculate TF-IDF for documents of strings, referring to http://spark.apache.org/docs/latest/mllib-feature-extraction.html#tf-idf. import org.apache.spark.rdd.RDD import org.apache.spark.SparkContext import…
Mrunmayee • 495 • 3 • 9 • 16
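For reference, the core of the linked TF-IDF example in Scala, plus a take() to print a few vectors in readable form (`sc` and the input path are assumptions):

    import org.apache.spark.mllib.feature.{HashingTF, IDF}
    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.rdd.RDD

    // Each document is a sequence of terms.
    val documents: RDD[Seq[String]] = sc.textFile("docs.txt").map(_.split(" ").toSeq)

    val hashingTF = new HashingTF()
    val tf: RDD[Vector] = hashingTF.transform(documents)
    tf.cache()
    val tfidf: RDD[Vector] = new IDF().fit(tf).transform(tf)

    // Inspect a few TF-IDF vectors; each prints as a sparse (size, indices, values) triple.
    tfidf.take(3).foreach(println)
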
1 vote • 1 answer

Can't use Vector from Spark MLlib in a DataFrame

When I try to use a UDF that returns a Vector object, Spark throws the following exception: Cause: java.lang.UnsupportedOperationException: Not supported DataType: org.apache.spark.mllib.linalg.VectorUDT@f71b0bce How can I use Vector in my…
Zyoma • 1,528 • 10 • 17
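One pattern that can work is letting Spark infer the UDF's return type from the Scala function rather than passing a DataType explicitly. A hedged sketch (`df` and its array<double> column "raw" are assumed names):

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.sql.functions.udf

    // The mllib Vector type carries its own VectorUDT, so a UDF whose return type
    // is inferred from the function signature can produce a vector column directly.
    val toVector = udf { (xs: Seq[Double]) => Vectors.dense(xs.toArray) }
    val withVec = df.withColumn("features", toVector(df("raw")))
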
1 vote • 2 answers

Spark's OnlineLDAOptimizer causing IndexOutOfBoundsException in Java

I'm using Latent Dirichlet Allocation in the Java version of Spark. The following line works fine: LDAModel ldaModel = new LDA() .setK( NUM_TOPICS ) .setMaxIterations( MAX_ITERATIONS ) …
Ben Allison • 7,244 • 1 • 15 • 24
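A Scala sketch of the same configuration (the parameter values and the `corpus` RDD are assumptions). With the online optimizer, every count vector must have exactly the vocabulary length; a shorter vector is a typical source of IndexOutOfBoundsException:

    import org.apache.spark.mllib.clustering.{LDA, OnlineLDAOptimizer}

    val lda = new LDA()
      .setK(10)
      .setMaxIterations(50)
      .setOptimizer(new OnlineLDAOptimizer().setMiniBatchFraction(0.05))

    // corpus: RDD[(Long, Vector)] of (document id, term-count vector), all vectors the same length.
    val ldaModel = lda.run(corpus)
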
1 vote • 1 answer

How to use RowMatrix.columnSimilarities (similarity search)

TL;DR: I am trying to train off of an existing data set (Seq[Words] with corresponding categories), and use that trained data set to filter another data set using category similarity. I am trying to train a corpus of data and then use it for text…
Justin Pihony • 66,056 • 18 • 147 • 180
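A minimal sketch of the call itself; note that columnSimilarities compares columns, so each item to compare must be laid out as a column of the RowMatrix (`tfidf` is an assumed RDD[Vector]):

    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    val mat = new RowMatrix(tfidf)
    val exact  = mat.columnSimilarities()        // exact cosine similarities, as a CoordinateMatrix
    val approx = mat.columnSimilarities(0.1)     // DIMSUM approximation with threshold 0.1

    // Each MatrixEntry(i, j, sim) is the similarity between columns i and j.
    approx.entries.take(5).foreach(println)
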
1 vote • 2 answers

Handling missing values in SVM in Apache Spark MLlib

I have a classification task and want to use the Apache Spark MLlib SVM algorithm. My input data is n-dimensional, and in the feature vectors some dimensions may be missing. How should I handle the missing values? I think it…
hard coder • 5,449 • 6 • 36 • 61
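MLlib's SVMWithSGD has no built-in handling for missing dimensions, so the usual approach is to impute before training. A mean-imputation sketch in Scala, assuming missing values are encoded as Double.NaN in an RDD[LabeledPoint] called `data`:

    import org.apache.spark.mllib.classification.SVMWithSGD
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    // Per-feature means computed over the non-missing values only.
    val featureMeans = data
      .flatMap(_.features.toArray.zipWithIndex.collect { case (v, i) if !v.isNaN => (i, (v, 1L)) })
      .reduceByKey { case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) }
      .mapValues { case (sum, count) => sum / count }
      .collectAsMap()

    // Replace each NaN with the mean of its feature, then train as usual.
    val imputed = data.map { p =>
      val filled = p.features.toArray.zipWithIndex.map {
        case (v, i) => if (v.isNaN) featureMeans.getOrElse(i, 0.0) else v
      }
      LabeledPoint(p.label, Vectors.dense(filled))
    }
    val model = SVMWithSGD.train(imputed, 100)
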
1 vote • 1 answer

How to predict values in MLlib

Hi, I am new to Spark MLlib. I already have an R model and am trying to build the same model with Spark MLlib. Here is the R code: delhi <- read.delim("UItrain.txt", na.strings = "") delhi$lnprice <- log(delhi$price) heddel <- lm(lnprice ~ bedrooms+…
arun abimaniyu • 167 • 2 • 12
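A rough Scala sketch of the lm(lnprice ~ bedrooms + …) idea in MLlib terms; the RDD name, the column layout, and the SGD parameters are all assumptions based only on the R snippet:

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

    // delhiRows is assumed to be an RDD[Array[Double]] with price in column 0
    // and the predictors (bedrooms, ...) in the remaining columns.
    val training = delhiRows.map { r =>
      LabeledPoint(math.log(r(0)), Vectors.dense(r.drop(1)))   // label = ln(price)
    }.cache()

    val model = LinearRegressionWithSGD.train(training, 200, 0.01)   // iterations, step size
    val predictions = training.map(p => (p.label, model.predict(p.features)))
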
1 vote • 1 answer

Spark MLlib example, NoSuchMethodError: org.apache.spark.sql.SQLContext.createDataFrame()

I'm following the documentation example "Example: Estimator, Transformer, and Param" and I got this error message: 15/09/23 11:46:51 INFO BlockManagerMaster: Registered BlockManager Exception in thread "main" java.lang.NoSuchMethodError: …
keypoint • 2,268 • 4 • 31 • 59
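A NoSuchMethodError on a Spark class usually means mismatched Spark module versions between compile time and runtime. A hedged sbt sketch of keeping them aligned (the version number is a placeholder):

    // build.sbt: keep spark-core, spark-sql and spark-mllib on the same version,
    // and mark them "provided" if the cluster supplies Spark at runtime.
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"  % "1.5.0" % "provided",
      "org.apache.spark" %% "spark-sql"   % "1.5.0" % "provided",
      "org.apache.spark" %% "spark-mllib" % "1.5.0" % "provided"
    )
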
1 vote • 1 answer

Spark MLlib LDA: what are the possible reasons for always generating very similar LDA topics?

I am applying the MLlib LDA example on various corpora downloaded from [link]. I am filtering out the stopwords, and also excluding the very frequent terms and the very rare terms. The problem is that I am always getting topics…
Rami • 8,044 • 18 • 66 • 108
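The knobs that usually influence topic diversity in MLlib's LDA are the topic count and the two concentration priors. A sketch of where they are set (the values are illustrative guesses, not a fix; `corpus` is an assumed RDD of term-count vectors):

    import org.apache.spark.mllib.clustering.LDA

    val lda = new LDA()
      .setK(20)                    // too few topics often yields near-identical ones
      .setDocConcentration(1.1)    // alpha: document-topic prior
      .setTopicConcentration(1.1)  // beta: topic-word prior
      .setMaxIterations(100)

    // corpus: RDD[(Long, Vector)] built after the stopword/frequency filtering described above.
    val ldaModel = lda.run(corpus)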