How to create SparseVector and dense Vector representations

If the dense vector is:

denseV = np.array([0., 3., 0., 4.])

What will the sparse vector representation be?

zero323
Anoop Toffy
    For those who read the title of "Sparse Vector vs Dense Vector" and were looking for an explanation of when to use which, [this answer](http://stackoverflow.com/a/26706528/877069) has the information you're looking for. – Nick Chammas Jul 28 '16 at 14:40

3 Answers

Unless I have thoroughly misunderstood your question, the MLlib data type documentation illustrates this quite clearly:

import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

// Create a dense vector (1.0, 0.0, 3.0).
Vector dv = Vectors.dense(1.0, 0.0, 3.0);
// Create a sparse vector (1.0, 0.0, 3.0) by specifying its indices and values corresponding to nonzero entries.
Vector sv = Vectors.sparse(3, new int[] {0, 2}, new double[] {1.0, 3.0});

Here, the first argument of Vectors.sparse is the size of the vector, the second argument is an array of the indices of the nonzero entries, and the third argument is the array of the actual values at those indices.
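
To connect this back to the vector in the question, here is a minimal sketch in plain Python (no Spark required) that derives the three arguments from a dense list. The helper name to_sparse_parts is hypothetical and only for illustration; it is not part of MLlib.

```python
def to_sparse_parts(dense):
    """Return (size, indices, values) for the nonzero entries of `dense`.

    Hypothetical helper for illustration; not part of Spark MLlib.
    """
    # Positions of the nonzero entries, in increasing order.
    indices = [i for i, x in enumerate(dense) if x != 0.0]
    # The values stored at those positions.
    values = [dense[i] for i in indices]
    return len(dense), indices, values


size, indices, values = to_sparse_parts([0.0, 3.0, 0.0, 4.0])
print(size, indices, values)  # 4 [1, 3] [3.0, 4.0]
```

The resulting triple lines up with the size, indices, and values arguments described above.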

Alberto Bonsanto
Chthonic Project

A sparse vector is appropriate when many of the values in the vector are zero, while a dense vector is appropriate when most of the values are nonzero.

If you have to create a sparse vector from the dense vector you specified, use the following syntax:

import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

Vector sparseVector = Vectors.sparse(4, new int[] {1, 3}, new double[] {3.0, 4.0});
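
One way to check that a (size, indices, values) triple like the one above encodes the intended vector is to expand it back into a dense list. This is a rough sketch in plain Python; to_dense is a hypothetical helper, not an MLlib function.

```python
def to_dense(size, indices, values):
    """Expand a (size, indices, values) triple into a dense list of floats.

    Hypothetical helper for illustration; not part of Spark MLlib.
    """
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    dense = [0.0] * size  # every position starts as zero
    for i, v in zip(indices, values):
        dense[i] = v  # fill in only the listed nonzero entries
    return dense


print(to_dense(4, [1, 3], [3.0, 4.0]))  # [0.0, 3.0, 0.0, 4.0]
```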
Debvrat Varshney
Saurabh

Dense: use it when most positions in the vector hold data. Sparse: use it when only a few positions are filled (i.e. most of the entries are zero). For example, for {0.0, 3.0, 0.0, 4.0} the two representations are:

import org.apache.spark.mllib.linalg.Vectors

val posVector = Vectors.dense(0.0, 3.0, 0.0, 4.0)  // dense: every entry is stored, zeros included

val sparseVector = Vectors.sparse(4, Array(1, 3), Array(3.0, 4.0))  // sparse: only the nonzero entries are listed

Syntax: Vectors.sparse(size of vector, indices of nonzero entries, values at those indices)

Karan Gehlod