0

I want to know the basic difference between the RowMatrix and Matrix classes available in Apache Spark.

zero323
Dhaksha
    I am not aware of either class in Java. Do you have full qualified class name (with package)? – Thilo Feb 19 '16 at 05:12
  • org.apache.spark.mllib.linalg.distributed.RowMatrix org.apache.spark.mllib.linalg.Matrix @Thilo – Dhaksha Feb 19 '16 at 05:15

2 Answers

4

A slightly more precise question would be: what is the difference between mllib.linalg.Matrix and mllib.linalg.distributed.DistributedMatrix?

  • Matrix is a trait which represents local matrices, which reside in the memory of a single machine. For now there are two basic implementations: DenseMatrix and SparseMatrix.
  • DistributedMatrix is a trait which represents distributed matrices built on top of RDDs. RowMatrix is a subclass of DistributedMatrix which stores data row-wise, without meaningful row ordering. There are other implementations of DistributedMatrix (such as IndexedRowMatrix, CoordinateMatrix and BlockMatrix), each with its own storage strategy and specific set of methods. See, for example, Matrix Multiplication in Apache Spark.
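To make the contrast concrete, here is a minimal plain-Java sketch of the two storage models. It deliberately does not use Spark: `LocalDense`, `RowCollection` and `sample` are hypothetical names invented for illustration, and the nested lists only stand in for the partitions of an RDD. The column-major layout mirrors how a dense local matrix is stored by default; the row count of seven is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

public class MatrixModels {

    /** Hypothetical stand-in for a local matrix: one array on one machine. */
    static final class LocalDense {
        final int numRows;
        final int numCols;
        final double[] values; // column-major: entry (i, j) is values[i + j * numRows]

        LocalDense(int numRows, int numCols, double[] values) {
            if (values.length != numRows * numCols) {
                throw new IllegalArgumentException("values must hold numRows * numCols entries");
            }
            this.numRows = numRows;
            this.numCols = numCols;
            this.values = values;
        }

        double get(int i, int j) {
            return values[i + j * numRows];
        }
    }

    /** Hypothetical stand-in for a row-oriented distributed matrix. */
    static final class RowCollection {
        // Each inner list plays the role of one partition; rows carry no index.
        final List<List<double[]>> partitions;

        RowCollection(List<List<double[]>> partitions) {
            this.partitions = partitions;
        }

        // The row count is the number of row vectors across all partitions.
        long numRows() {
            long n = 0;
            for (List<double[]> p : partitions) {
                n += p.size();
            }
            return n;
        }

        // The column count is the length of the first row found (0 if there are no rows).
        int numCols() {
            for (List<double[]> p : partitions) {
                if (!p.isEmpty()) {
                    return p.get(0).length;
                }
            }
            return 0;
        }
    }

    /** Builds `rows` two-column rows spread round-robin over `parts` partitions. */
    static RowCollection sample(int rows, int parts) {
        List<List<double[]>> partitions = new ArrayList<>();
        for (int p = 0; p < parts; p++) {
            partitions.add(new ArrayList<>());
        }
        for (int r = 0; r < rows; r++) {
            partitions.get(r % parts).add(new double[] {r, 2.0 * r});
        }
        return new RowCollection(partitions);
    }

    public static void main(String[] args) {
        // 3 x 2 matrix [[1, 2], [3, 4], [5, 6]] laid out column by column.
        LocalDense local = new LocalDense(3, 2, new double[] {1, 3, 5, 2, 4, 6});
        System.out.println(local.get(1, 0)); // prints 3.0

        RowCollection dist = sample(7, 3);
        System.out.println(dist.numRows()); // prints 7
        System.out.println(dist.numCols()); // prints 2
    }
}
```

In this model the row count of the distributed structure is just the number of row vectors it holds, which is also how the `numRows()` result of a RowMatrix built from an RDD of vectors should be read. In Spark itself the local side would come from a factory such as `Matrices.dense` and the distributed side from the `RowMatrix` constructor.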
zero323
-1

This is going to come down a little to the idioms of the language / framework / discipline you're using, but in computer science, an array is a one-dimensional "list" of "things" that can be referenced by their position in the list. One of the things that can be in the list is another array, which lets you make arrays of arrays (of arrays of arrays ...), giving you a data set of arbitrarily large dimension.

A matrix comes from linear algebra and is a two-dimensional representation of data (which can be represented by an array of arrays) that comes with a powerful set of mathematical operations for manipulating the data in interesting ways. While arrays can vary in size, the width and height of a matrix are generally known based on the specific type of operations you're going to perform.

Matrices are used extensively in 3D graphics and physics engines because they are a fast, convenient way of representing transformation and acceleration data for objects in three dimensions.

Array : Collection of homogeneous elements.

Matrix : A rectangular arrangement of values in rows and columns.

The two belong to different fields. But in computer programming, a collection of single-dimensional arrays can be termed a matrix. You can represent a 2D array (i.e., a collection of single-dimensional arrays) in matrix form.

Example

A[2][3] : This means A is a collection of 2 single-dimensional arrays, each of size 3.

A[1,1] A[1,2] A[1,3] //This is a single dimensional array

A[2,1] A[2,2] A[2,3] //This is another single dimensional array

//The collection is a multi-dimensional or 2d Array.
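As a minimal Java sketch of the same layout (the class name `ArrayOfArrays`, the helper `build` and the stored values are made up for illustration): note that Java indexes from 0, so the element written above as A[2,3] is `a[1][2]` here.

```java
public class ArrayOfArrays {

    // Two single-dimensional arrays of size 3, collected into one 2D array.
    static int[][] build() {
        return new int[][] {
            {11, 12, 13}, // first inner array
            {21, 22, 23}  // second inner array
        };
    }

    public static void main(String[] args) {
        int[][] a = build();
        System.out.println(a.length);    // prints 2, the number of inner arrays
        System.out.println(a[0].length); // prints 3, the size of each inner array
        System.out.println(a[1][2]);     // prints 23, the last element of the second array
    }
}
```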

Mohith P
  • But I need the difference between RowMatrix and Matrix, not between Array and Matrix. – Dhaksha Feb 19 '16 at 05:31
  • A row matrix is a 1 × m matrix, that is, a matrix consisting of a single row of m elements. – Mohith P Feb 19 '16 at 05:39
  • If I print the number of rows using RowMatrix inputDataSet = new RowMatrix(parsedData.rdd()); System.out.println(inputDataSet.numRows()); //7 please explain the output. – Dhaksha Feb 19 '16 at 05:45
  • Sorry, I can't understand your question. What is "RowMatrix"? Is it a class in your program? – Mohith P Feb 19 '16 at 05:55
  • OK... then how can I explain things to you without understanding the structure of the class? – Mohith P Feb 19 '16 at 08:58
  • This answer is irrelevant to the question, as it describes matrices and arrays in general and has no relation to Apache Spark and its classes. – fault-tolerant Jun 23 '19 at 19:09