
I want to do matrix-vector multiplication with Hadoop. I have a small working example now: there is only one input file, containing the rows of the matrix, each followed by the vector it is multiplied with. So each map task gets one row and the vector from this single file.

Now I would like to have two input files: one file should contain the matrix and another one the vector. But I can't think of a Hadoop way to let the mapper access both files.

What would be the best approach here?

Thanks for your help!

Damian
    You can read the vector in the setup method of mapper, and use it to do the multiplication. – zsxwing Jun 23 '13 at 11:19
  • There are several ways to do that. For more detail, see http://stackoverflow.com/questions/11059725/is-it-possible-to-have-multiple-inputs-with-multiple-different-mappers-in-hadoop – twid Jun 23 '13 at 14:34

1 Answer


The easiest and most efficient solution is to read the vector into memory in the Mapper directly from HDFS (not as map() input). Presumably it is not so huge that it can't fit in memory. Then, map() only the matrix data by row. As you receive each row, dot it with the vector to produce one element of the output. Emit (index,value) and then construct the vector in the Reducer (if needed).
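For reference, here is a minimal sketch of what such a Mapper might look like. It assumes the matrix file stores one row per line as "rowIndex v1 v2 ... vN" and that the driver passes the vector's HDFS path in a job property named vector.path; neither of those details comes from the question, they are just illustrative choices.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MatrixVectorMapper
    extends Mapper<LongWritable, Text, IntWritable, DoubleWritable> {

  private double[] vector;

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    // Read the whole vector from HDFS once per mapper, before any map() calls.
    // "vector.path" is a job property the driver would set (an assumption here).
    Configuration conf = context.getConfiguration();
    Path vectorPath = new Path(conf.get("vector.path"));
    FileSystem fs = FileSystem.get(conf);
    List<Double> values = new ArrayList<>();
    try (BufferedReader reader =
             new BufferedReader(new InputStreamReader(fs.open(vectorPath)))) {
      String line;
      while ((line = reader.readLine()) != null) {
        for (String token : line.trim().split("\\s+")) {
          if (!token.isEmpty()) {
            values.add(Double.parseDouble(token));
          }
        }
      }
    }
    vector = new double[values.size()];
    for (int i = 0; i < values.size(); i++) {
      vector[i] = values.get(i);
    }
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Assumed row format: "rowIndex v1 v2 ... vN" (whitespace separated).
    String[] tokens = value.toString().trim().split("\\s+");
    int rowIndex = Integer.parseInt(tokens[0]);
    double dot = 0.0;
    for (int j = 1; j < tokens.length; j++) {
      dot += Double.parseDouble(tokens[j]) * vector[j - 1];
    }
    // Emit (row index, one element of the result vector).
    context.write(new IntWritable(rowIndex), new DoubleWritable(dot));
  }
}
```

Instead of opening the file from HDFS yourself, you could also ship the vector with the distributed cache and read it from the task's local working directory in setup(); the multiplication logic stays the same either way.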

Sean Owen
  • How will we be able to get the resultant vector with the correct positioning? If the matrix is large, around 2 HDFS blocks, we will get the same index in these 2 mappers. How will we be able to construct the final vector? – USB Jan 23 '16 at 09:54