
I want to do matrix-vector multiplication with Hadoop. I have a small working example now: there is only one input file, containing the rows of the matrix, each followed by the vector it is multiplied with. So each map task gets one row and the vector from this single file.

Now I would like to have two input files: one file should contain the matrix and another one the vector. But I can't think of a Hadoop way to let the mapper access both files.

What would be the best approach here?

Thanks for your help!

Damian
    You can read the vector in the setup method of mapper, and use it to do the multiplication. – zsxwing Jun 23 '13 at 11:19
  • There are several ways to do that. For more detail, see http://stackoverflow.com/questions/11059725/is-it-possible-to-have-multiple-inputs-with-multiple-different-mappers-in-hadoop – twid Jun 23 '13 at 14:34

1 Answer


The easiest and most efficient solution is to read the vector into memory in the Mapper directly from HDFS (not as map() input). Presumably it is not so huge that it can't fit in memory. Then, map() only the matrix data by row. As you receive each row, dot it with the vector to produce one element of the output. Emit (index,value) and then construct the vector in the Reducer (if needed).
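For reference, here is a minimal sketch of what such a Mapper might look like. It assumes the matrix file stores one row per line as "rowIndex v1 v2 ... vN" and that the driver passes the vector's HDFS path in a job property named vector.path; neither of those details comes from the question, they are just illustrative choices.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MatrixVectorMapper
    extends Mapper<LongWritable, Text, IntWritable, DoubleWritable> {

  private double[] vector;

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    // Read the whole vector from HDFS once per mapper, before any map() calls.
    // "vector.path" is a job property the driver would set (an assumption here).
    Configuration conf = context.getConfiguration();
    Path vectorPath = new Path(conf.get("vector.path"));
    FileSystem fs = FileSystem.get(conf);
    List<Double> values = new ArrayList<>();
    try (BufferedReader reader =
             new BufferedReader(new InputStreamReader(fs.open(vectorPath)))) {
      String line;
      while ((line = reader.readLine()) != null) {
        for (String token : line.trim().split("\\s+")) {
          if (!token.isEmpty()) {
            values.add(Double.parseDouble(token));
          }
        }
      }
    }
    vector = new double[values.size()];
    for (int i = 0; i < values.size(); i++) {
      vector[i] = values.get(i);
    }
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Assumed row format: "rowIndex v1 v2 ... vN" (whitespace separated).
    String[] tokens = value.toString().trim().split("\\s+");
    int rowIndex = Integer.parseInt(tokens[0]);
    double dot = 0.0;
    for (int j = 1; j < tokens.length; j++) {
      dot += Double.parseDouble(tokens[j]) * vector[j - 1];
    }
    // Emit (row index, one element of the result vector).
    context.write(new IntWritable(rowIndex), new DoubleWritable(dot));
  }
}
```

Instead of opening the file from HDFS yourself, you could also ship the vector with the distributed cache and read it from the task's local working directory in setup(); the multiplication logic stays the same either way.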

Sean Owen
  • How will we be able to get the resultant vector with the correct positioning? If the matrix is large, around 2 HDFS blocks, we will get the same index in these 2 mappers. How will we be able to construct the final vector? – USB Jan 23 '16 at 09:54