Map-Reduce to solve matrix multiplication in Python with Hadoop

Question

I would like to use map-reduce to do matrix multiplication in Python with Hadoop. The goal is to calculate A * B. The output should be in a format similar to the input.

The input consists of two matrices, A and B, in a format that looks like this:

A,0,0,0.0
A,0,1,1.0
...
A,1,3,8.0
A,1,4,9.0
B,0,0,0.0
B,0,1,1.0
...
B,4,0,12.0
B,4,1,13.0

A,0,0,0.0 means that the entry of A at index (0,0) has the value 0.0; the same applies to B.

This is my map function:

import sys

# Assumed meaning of the two command-line arguments:
# number of rows of A, then number of columns of B
m = int(sys.argv[1])
p = int(sys.argv[2])

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    # Split line into array of entry data
    entry = line.split(",")
    # Set row, column, and value for this entry
    row = int(entry[1])
    col = int(entry[2])
    value = float(entry[3])

    # If this is an entry in matrix A...
    if entry[0] == "A":
        # A(row, col) contributes to every output cell (row, k)
        for k in range(p):
            print('{},{}\tA,{},{}'.format(row, k, col, value))
    # Otherwise, if this is an entry in matrix B...
    else:
        # B(row, col) contributes to every output cell (i, col)
        for i in range(m):
            print('{},{}\tB,{},{}'.format(i, col, row, value))

I would like to know how to write the reduce function. Here is the skeleton that I have to work with:

import sys
import string
import numpy

#number of columns of A/rows of B
n = int(sys.argv[1])

#Create data structures to hold the current row/column values (if needed; your code goes here)



currentkey = None

# input comes from STDIN (stream data that goes to the program)
for line in sys.stdin:

    #Remove leading and trailing whitespace
    line = line.strip()

    #Get key/value
    key, value = line.split('\t',1)

    #Parse key/value input (your code goes here)

    #If we are still on the same key...
    if key==currentkey:

        #Process key/value pair (your code goes here)
        pass

    #Otherwise, if this is a new key...
    else:
        #If this is a new key and not the first key we've seen
        if currentkey:

            #compute/output result to STDOUT (your code goes here)
            pass

        currentkey = key

        #Process input for new key (your code goes here)

#Compute/output result for the last key (your code goes here)

To run these two functions, I test them on a small test dataset with the following command:

cat smalltest.txt | python src/map.py 2 3 | sort -n | python src/reduce.py 5

The mapper outputs key-value pairs, and sort -n then sorts them by key, so the reducer is the part that has to do the actual matrix calculation. My confusion is in writing the reducer function.
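
One possible way to fill in the reducer skeleton is sketched below. It is only a sketch under assumptions that are not stated in the question: the mapper is assumed to emit tab-separated lines of the form `i,k<TAB>M,j,value` (where `M` is `A` or `B`), the output name `C` is an assumed label for the product, and `reduce_stream` and `format_cell` are hypothetical helper names. The reducer relies only on lines with the same key being adjacent after sorting.

```python
import sys


def format_cell(key, a_row, b_col):
    """Return one output line; 'C' is an assumed name for the product."""
    total = sum(a * b for a, b in zip(a_row, b_col))
    return "C,{},{}".format(key, total)


def reduce_stream(lines, n):
    """Yield 'C,i,k,value' lines from sorted 'i,k<TAB>M,j,value' lines.

    n is the number of columns of A (equal to the number of rows of B).
    """
    current_key = None
    a_row = [0.0] * n  # A(i, j) for the current key, indexed by j
    b_col = [0.0] * n  # B(j, k) for the current key, indexed by j

    for line in lines:
        line = line.strip()
        if not line:
            continue
        key, value = line.split("\t", 1)
        matrix, j, number = value.split(",")

        if key != current_key:
            # New key: emit the finished cell, then reset the buffers.
            if current_key is not None:
                yield format_cell(current_key, a_row, b_col)
            current_key = key
            a_row = [0.0] * n
            b_col = [0.0] * n

        if matrix == "A":
            a_row[int(j)] = float(number)
        else:
            b_col[int(j)] = float(number)

    # Emit the result for the last key.
    if current_key is not None:
        yield format_cell(current_key, a_row, b_col)


# Only runs when an argument is given, e.g. `python reduce.py 5`.
if __name__ == "__main__" and len(sys.argv) > 1:
    for out in reduce_stream(sys.stdin, int(sys.argv[1])):
        print(out)
```

For each key `i,k` the reducer stores the A values and the B values by their shared index `j`, then sums the products `A(i,j) * B(j,k)` once the key changes. Missing entries are treated as 0.0, which is a design choice of this sketch.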

One point of confusion: I was told that I just need to print the output of the map function, and the reducer will then extract the information by itself. However, that does not make sense to me, and I don't know how to write the code. — HHKSHD_HH, Feb 06 '18 at 18:27
https://lendap.wordpress.com/2015/02/16/matrix-multiplication-with-mapreduce/ I found this link, which describes the problem well, but I am still a little confused about the Python coding. — HHKSHD_HH, Feb 06 '18 at 19:58

score 0 · Answer 1 · answered Feb 06 '18 at 20:02

I'm not sure why you need reduce here.
Here is my NumPy approach (with some string/list/zip gymnastics):

import numpy as np

strin = '''A,0,0,0.0
A,0,1,1.0
A,1,0,8.0
A,1,1,9.0
B,0,0,0.0
B,0,1,1.0
B,1,0,12.0
B,1,1,13.0'''.split()

lines = [*map(lambda x: x.split(","),strin)]

linesT = [*zip(*lines)]

linesT

[('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
 ('0', '0', '1', '1', '0', '0', '1', '1'),
 ('0', '1', '0', '1', '0', '1', '0', '1'),
 ('0.0', '1.0', '8.0', '9.0', '0.0', '1.0', '12.0', '13.0')]

Now we can get the dimensions and data for arrays A and B:

lastA = linesT[0].index("B") - 1

rowsA, colsA = int(linesT[1][lastA]) + 1, int(linesT[2][lastA]) + 1

datA = [*map(float, linesT[3][0:lastA + 1])]

A = np.array(datA).reshape((rowsA, colsA))

A
Out[50]: 
array([[ 0.,  1.],
       [ 8.,  9.]])

firstB = lastA + 1

rowsB, colsB = int(linesT[1][-1]) + 1, int(linesT[2][-1]) + 1

datB = [*map(float, linesT[3][firstB::])]

B = np.array(datB).reshape((rowsB, colsB))

B
Out[51]: 
array([[  0.,   1.],
       [ 12.,  13.]])

A @ B
Out[52]: 
array([[  12.,   13.],
       [ 108.,  125.]])
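
Since the question asks for output in a format similar to the input, the product can be written back out as `name,row,col,value` lines. A minimal sketch, assuming the product is called `C` (a name not given in the question) and using a hypothetical helper `matrix_to_lines`:

```python
def matrix_to_lines(name, matrix):
    """Return 'name,row,col,value' lines for every cell of a 2-D matrix."""
    return [
        "{},{},{},{}".format(name, i, j, float(value))
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
    ]


# The A @ B result from above, written as nested lists.
product = [[12.0, 13.0], [108.0, 125.0]]

for line in matrix_to_lines("C", product):
    print(line)
# prints:
# C,0,0,12.0
# C,0,1,13.0
# C,1,0,108.0
# C,1,1,125.0
```

Passing the `A @ B` array directly should work as well, because iterating over a 2-D NumPy array yields its rows.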

Your answer is right and I understand your logic, but my confusion is how to use the reducer to deal with the key-value pairs returned from the mapper. — HHKSHD_HH, Feb 06 '18 at 20:35
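
To illustrate how a reducer sees the mapper's key-value pairs, the whole map, sort, and reduce flow can be simulated in memory. This is a sketch, not Hadoop code: `map_entry`, `reduce_cell`, and `multiply` are hypothetical names, `m` (rows of A) and `p` (columns of B) are assumed parameters, and `sorted` plus `itertools.groupby` stand in for the shuffle step that Hadoop (or `sort` on the command line) performs.

```python
from itertools import groupby


def map_entry(line, m, p):
    """Yield (key, value) pairs for one 'M,row,col,value' input line."""
    name, row, col, value = line.split(",")
    row, col, value = int(row), int(col), float(value)
    if name == "A":
        # A(row, col) is needed by every output cell in row `row`.
        for k in range(p):
            yield (row, k), ("A", col, value)
    else:
        # B(row, col) is needed by every output cell in column `col`.
        for i in range(m):
            yield (i, col), ("B", row, value)


def reduce_cell(values):
    """Sum A(i, j) * B(j, k) over j for the values that share one key."""
    a = {j: v for name, j, v in values if name == "A"}
    b = {j: v for name, j, v in values if name == "B"}
    return sum(a[j] * b[j] for j in a if j in b)


def multiply(lines, m, p):
    """Run map, sort (the shuffle), and reduce in memory."""
    pairs = sorted(pair for line in lines for pair in map_entry(line, m, p))
    result = {}
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        result[key] = reduce_cell([value for _, value in group])
    return result


lines = ["A,0,0,0.0", "A,0,1,1.0", "A,1,0,8.0", "A,1,1,9.0",
         "B,0,0,0.0", "B,0,1,1.0", "B,1,0,12.0", "B,1,1,13.0"]
product = multiply(lines, m=2, p=2)
for (i, k), value in sorted(product.items()):
    print("C,{},{},{}".format(i, k, value))
# prints:
# C,0,0,12.0
# C,0,1,13.0
# C,1,0,108.0
# C,1,1,125.0
```

The key is the output cell `(i, k)`, so after sorting, every value needed for that cell arrives together. The result matches the `A @ B` output from the NumPy answer for the same 2x2 matrices.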

score 0 · Answer 2 · answered Feb 06 '18 at 20:08

Alright, I'll get straight to the point:

    lines = [*map(lambda x: x.split(","),strin)]

This is way too simplified. If the lambda function itself isn't even in an input with a syntax, it would be as if the string were non-existent. Reducing it is honestly something you should be thankful for. This code (not to be harsh) is messy, so I don't see why you're complaining about the auto-reduce.
