
I have MATLAB code that processes images, and I want to create a Hadoop mapper that uses that code. I came across the following solutions, but I am not sure which one is best (installing the MATLAB Compiler Runtime on each slave node of the Hadoop cluster is very difficult for me):

  1. Manually convert the MATLAB code to OpenCV in C++ and call the resulting exe/dll (supplying it appropriate parameters) from the mapper. I am not sure about this, since the cluster has Linux installed on every node instead of Windows.

  2. Use Hadoop Streaming. But Hadoop Streaming requires an executable as the mapper, and a MATLAB executable also requires the MATLAB Compiler Runtime, which is very difficult to install on every slave node.

  3. Convert it automatically into C/C++ code and create its executable automatically. I am not sure whether this is right, because either the executable will require the MATLAB runtime to run, or there can be compiler issues in the conversion that are very difficult to fix.

  4. Use MATLAB Java Builder. But the jar file thus created will need the runtime too.

Any suggestions?

Thanks in advance.

Harsh

4 Answers

As you are probably already suspecting, this is going to be inherently difficult to do because of the runtime requirement for MATLAB. I had a similar experience (having to distribute the runtime libraries) when attempting to run MATLAB code over Condor.

As far as the options you are listing are concerned, option #1 will work best. Also, you will probably not be able to avoid working with Linux.

However, if you don't want to lose the convenience provided by higher-level software (such as MATLAB, Octave, Scilab and others), you could try Hadoop Streaming in combination with executable Octave scripts.

Hadoop Streaming does not care about the nature of the executable, whether it is an executable script or a binary (see http://hadoop.apache.org/common/docs/r0.15.2/streaming.html).

All it requires is an "executable" that can a) read from stdin and b) send output to stdout.

GNU Octave programs can be turned into executable scripts (on Linux) with the ability to read from stdin and send their output to stdout (see http://www.gnu.org/software/octave/doc/interpreter/Executable-Octave-Programs.html).

As a simple example consider this:

Create a file (for example "al.oct") with the following contents:

#!/bin/octave -qf
Q = fread(stdin);  # standard Octave / MATLAB code from here on
disp(Q);

(Please note: in my installation I had to use "#!/etc/alternatives/octave -qf" as the first line.)

Now from the command prompt issue the following command:

chmod +x al.oct

al.oct is now executable. You can run it with "./al.oct". To see where stdin and stdout fit in (so that you can use the script with Hadoop), you can try this:

cat al.oct | ./al.oct | sort

In other words: "cat" the file al.oct, pipe its contents to the executable script al.oct, and then pipe al.oct's output to the sort utility. (This is just an example; we could have "cat"-ed any file, but since we know al.oct is a simple text file, we just use it.)
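Along the same lines, note that Hadoop Streaming by default treats each line the mapper writes as a tab-separated key/value pair. Below is a minimal sketch of a line-oriented Octave mapper in that style; the per-line "processing" (emitting the line and its length) is just a placeholder for your actual image-handling code:

#!/bin/octave -qf
# Minimal sketch of a streaming-style mapper: read stdin line by line
# and emit tab-separated key/value pairs (Hadoop Streaming's default).
line = fgetl(stdin);
while ischar(line)          # fgetl returns -1 (non-char) at end of input
  printf("%s\t%d\n", line, length(line));  # placeholder: key = line, value = its length
  line = fgetl(stdin);
endwhile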

It could of course be that Octave does not support everything your MATLAB code is trying to call, but this could be an alternative way to use Hadoop Streaming without losing the convenience and power of higher-level code.
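Putting it together, the streaming job could then be launched with something like the following (the jar path and the HDFS input/output directories are assumptions, so adjust them to your installation; -file ships the script to each node, but Octave itself must still be installed there):

hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
    -input /user/harsh/images_in \
    -output /user/harsh/images_out \
    -mapper al.oct \
    -file al.oct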

A_A
  • But I think I would need to install Octave on every worker node of the Hadoop cluster to do this, am I right? Isn't there a way to create a .out file from Octave code? That would be very helpful. – Harsh Apr 05 '12 at 12:47
  • Yes, that would be the case :-/ At the moment it is impossible to compile Octave, or even transliterate Octave code into something executable. Please see [this](http://stackoverflow.com/questions/5101219/how-do-i-convert-octave-code-to-c-or-c) and [this](http://octave.1599824.n4.nabble.com/octave-to-independent-C-code-td1630298.html) (I haven't followed up on the second in a long time). You need some kind of control over your workers... Maybe your service's admins can help with getting the most viable of the above options to work(?). – A_A Apr 05 '12 at 13:16
  • Thanks for such a great answer. Do you think that now, after 5 years, better integration solutions may have emerged between MATLAB and Hadoop? – Mohammad nagdawi Feb 11 '18 at 04:01
  • @Mohammad_nagdawi Certainly, a more recent answer further below seems to indicate that. But this "trick" will probably always work as it only requires an executable. All the best with your project. – A_A Feb 11 '18 at 04:06

Doesn't the nature of the algorithm to be converted matter? If the MATLAB/Octave code is tightly coupled, spreading it out over map-reduce may yield horrible behavior.


With respect to your first option: MATLAB Coder now supports many image processing functions (partly via system objects) and can automatically generate C code from your algorithm, which is essentially platform independent and needs no runtime environment. In my experience this generated code is about a factor of 2-3 slower than "hand-coded" OpenCV (this depends strongly on your algorithm and CPU). The main drawback is that you need a MATLAB Coder license ($$$).
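For illustration, a minimal sketch of such a Coder invocation, assuming a hypothetical entry-point function processImage.m that takes a 480x640 uint8 image (adjust the example input to your data):

% Sketch: generate standalone C code (a static library) with MATLAB Coder.
cfg = coder.config('lib');
codegen -config cfg processImage -args {zeros(480, 640, 'uint8')}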

kirk

Most of the answers here seem to pre-date MATLAB R2014b.

As of R2014b, MATLAB supports mapreduce from within MATLAB itself, along with integration with Hadoop.

I cannot be certain about your specific use case but you may want to check:

http://www.mathworks.com/help/matlab/mapreduce.html

http://www.mathworks.com/discovery/matlab-mapreduce-hadoop.html
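For a flavour of the API, here is a minimal sketch of an in-MATLAB mapreduce job; the datastore location is hypothetical and the map/reduce functions just count records:

% Minimal sketch of MATLAB's mapreduce API (R2014b+).
ds = datastore('hdfs://mycluster/images/*.csv');   % hypothetical location
result = mapreduce(ds, @countMapper, @sumReducer);

function countMapper(data, info, intermKVStore)
    % Emit one intermediate pair per chunk of records read from the datastore.
    add(intermKVStore, 'recordCount', size(data, 1));
end

function sumReducer(key, valueIter, outKVStore)
    % Sum all intermediate counts for this key.
    total = 0;
    while hasnext(valueIter)
        total = total + getnext(valueIter);
    end
    add(outKVStore, key, total);
end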

Aaditya Kalsi