I am using CDH4 and am trying to access the GPU from the cleanup() method of a Mapper class using JOCL. (Note: the same code runs fine on the GPU outside of MapReduce.)

When I run the MapReduce job, it throws the error shown below.

attempt_201309171647_0021_m_000000_1: No protocol specified
attempt_201309171647_0021_m_000000_1: No protocol specified
13/09/20 18:03:01 INFO mapred.JobClient: Task Id : attempt_201309171647_0021_m_000000_2, Status : FAILED
org.jocl.CLException: CL_DEVICE_NOT_FOUND
    at org.jocl.CL.checkResult(CL.java:569)
    at org.jocl.CL.clGetDeviceIDs(CL.java:2239)
    at com.testMR.jocl.WordCountMapper.cleanup(WordCountMapper.java:106)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
******************************************************************************

Each map task prints "No protocol specified". What does this mean? Which protocols are used in the Mapper class?

Regards

1 Answer

The problem was with the OS's graphics device setup.

We solved it :-)

The problem was with running AMD OpenCL code on Hadoop: the MapReduce tasks did not have access to the GPU cards. They needed the GUI services provided by the X server in order to use the GPU compute resources.

From what I understand, AMD OpenCL code (for users other than root) can't be run without access to an X server (http://en.wikipedia.org/wiki/X_Window_System).

According to this thread, http://devgurus.amd.com/thread/160838, AMD is working on getting OpenCL to work without an X server.

The solution I found for running OpenCL code on Hadoop is adapted from this thread, http://devgurus.amd.com/message/1284840, which suggests steps for running OpenCL code through an ssh login without a GUI.
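The "No protocol specified" lines in the task log are what X client libraries typically print when the X server refuses a connection, which fits the explanation above. The following is a minimal diagnostic sketch, not part of the original fix: it assumes the `xdpyinfo` utility is installed, and the function names `describe_display` and `can_reach_x` are hypothetical helpers introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helpers (not from Hadoop, JOCL, or AMD tooling).

# Print one line describing the DISPLAY variable of the current user.
describe_display() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set"
    else
        echo "DISPLAY is set to $DISPLAY"
    fi
}

# Return 0 if xdpyinfo can open the display, 2 if xdpyinfo is not
# installed, and xdpyinfo's own non-zero status otherwise.
can_reach_x() {
    command -v xdpyinfo >/dev/null 2>&1 || return 2
    xdpyinfo >/dev/null 2>&1
}

describe_display
if can_reach_x; then
    echo "X server reachable"
else
    echo "X server not reachable (or xdpyinfo missing)"
fi
```

Running it once as root and once as the user that runs the Hadoop tasks (for example mapred) should show whether the two accounts differ in their access to the display.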

The following are the steps that I followed:

  1. Change the 'lightdm' user's login shell to /bin/bash using the 'chsh' command:

    $ sudo chsh lightdm

    When it prompts for the new shell, type: /bin/bash

  2. Open /etc/rc.local and add the following line before 'exit 0':

    su -l lightdm -c "sleep 30 ; export DISPLAY=:0 ; xhost +local:"
    
  3. Create the file /etc/profile.d/compute.sh with the following contents:

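    #!/bin/sh
    export COMPUTE=:0
    # Optional settings, left commented out (see step 4):
    #export DISPLAY=:0
    #export GPU_MAX_ALLOC_PERCENT=100
    #export GPU_MAX_HEAP_SIZE=100
    # If a display is available, allow local users to connect to it.
    if [ -n "$DISPLAY" ]; then
        xhost +local:
    fi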
    
  4. The commented-out entries above are alternatives to try if this setup does not work; for us it worked as shown.

  5. Make the script above executable:

    $ sudo chmod 755 /etc/profile.d/compute.sh
    
  6. The X setup resets whenever someone logs in or out through lightdm, so add the following to /etc/lightdm/lightdm.conf:

    greeter-show-manual-login=true
    greeter-setup-script=/etc/profile.d/compute.sh
    session-setup-script=/etc/profile.d/compute.sh
    
  7. Reboot the system so that the environment variables are set for all users (including mapred). After that, OpenCL code can be run from Hadoop.
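Before rebooting, it may help to confirm that the files from steps 3, 5 and 6 look as intended. The following is a rough sketch under stated assumptions: `check_compute_script` and `check_lightdm_conf` are hypothetical helper names introduced here, and the checks only look at file permissions and text, so they do not prove that OpenCL itself will work.

```shell
#!/bin/sh
# Hypothetical helpers for sanity-checking the files edited above.

# check_compute_script PATH
# Succeeds when PATH is executable and has an uncommented
# "export COMPUTE=" line.
check_compute_script() {
    script="$1"
    [ -x "$script" ] || { echo "not executable: $script"; return 1; }
    grep -q '^[[:space:]]*export COMPUTE=' "$script" \
        || { echo "no COMPUTE export in $script"; return 1; }
    echo "ok: $script"
}

# check_lightdm_conf PATH
# Succeeds when both hook lines from step 6 are present in PATH.
check_lightdm_conf() {
    conf="$1"
    for key in greeter-setup-script session-setup-script; do
        grep -q "^${key}=/etc/profile.d/compute.sh" "$conf" \
            || { echo "missing $key in $conf"; return 1; }
    done
    echo "ok: $conf"
}

# Example calls (paths taken from the steps above):
#   check_compute_script /etc/profile.d/compute.sh
#   check_lightdm_conf /etc/lightdm/lightdm.conf
```

Each function prints a single "ok: ..." line on success and a short reason on failure, so the two calls can be run by hand after editing the files.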

Manjunath Ballur