
While running spark-submit on a Spark standalone cluster consisting of one master and one worker, the caffe Python module fails to import with the error ImportError: No module named caffe.

This doesn't seem to be an issue when I run the job locally: with spark-submit --master local script.py the caffe module gets imported just fine.
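For clarity, the two invocations that behave differently look roughly like this (the standalone master URL is illustrative):

    # works: everything runs in the local driver, which inherits my shell environment
    spark-submit --master local script.py

    # fails with ImportError: No module named caffe
    # (spark://<master-host>:7077 is the usual form of a standalone master URL)
    spark-submit --master spark://<master-host>:7077 script.py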

The environment variables for Spark and Caffe are currently set in ~/.profile and are added to PYTHONPATH.

Is ~/.profile the correct place to set these variables, or is a system-wide configuration needed, such as adding them under /etc/profile.d/?
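For reference, the relevant ~/.profile entries look roughly like this (the install paths are placeholders, not the actual locations):

    # ~/.profile (illustrative paths)
    export SPARK_HOME=/opt/spark
    export CAFFE_HOME=/opt/caffe
    export PATH=$SPARK_HOME/bin:$PATH
    export PYTHONPATH=$CAFFE_HOME/python:$PYTHONPATH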

  • please check if this is useful https://github.com/yahoo/CaffeOnSpark/wiki/GetStarted_standalone – Arunakiran Nulu Oct 05 '16 at 20:43
  • Thanks Arunakiran, that example uses CaffeOnSpark, a Yahoo open-source platform that I believe uses Scala instead of Python. We have looked at it, but it doesn't benefit us at the moment. – Julian Bici Oct 05 '16 at 22:09

1 Answer


Please note that the CaffeOnSpark team ported Caffe to a distributed environment backed by Hadoop and Spark. I am 99.99% sure you cannot use Caffe alone (without any modifications) in a Spark cluster or any other distributed environment; the Caffe team is known to be working on this.

If you need distributed deep learning with Caffe, please follow the build instructions at https://github.com/yahoo/CaffeOnSpark/wiki/build and use CaffeOnSpark instead of Caffe.
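The build described there is roughly of this shape (a sketch only; the wiki lists the exact prerequisites and the Makefile.config adjustments):

    # sketch of the CaffeOnSpark build from the wiki; details may differ
    git clone https://github.com/yahoo/CaffeOnSpark.git --recursive
    cd CaffeOnSpark
    # Caffe's usual dependencies (protobuf, BLAS, CUDA if used) must already be
    # installed, and caffe-public/Makefile.config tuned for the local machine
    cp caffe-public/Makefile.config.example caffe-public/Makefile.config
    make build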

But your best bet will be to follow either the GetStarted_standalone wiki or the GetStarted_yarn wiki to set up a distributed environment for deep learning.
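On a standalone cluster, the submission in that wiki is roughly of this shape (the option values, file names, and jar version below are illustrative; the wiki has the complete, tested command):

    # illustrative CaffeOnSpark submission against a standalone master
    spark-submit --master spark://<master-host>:7077 \
        --files lenet_memory_solver.prototxt,lenet_memory_train_test.prototxt \
        --conf spark.cores.max=2 \
        --class com.yahoo.ml.caffe.CaffeOnSpark \
        caffe-grid/target/caffe-grid-<version>-jar-with-dependencies.jar \
        -train -conf lenet_memory_solver.prototxt -model lenet.model -devices 1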

Further, to add Python support, please go through the GetStarted_python wiki.

Also, since you mentioned that you are using Ubuntu, please use ~/.bashrc to update your environment variables. You will have to source the file after making the changes: source ~/.bashrc
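In other words, something along these lines (the Caffe path is illustrative):

    # appended to ~/.bashrc (illustrative path)
    export PYTHONPATH=/path/to/caffe/python:$PYTHONPATH

    # reload the file in the current shell
    source ~/.bashrc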

Arun Das