
Is there a standard way in Hadoop Streaming to handle dependencies, similar to the DistributedCache in Java MapReduce?

For example, say I have a Python module that every map task needs to use. How can I make it available to them?

user703555

1 Answer


You can use the -file argument to ship the Python module with the job:

See the Hadoop Streaming documentation: http://hadoop.apache.org/docs/r0.18.3/streaming.html
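Files shipped with -file end up in each task's current working directory, so the mapper can usually just `import` the module. Below is a minimal sketch of a defensive helper a mapper script could use, assuming the dependency is a single `.py` file (not a package). The function name `import_shipped` and the module name `wordutils` mentioned below are hypothetical, chosen only for illustration.

```python
import importlib
import os
import sys


def import_shipped(module_name):
    """Import a module that was shipped to the task with -file.

    Assumption: files passed with -file are placed in the task's
    current working directory. The script's own directory is normally
    on sys.path already; adding the working directory explicitly is a
    defensive step in case the two differ.
    """
    cwd = os.getcwd()
    if cwd not in sys.path:
        sys.path.insert(0, cwd)
    # Clear finder caches in case the file appeared after startup.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

Inside a mapper you would then write something like `wordutils = import_shipped("wordutils")` (hypothetical module) before reading lines from `sys.stdin`.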

You can pass -file multiple times if you have additional dependency modules.
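A minimal sketch of building such a command line from Python, with one -file per script and per dependency module. The jar path and module names in the usage example are hypothetical placeholders. It assumes the mapper and reducer scripts have a shebang line and are executable, so they can be referenced by their base name once shipped. (In later Hadoop releases -file is deprecated in favor of the generic -files option; this sketch sticks to -file as in the answer.)

```python
import os
import shlex


def build_streaming_cmd(streaming_jar, input_path, output_path,
                        mapper, reducer=None, deps=()):
    """Return an argv list for a Hadoop Streaming job.

    Every script and dependency module gets its own -file argument so it
    is shipped to the tasks; -mapper/-reducer refer to the shipped copy
    by its base name.
    """
    cmd = ["hadoop", "jar", streaming_jar,
           "-input", input_path,
           "-output", output_path,
           "-mapper", os.path.basename(mapper),
           "-file", mapper]
    if reducer is not None:
        cmd += ["-reducer", os.path.basename(reducer), "-file", reducer]
    for dep in deps:
        cmd += ["-file", dep]
    return cmd


if __name__ == "__main__":
    cmd = build_streaming_cmd(
        "/path/to/hadoop-streaming.jar",  # hypothetical jar location
        "in_dir", "out_dir",
        "mapper.py", reducer="reducer.py",
        deps=["wordutils.py", "config.py"])  # hypothetical helper modules
    # Print a shell-quoted command; subprocess.check_call(cmd) would run it.
    print(" ".join(shlex.quote(a) for a in cmd))
```

Returning an argv list rather than a single string avoids shell-quoting problems if you later pass it to `subprocess`.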

Chris Fregly