In Hadoop Streaming, is there a way to get the ID of the node handling a given task?

By way of analogy, this snippet gives the name of the input file for the task:

#!/usr/bin/env python
import os
map_input_file = os.environ["map_input_file"]

I'm looking for something like os.environ["map_node_id"]. Any unique handle to the node would work...

Abe

1 Answer

You can get the hostname of the node running the task with the socket module in your mapper/reducer:

import socket
...
node = socket.gethostname()
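
As a rough sketch of how this might be used, the mapper below prefixes each record with the hostname so you can see which node handled it. The helper names node_handle and tag_lines are made up for illustration, and the sample list stands in for sys.stdin:

```python
#!/usr/bin/env python
import socket


def node_handle():
    """Return a handle for the machine running this task (its hostname)."""
    return socket.gethostname()


def tag_lines(lines, node):
    """Yield each input line prefixed with the node handle and a tab."""
    for line in lines:
        yield "%s\t%s" % (node, line.rstrip("\n"))


if __name__ == "__main__":
    # In a real streaming mapper, pass sys.stdin instead of this sample list.
    sample = ["first record\n", "second record\n"]
    for out in tag_lines(sample, node_handle()):
        print(out)
```

Note that the hostname is only unique if every machine in the cluster has a distinct hostname, which is an assumption about your setup.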
Lorand Bendig