Apologies beforehand if this turns out to be a silly question; I am new to the Hadoop environment.

I have two Hadoop clusters, my-prod-cluster and my-bcp-cluster. Both are accessible over the same network.

Is there any way to configure my clusters such that, when I am in BCP mode, all my queries to my-prod-cluster get routed to my-bcp-cluster (on the basis of some config parameter or environment variable)?

So when flag=prod,
hadoop fs -ls /my-prod-cluster/mydir translates to hadoop fs -ls /my-prod-cluster/mydir
and fetches the data from /my-prod-cluster/mydir.


And when flag=bcp,
hadoop fs -ls /my-prod-cluster/mydir translates to hadoop fs -ls /my-bcp-cluster/mydir
and fetches the data from /my-bcp-cluster/mydir.



I am using the MapR flavour of Hadoop (provided by HP), version 6.1, in case that matters.
A Nice Guy

1 Answer

You could easily make a shell wrapper script that prepends the NameNode address to each query.

For example, a fully-qualified command would look like this:

hdfs dfs -ls hdfs://my-prod-cluster.domain.com/path/to/mydir

So, refactoring that, you could have a script like

#!/bin/sh
# Pick the NameNode based on the cluster name given as the first argument
if [ "$1" = "prod" ]; then
  NAMENODE=hdfs://my-prod-cluster.domain.com
fi
# TODO: error handling and more clusters

HDFS_PATH=$2   # don't reuse PATH here; that would clobber the shell's executable search path
hdfs dfs -ls "${NAMENODE}${HDFS_PATH}"

Then execute something like my-hdfs-ls prod /mydir
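
To cover the bcp cluster from the question (and fail fast on anything else), the same idea extends to a case statement. This is only a sketch; the NameNode hostnames below are assumed placeholders to replace with your own:

#!/bin/sh
# Usage: my-hdfs-ls <prod|bcp> <path>
# The hostnames below are example values, not real addresses
case "$1" in
  prod) NAMENODE=hdfs://my-prod-cluster.domain.com ;;
  bcp)  NAMENODE=hdfs://my-bcp-cluster.domain.com ;;
  *)    echo "Unknown cluster: $1" >&2; exit 1 ;;
esac

HDFS_PATH=$2
hdfs dfs -ls "${NAMENODE}${HDFS_PATH}"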


If you need something more complex than that, such as Kerberos tickets, then creating a separate HADOOP_CONF_DIR for each cluster, each with its own core-site.xml and hdfs-site.xml, would be recommended.
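
For example (a minimal sketch, assuming you have already copied each cluster's client configs into directories such as /etc/hadoop/conf.prod and /etc/hadoop/conf.bcp; those paths are illustrative, not a MapR convention):

# Each directory contains that cluster's core-site.xml and hdfs-site.xml
export HADOOP_CONF_DIR=/etc/hadoop/conf.prod    # switch to /etc/hadoop/conf.bcp for BCP mode
hadoop fs -ls /mydir                            # now resolves against whichever cluster the configs point to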

OneCricketeer