
I have successfully configured Hadoop 2.4 in an Ubuntu 14.04 VM running on a Windows 8 host. The Hadoop installation works fine, and I can view the NameNode from a browser on the Windows host, as the screenshot below shows:

[Screenshot of the NameNode web UI]

So my host name is ubuntu and the HDFS port is 9000 (correct me if I am wrong).

core-site.xml:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://ubuntu:9000</value>
</property>
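To double-check the host and port that clients such as PDI should use, the fs.defaultFS value can be parsed directly. The following is a minimal sketch, not part of the original setup: the function names `default_fs_endpoint` and `can_connect` are made up for illustration, and the snippet assumes the property sits inside the usual `<configuration>` root element.

```python
import socket
import urllib.parse
import xml.etree.ElementTree as ET


def default_fs_endpoint(core_site_xml):
    """Return (host, port) taken from fs.defaultFS, or None if it is absent."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == "fs.defaultFS":
            uri = urllib.parse.urlparse(prop.findtext("value").strip())
            return uri.hostname, uri.port
    return None


def can_connect(host, port, timeout=3.0):
    """Hypothetical helper: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Sample input: the property from the question wrapped in a <configuration> root.
sample = """<configuration>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://ubuntu:9000</value>
</property>
</configuration>"""

print(default_fs_endpoint(sample))  # ('ubuntu', 9000)
```

Running `can_connect("ubuntu", 9000)` from the Windows host would show whether the name ubuntu resolves there and whether the port is reachable from outside the VM. A successful TCP connection does not prove that the PDI shim is compatible with the cluster.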

The problem occurs when I try to connect to HDFS from Pentaho Data Integration (PDI), as the screenshot below shows. PDI version: 4.4.0. Step used: Hadoop Copy Files.

[Screenshot of the connection attempt in PDI]

Please help me connect to HDFS from PDI. Do I need to install or update any JAR for this? Let me know if you need more information.

Rishu Shrivastava

1 Answer


As far as I know, PDI 4.4 doesn't support Hadoop 2.4. In any case, you must set a property to select a particular Hadoop configuration (you may see a "Hadoop configuration" referred to as a "shim" in the forums and elsewhere). The file data-integration/plugins/pentaho-big-data-plugin/plugin.properties contains a property called active.hadoop.configuration. It is set by default to "hadoop-20", which refers to an Apache Hadoop 0.20.x distribution. Set it to the newest distribution that ships with your Pentaho install, or build your own shim as described in my blog post:

http://funpdi.blogspot.com/2013/03/pentaho-data-integration-44-and-hadoop.html

Upcoming versions (5.2+) of PDI will support vendor distributions that include Hadoop 2.4+, so keep an eye on the PDI Marketplace and on pentaho.com :)
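The property change described above can also be scripted. This is a rough sketch under stated assumptions, not an official Pentaho tool. It assumes the shims live as subfolders of a hadoop-configurations directory inside the big data plugin folder, and that the value of active.hadoop.configuration is the name of one of those subfolders. The function names `list_shims` and `set_active_shim` are made up for illustration.

```python
import re
from pathlib import Path

PROP = "active.hadoop.configuration"


def list_shims(plugin_dir):
    """List shim names, assumed to be the subfolders of hadoop-configurations."""
    conf_dir = Path(plugin_dir) / "hadoop-configurations"
    return sorted(p.name for p in conf_dir.iterdir() if p.is_dir())


def set_active_shim(plugin_dir, shim):
    """Point active.hadoop.configuration in plugin.properties at the given shim."""
    available = list_shims(plugin_dir)
    if shim not in available:
        raise ValueError(f"unknown shim {shim!r}; available: {available}")

    props = Path(plugin_dir) / "plugin.properties"
    text = props.read_text()
    new_line = f"{PROP}={shim}"
    pattern = re.compile(rf"^{re.escape(PROP)}\s*=.*$", re.MULTILINE)

    if pattern.search(text):
        # Replace the existing setting in place and keep every other line.
        text = pattern.sub(lambda m: new_line, text)
    else:
        # Property not present yet: append it at the end of the file.
        text = text.rstrip("\n") + "\n" + new_line + "\n"
    props.write_text(text)
```

Calling `list_shims("data-integration/plugins/pentaho-big-data-plugin")` first shows which names the installed PDI version accepts, so the value does not have to be guessed. Restart PDI after changing the file so the new shim is loaded.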

mattyb
  • OK, got your point, Matt. Thanks! I will definitely try changing the configuration. – Rishu Shrivastava Jul 31 '14 at 12:49
  • Hi Matt, I am having the same issue, but it seems that PDI v8.1 does not include a default shim for Hadoop 2.7.*. – Palu Sep 12 '18 at 00:14
  • Regarding your solution: when editing active.hadoop.configuration, how would one write Hadoop version 2.7.4? Hadoop 0.20 is written as "hadoop-20", as you indicated, so would Hadoop 2.7.4 be "hadoop-274"? What is the naming convention here? – Palu Sep 12 '18 at 00:17
  • I haven't kept up with PDI versions or which shims they ship. If there is a 2.7.x shim, it should be where the others are; I'm not sure what the naming convention is these days. Palu notes that 8.1 has no shim support for 2.7.x. – mattyb Sep 12 '18 at 16:26