
I am new to Pentaho and Spoon, and I am trying to process a file from a local Hadoop node with a "Hadoop File Input" step in Spoon (Pentaho). The problem is that every URI I have tried so far seems to be incorrect. I don't know how to connect to HDFS from Pentaho.

To make it clear, the correct URI is:

hdfs://localhost:9001/user/data/prueba_concepto/ListadoProductos_2017_02_13-15_59_con_id.csv

I know it's the correct one because I tested it via the command line and it works perfectly:

hdfs dfs -ls hdfs://localhost:9001/user/data/prueba_concepto/ListadoProductos_2017_02_13-15_59_con_id.csv 
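
For reference, the NameNode URI can also be read straight from the Hadoop configuration instead of guessed at; a quick check from the same shell, assuming hdfs is on the PATH:

hdfs getconf -confKey fs.defaultFS
# prints something like hdfs://localhost:9001 -- the exact scheme, host and
# port Spoon needs (on very old Hadoop versions the key is fs.default.name)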

So, setting the environment field to "static", here are some of the URIs I have tried in Spoon:

  • hdfs://localhost:9001/user/data/prueba_concepto/ListadoProductos_2017_02_13-15_59_con_id.csv
  • hdfs://localhost:8020/user/data/prueba_concepto/ListadoProductos_2017_02_13-15_59_con_id.csv
  • hdfs://localhost:9001
  • hdfs://localhost:9001/user/data/prueba_concepto/
  • hdfs://localhost:9001/user/data/prueba_concepto
  • hdfs:///

I even tried the solution Garci García gives here: Pentaho Hadoop File Input, which is setting the port to 8020 and using the following URI:

  • hdfs://catalin:@localhost:8020/user/data/prueba_concepto/ListadoProductos_2017_02_13-15_59_con_id.csv

Then I changed the port back to 9001 and tried the same technique:

  • hdfs://catalin:@localhost:9001/user/data/prueba_concepto/ListadoProductos_2017_02_13-15_59_con_id.csv

But still nothing worked for me ... every time I press the Mostrar Fichero(s)... button (Show file(s)), an error pops up saying that the file cannot be found.

I added a "Hadoop File Input" image here.

Thank you.


1 Answer


Okay, so I actually solved this.

I had to add a new Hadoop cluster from the "View" tab -> right-click on "Hadoop Clusters" -> "New".

There I had to enter my HDFS configuration:

  • Storage: HDFS
  • Hostname: localhost
  • Port: 9001 (the default is 8020)
  • Username: catalin
  • Password: (no password)

After that, if you hit the "Test" button, some of the tests will fail. I solved the second one by copying all the configuration properties from my LOCAL Hadoop configuration file ($LOCAL_HADOOP_HOME/etc/hadoop/core-site.xml) into Spoon's Hadoop configuration file:

data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/hdp25/core-site.xml
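
A minimal sketch of that step, assuming $LOCAL_HADOOP_HOME points at the local Hadoop installation and Spoon lives under data-integration (if the shim's own core-site.xml carries Pentaho-specific properties you want to keep, merge the <property> entries by hand instead of overwriting the file):

cd data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/hdp25
cp core-site.xml core-site.xml.orig   # keep a backup of the shim's original
cp $LOCAL_HADOOP_HOME/etc/hadoop/core-site.xml core-site.xml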

After that, I had to edit data-integration/plugins/pentaho-big-data-plugin/plugin.properties and set the "active.hadoop.configuration" property to hdp25:

active.hadoop.configuration=hdp25
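
If you would rather make that change from a shell, one way to do it (assuming plugin.properties still contains its stock active.hadoop.configuration line):

cd data-integration/plugins/pentaho-big-data-plugin
# point the plugin at the hdp25 shim; sed keeps a .bak copy of the file
sed -i.bak 's/^active\.hadoop\.configuration=.*/active.hadoop.configuration=hdp25/' plugin.properties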

Restart Spoon and you're good to go.