I am new to Pentaho and Spoon, and I am trying to process a file from a local Hadoop node with a "Hadoop File Input" step in Spoon (Pentaho). The problem is that every URI I have tried so far seems to be incorrect, and I don't know how to actually connect to HDFS from Pentaho.
To make it clear, the correct URI is:
hdfs://localhost:9001/user/data/prueba_concepto/ListadoProductos_2017_02_13-15_59_con_id.csv
I know it's the correct one because I tested it from the command line and it works perfectly:
hdfs dfs -ls hdfs://localhost:9001/user/data/prueba_concepto/ListadoProductos_2017_02_13-15_59_con_id.csv
So, setting the environment field to "static", here are some of the URIs I have tried in Spoon:
- hdfs://localhost:9001/user/data/prueba_concepto/ListadoProductos_2017_02_13-15_59_con_id.csv
- hdfs://localhost:8020/user/data/prueba_concepto/ListadoProductos_2017_02_13-15_59_con_id.csv
- hdfs://localhost:9001
- hdfs://localhost:9001/user/data/prueba_concepto/
- hdfs://localhost:9001/user/data/prueba_concepto
- hdfs:///
I even tried the solution Garci García gives here: Pentaho Hadoop File Input, which is to set the port to 8020 and use the following URI:
- hdfs://catalin:@localhost:8020/user/data/prueba_concepto/ListadoProductos_2017_02_13-15_59_con_id.csv
And then changed it back to 9001 and tried the same technique:
- hdfs://catalin:@localhost:9001/user/data/prueba_concepto/ListadoProductos_2017_02_13-15_59_con_id.csv
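To compare what actually differs between the URIs listed above, here is a minimal Python sketch that splits each one into the parts a client has to resolve. It only checks URI syntax with the standard library and does not talk to Hadoop or Pentaho. The helper name `describe` is hypothetical and chosen just for illustration.

```python
from urllib.parse import urlsplit


def describe(uri):
    """Split an hdfs:// URI into scheme, credentials, host, port and path."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "user": parts.username,      # None when no user info is given
        "password": parts.password,  # "" when the form is "user:@host"
        "host": parts.hostname,      # None when the authority is empty
        "port": parts.port,
        "path": parts.path,
    }


# Three of the URIs tried in Spoon, copied from the lists above.
uris = [
    "hdfs://localhost:9001/user/data/prueba_concepto/ListadoProductos_2017_02_13-15_59_con_id.csv",
    "hdfs://catalin:@localhost:8020/user/data/prueba_concepto/ListadoProductos_2017_02_13-15_59_con_id.csv",
    "hdfs:///",
]

for uri in uris:
    print(describe(uri))
```

All of these are syntactically valid, so the sketch suggests the differences come down to the port (9001 vs. 8020), the optional `catalin:` user info with an empty password, and whether a host is given at all, rather than to a malformed URI.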
But still nothing worked for me. Every time I press the Mostrar Fichero(s)... (Show file(s)) button, an error pops up saying that the file cannot be found.
I added a "Hadoop File Input" image here.
Thank you.