
I need to run the command 'hdfs dfs -cat /user/username/data/20220815/EDHSB.CSV', which prints the contents of a CSV file stored in a remote HDFS.

To do this, I used the code below:

try {
    String shpath = "hdfs dfs -cat /user/username/data/20220815/EDHSB.CSV";
    Process ps = Runtime.getRuntime().exec(shpath);
    ps.waitFor();
} catch (Exception e) {
    e.printStackTrace();
}

The next step is to read the CSV content produced by the code above. Is this first step good enough, or is there a better way to handle the entire flow?

Raghu K

2 Answers


You should use java.lang.Process and java.lang.ProcessBuilder instead, as that allows you to intercept the output directly in your Java code.

Basically, it looks like this

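import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.stream.Collectors;

final var process = new ProcessBuilder( "hdfs", "dfs", "-cat", "/user/username/data/20220815/EDHSB.CSV" )
  .start();
final String csvFileContents;
try( var inputStream = process.getInputStream();
  var reader = new BufferedReader( new InputStreamReader( inputStream ) ) )
{
  // Read the process output line by line and join it into one string
  csvFileContents = reader.lines().collect( Collectors.joining( "\n" ) );
}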

All necessary error handling was omitted for readability …
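
As a rough idea of what that omitted error handling could look like, here is a minimal sketch. The class name `CommandOutput` and the method `run` are made up for illustration, and it assumes the command writes UTF-8 text. It reads standard output to the end before calling `waitFor()`, sends the process's standard error to the parent's, and treats a non-zero exit code as a failure.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

public class CommandOutput {
    // Hypothetical helper: runs a command and returns its standard output as one string.
    static String run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
            // Forward stderr to the parent's stderr so an unread pipe cannot fill up.
            .redirectError(ProcessBuilder.Redirect.INHERIT)
            .start();
        String output;
        // Assumption: the command emits UTF-8 text.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            // Drain stdout first; the stream ends when the process closes it.
            output = reader.lines().collect(Collectors.joining("\n"));
        }
        // Only now wait for the process and check its exit code.
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command " + command + " exited with code " + exitCode);
        }
        return output;
    }
}
```

With this in place, the call would be something like `CommandOutput.run(List.of("hdfs", "dfs", "-cat", "/user/username/data/20220815/EDHSB.CSV"))`. Collecting everything into one string is only reasonable when the file is small enough to fit in memory.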

tquadrat
  • This is not going to work because `ProcessBuilder` takes individual arguments instead of a whole line, so will try to look for a single executable file called "hdfs dfs -cat /user/username/data/20220815/EDHSB.CSV". On the other hand, `Runtime.exec` works ok for this. It also returns a `Process`, just like `ProcessBuilder.start()` does. – k314159 Aug 15 '22 at 13:34
  • @k314159 – You're right; fixed it … – tquadrat Aug 15 '22 at 13:43

Two things about your code:

  1. It's better not to call printStackTrace(), because its output is too easy to miss. Do something meaningful with exceptions. If you can't, just let the exception propagate out of your method by adding a throws clause to its signature.
  2. Do you really want to wait for the process to finish by calling waitFor() before you start reading? If you do, and the file is very big, the process can block (or even deadlock) once the limited buffer between it and the Java runtime fills up, so waitFor() may never return. Instead, get its input stream and start processing it straight away. You'll get an EOF condition when the process exits.
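import java.io.IOException;
import java.util.stream.Stream;

void processCSV() throws IOException {
    String shpath = "hdfs dfs -cat /user/username/data/20220815/EDHSB.CSV";
    Process ps = Runtime.getRuntime().exec(shpath);
    // Process.inputReader() requires Java 17 or later
    try (Stream<String> lines = ps.inputReader().lines()) {
        // Handle each line as soon as the process writes it
        lines.forEach(line -> {
            processCSVLine(line);
        });
    }
}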
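
`processCSVLine` is left undefined here. As a minimal sketch of what it might do, the hypothetical class `CsvLine` below splits one line into fields. It assumes plain comma-separated values with no quoted fields or embedded commas; real-world CSV usually calls for a dedicated CSV parser instead.

```java
import java.util.Arrays;
import java.util.List;

public class CsvLine {
    // Hypothetical stand-in for processCSVLine: splits one line into its fields.
    // Assumption: no quoted fields and no commas inside values.
    static List<String> parse(String line) {
        // The limit -1 keeps trailing empty fields, e.g. the last one in "a,b,".
        return Arrays.asList(line.split(",", -1));
    }
}
```

Inside the `forEach` above, each line could then be turned into its fields with `CsvLine.parse(line)`.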
k314159