I am using MAPI tools (Its microsoft lib and in .NET) and then apache TIKA libraries to process and extract the pst from exchange server, which is not scalable.
How can I process/extracts pst using MR way ... Is there any tool, library available in java which I can use in my MR jobs. Any help would be great-full .
Jpst Lib internally uses: PstFile pstFile = new PstFile(java.io.File)
And the problem is for Hadoop API's we don't have anything close to java.io.File
.
Following option is always there but not efficient:
File tempFile = File.createTempFile("myfile", ".tmp");
fs.moveToLocalFile(new Path (<HDFS pst path>) , new Path(tempFile.getAbsolutePath()) );
PstFile pstFile = new PstFile(tempFile);