
I'm migrating content from one DMS (Filenet) to another (Webcenter Content). During the migration I came across an Excel file in Filenet with the .xls extension (content type application/vnd.ms-excel).

I got the InputStream of the file and wrote it (plain file I/O) to a temp location before pushing it to Webcenter Content.

The problem: when I download and open the Filenet version of the file, F1.xls, Excel warns that the file format and extension don't match, but it still opens the file and displays the content.

The version I pushed into Webcenter Content (WCC.xls) does not behave the same way.

It shows the same warning, but after I dismiss it, the sheet shows junk characters. If I change the extension of WCC.xls to WCC.xlsx, it displays fine.

How can I detect this kind of mismatch at runtime? Any help would be highly appreciated.

Here is the code snippet from my local machine:

    // FilenetConnectionUtil is the poster's own helper class (only its write method is shown).
    // try-with-resources closes (and therefore flushes) both streams, including on error.
    try (InputStream initialStream = new FileInputStream(
                 new File("C:\\Users\\xxxxxx\\Desktop\\Utils\\YYYYY\\FN1.xls"));
         FileOutputStream oStream = new FileOutputStream(
                 "C:\\Users\\xxxxxx\\Desktop\\Utils\\YYYYY\\WCC1.xls")) {
        FilenetConnectionUtil fCU = new FilenetConnectionUtil();
        fCU.writeFileToTempLocation("xx", "xx", initialStream, oStream);
    } catch (IOException e) {
        // Catches FileNotFoundException too, plus the IOException
        // declared by writeFileToTempLocation.
        e.printStackTrace();
    }

    /* Write method */
    public void writeFileToTempLocation(String filename, String filepath,
            InputStream inStream, FileOutputStream oStream) throws IOException {
        // FileOutputStream oStream = new FileOutputStream(filepath);
        byte[] buffer = new byte[1024];
        int n;
        if (inStream != null) {
            while ((n = inStream.read(buffer)) != -1) {
                // Writes all 1024 bytes, even when this read returned n < 1024
                oStream.write(buffer);
            }
        }
        oStream.flush();
    }
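In `writeFileToTempLocation` as posted, the loop calls `oStream.write(buffer)`, which writes all 1024 bytes on every pass. When the last `read` returns fewer than 1024 bytes, stale bytes from the previous chunk are written as well, so the copy can end up larger than the source. It is not confirmed that this is what broke WCC.xls (the poster later traced the problem to re-reading a stream), but the loop should write only the `n` bytes actually read. A minimal sketch of a corrected copy; `StreamCopy` and `copy` are hypothetical names, not part of the original code:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {

    /**
     * Copies every byte of in to out and returns the number of bytes copied.
     * Only the first n bytes of the buffer are written on each pass, so
     * leftovers from an earlier, longer read are never appended.
     */
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[1024];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // not out.write(buffer)
            total += n;
        }
        out.flush();
        return total;
    }
}
```

With a 1500-byte `ByteArrayInputStream`, this writes exactly 1500 bytes, whereas the posted loop would write 2048 (two full buffers). On Java 9+, `in.transferTo(out)` performs the same copy without a hand-written loop.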

Thanks, Rahul Dumpala

  • I just googled a bit and found the [Apache POI library](https://poi.apache.org/overview.html) which is able to open MS Office documents in java. You could try to open the file with that library and if it yells at you, you know that the file is bad – vatbub Jun 15 '18 at 11:15

1 Answer


As .xlsx is a ZIP format, the first two bytes (the "magic cookie") should be "PK".

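    // Imports: java.io.IOException, java.io.InputStream,
    //          java.nio.file.Files, java.nio.file.Path, java.nio.file.Paths
    Path path = Paths.get(
            "C:\\Users\\xxxxxx\\Desktop\\Utils\\YYYYY\\FN1.xls");
    // Or: Path path = file.toPath();

    /** True if the file starts with "PK", the ZIP signature every .xlsx file has. */
    boolean isXlsx(Path path) {
        try (InputStream in = Files.newInputStream(path)) {
            byte[] magicCookie = new byte[2];
            // readNBytes (Java 9+) keeps reading until 2 bytes or end of file
            return in.readNBytes(magicCookie, 0, 2) == 2
                && magicCookie[0] == 'P'
                && magicCookie[1] == 'K';
        } catch (IOException e) {
            return false;
        }
    }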
Joop Eggen
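The "PK" check answers only half of the question's runtime-detection goal. Legacy .xls files are OLE2 (Compound File Binary) containers that begin with the 8-byte signature `D0 CF 11 E0 A1 B1 1A E1`, while .xlsx files are ZIP archives that begin with a local file header, `PK\x03\x04`. A minimal sketch that distinguishes the two, assuming a hypothetical helper class `ExcelFormatSniffer` (not from any library). It looks only at the header: a ZIP signature does not prove the archive is a spreadsheet (.docx and plain .zip files match too), and nothing past the first 8 bytes is validated.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ExcelFormatSniffer {

    /** Container format suggested by a file's leading bytes. */
    public enum Format { OLE2_XLS, ZIP_XLSX, UNKNOWN }

    // OLE2 / Compound File Binary signature used by legacy .xls files
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    // ZIP local file header "PK\3\4"; .xlsx files are ZIP archives
    private static final byte[] ZIP_SIGNATURE = { 'P', 'K', 0x03, 0x04 };

    /** Reads up to 8 leading bytes of the file and classifies them. */
    public static Format detect(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            byte[] header = new byte[OLE2_SIGNATURE.length];
            int read = in.readNBytes(header, 0, header.length); // Java 9+
            return classify(Arrays.copyOf(header, read));
        }
    }

    /** Classifies an already-read header, which may be shorter than 8 bytes. */
    public static Format classify(byte[] header) {
        if (startsWith(header, OLE2_SIGNATURE)) return Format.OLE2_XLS;
        if (startsWith(header, ZIP_SIGNATURE)) return Format.ZIP_XLSX;
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Comparing the result with the stored extension or content type (for example `application/vnd.ms-excel` on a file that classifies as `ZIP_XLSX`) is one way to flag the mismatch before upload.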
  • Thanks a lot Joop Eggen, I tried that and it did return true. But after writing the WCC.xls file as WCC.xlsx (based on isXlsx), the file does not open completely: the editor says it's corrupt and cannot open it. I tried finding the MIME type of the FN1.xls file and it returned "application/octet-stream". Assuming the file is binary data, should I do anything additional while writing the file other than just changing the extension? Code for extracting MIME. Thanks, Rahul D – Rahul Kumar Dumpala Jun 15 '18 at 15:04
  • You could rename the original .xls file (I hope you kept a copy) to .zip to look inside it. MIME type `application/octet-stream` is just another moniker for "binary data" (octet = byte). So you have a valid .xls openable by Excel that is in ZIP format, yet not readable as .xlsx. Weird. Inspect the ZIP; maybe there is some clue. – Joop Eggen Jun 15 '18 at 15:11
  • Issue is fixed: I was re-reading the same stream, which was causing data loss. I'm now able to see the contents of the .xlsx file as well. Thanks for your help, Joop Eggen – Rahul Kumar Dumpala Jun 18 '18 at 06:58
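The last comment says the corruption came from re-reading the same stream, but the fix itself is not shown. One general way to both sniff a stream's header and copy the whole stream from a single pass is to wrap it in a `BufferedInputStream` and use `mark`/`reset`. A minimal sketch under that assumption; `PeekThenCopy` and `peekAndCopy` are hypothetical names, not the poster's code:

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

public class PeekThenCopy {

    /** Leading bytes of the stream plus the total number of bytes copied. */
    public static final class Result {
        public final byte[] header;
        public final long bytesCopied;

        Result(byte[] header, long bytesCopied) {
            this.header = header;
            this.bytesCopied = bytesCopied;
        }
    }

    /**
     * Reads up to headerSize leading bytes for format detection, rewinds to
     * the start, then copies the entire stream (header included) to out.
     * The caller still owns and closes both source and out.
     */
    public static Result peekAndCopy(InputStream source, OutputStream out, int headerSize)
            throws IOException {
        BufferedInputStream in = new BufferedInputStream(source);
        in.mark(headerSize);                              // remember the start
        byte[] header = new byte[headerSize];
        int got = in.readNBytes(header, 0, headerSize);   // Java 9+
        in.reset();                                       // rewind: nothing is lost

        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return new Result(Arrays.copyOf(header, got), total);
    }
}
```

The returned `header` can be passed to a magic-number check such as the answer's "PK" test to choose the extension, while the bytes written to `out` are the complete, unmodified stream.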