0

I have a bit of a situation with an xml file ment to be open with Excel. A client is sending me this type of xml:

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">
 <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
  <LastAuthor>Microsoft Office User</LastAuthor>
  <Created>1996-10-14T23:33:28Z</Created>
  <LastSaved>2021-03-10T13:02:44Z</LastSaved>
  <Version>16.00</Version>
 </DocumentProperties>
 <OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office">
  <AllowPNG/>
 </OfficeDocumentSettings>
 <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
  <WindowHeight>15940</WindowHeight>
  <WindowWidth>28040</WindowWidth>
  <WindowTopX>380</WindowTopX>
  <WindowTopY>500</WindowTopY>
  <ProtectStructure>False</ProtectStructure>
  <ProtectWindows>False</ProtectWindows>
 </ExcelWorkbook>
 <Styles>
  <Style ss:ID="Default" ss:Name="Normal">
   <Alignment ss:Vertical="Bottom"/>
   <Borders/>
   <Font ss:FontName="Calibri" x:Family="Swiss" ss:Size="12" ss:Color="#000000"/>
   <Interior/>
   <NumberFormat/>
   <Protection/>
  </Style>
  <Style ss:ID="s62">
   <Alignment ss:Vertical="Bottom"/>
   <Borders/>
   <Font ss:FontName="Arial"/>
   <Interior/>
   <NumberFormat/>
   <Protection/>
  </Style>
  <Style ss:ID="s63">
   <Alignment ss:Vertical="Bottom"/>
   <Borders/>
   <Font ss:FontName="Trebuchet MS" ss:Color="#000000" ss:Bold="1"/>
   <Interior ss:Color="#D3D3D3" ss:Pattern="Solid"/>
   <NumberFormat/>
   <Protection/>
  </Style>
  <Style ss:ID="s64">
   <Alignment ss:Vertical="Bottom"/>
   <Borders/>
   <Font ss:FontName="Trebuchet MS" ss:Color="#000000"/>
   <Interior/>
   <NumberFormat/>
   <Protection/>
  </Style>
  <Style ss:ID="s65">
   <Alignment ss:Vertical="Bottom"/>
   <Borders/>
   <Font ss:FontName="Trebuchet MS" ss:Color="#993300" ss:Bold="1"/>
   <Interior ss:Color="#FFFF00" ss:Pattern="Solid"/>
   <NumberFormat/>
   <Protection/>
  </Style>
 </Styles>
 <Worksheet ss:Name="SITES_INFO">
  <Table ss:ExpandedColumnCount="36" ss:ExpandedRowCount="2" x:FullColumns="1"
   x:FullRows="1" ss:StyleID="s62" ss:DefaultColumnWidth="65"
   ss:DefaultRowHeight="13">
   <Column ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="75" ss:Span="4"/>
   <Column ss:Index="6" ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="175"/>
   <Column ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="90"/>
   <Column ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="225"/>
   <Column ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="200"/>
   <Column ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="225"/>
   <Column ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="100" ss:Span="1"/>
   <Column ss:Index="13" ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="125"/>
   <Column ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="100"/>
   <Column ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="175"/>
   <Column ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="125" ss:Span="2"/>
   <Column ss:Index="19" ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="175"/>
   <Column ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="160"/>
   <Column ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="150"/>
   <Column ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="160" ss:Span="1"/>
   <Column ss:Index="24" ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="250"/>
   <Column ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="200" ss:Span="1"/>
   <Column ss:Index="27" ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="160"
    ss:Span="1"/>
   <Column ss:Index="29" ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="175"/>
   <Column ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="160"/>
   <Column ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="215"/>
   <Column ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="175"/>
   <Column ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="160" ss:Span="1"/>
   <Column ss:Index="35" ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="200"/>
   <Column ss:StyleID="s62" ss:AutoFitWidth="0" ss:Width="100"/>
   <Row ss:AutoFitHeight="0" ss:Height="20">
    <Cell ss:StyleID="s63"><Data ss:Type="String">Protocol</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Site Number</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">PI Salutation</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">PI Last Name</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">PI First Name</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">PI Email</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">PI Phone Number</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Drug Shipment Name</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Drug Shipment Address 1</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Drug Shipment Address 2</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Drug Shipment City</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Drug Shipment State</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Drug Shipment Province</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Drug Shipment Zip</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Drug Shipment Country ISO Code</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Drug Shipment Country</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Drug Shipment Phone</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Drug Shipment FAX</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Drug Shipment Contact First Name</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Drug Shipment Contact Last Name</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Drug Shipment Contact Email</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Drug Shipment Contact Phone</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Drug Shipment Contact Role</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Patient Treatment Center Name</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Patient Treatment Center Address 1</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Patient Treatment Center Address 2</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Patient Treatment Center City</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Patient Treatment Center State</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Patient Treatment Center Province</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Patient Treatment Center Zip</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Patient Treatment Center Country ISO Code</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Patient Treatment Center Country</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Patient Treatment Center Phone</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Patient Treatment Center Fax</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Patient Treatment Center TimeZone</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Transaction Flag</Data></Cell>
   </Row>
   <Row>
    <Cell ss:StyleID="s64"><Data ss:Type="String">fgregre</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">0382</Data></Cell>
    <Cell ss:StyleID="s64"/>
    <Cell ss:StyleID="s64"><Data ss:Type="String">gregreger</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">gergre</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">vregregerge</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">ergreger</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">regreger</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">8 Haaliya st</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">vervregerge</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">Haifa</Data></Cell>
    <Cell ss:StyleID="s64"/>
    <Cell ss:StyleID="s64"><Data ss:Type="String">N/A</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">gergere</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">ISR</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">ISRAEL</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">gergereg</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">gergreger</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">regerger</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">regerge</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">rgergerg</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">ergerge</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">ergerge</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">Rambam Medical Center</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">regergerge</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">Hematology Department</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">ergergre</Data></Cell>
    <Cell ss:StyleID="s64"/>
    <Cell ss:StyleID="s65"><Data ss:Type="String">N/A</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">ergreegre</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">ISR</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">ergerger</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">regergreger</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">regregerge</Data></Cell>
    <Cell ss:StyleID="s64"><Data ss:Type="String">(GMT+02:00) Jerusalem</Data></Cell>
    <Cell ss:StyleID="s65"><Data ss:Type="String">UPDATE</Data></Cell>
   </Row>
  </Table>
  <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
   <Selected/>
   <LeftColumnVisible>28</LeftColumnVisible>
   <Panes>
    <Pane>
     <Number>3</Number>
     <ActiveRow>5</ActiveRow>
     <ActiveCol>34</ActiveCol>
    </Pane>
   </Panes>
   <ProtectObjects>False</ProtectObjects>
   <ProtectScenarios>False</ProtectScenarios>
  </WorksheetOptions>
 </Worksheet>
 <Worksheet ss:Name="USER_INFO">
  <Table ss:ExpandedColumnCount="1" ss:ExpandedRowCount="1" x:FullColumns="1"
   x:FullRows="1" ss:StyleID="s62" ss:DefaultColumnWidth="65"
   ss:DefaultRowHeight="13">
  </Table>
  <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
   <Panes>
    <Pane>
     <Number>3</Number>
     <ActiveRow>22</ActiveRow>
     <ActiveCol>7</ActiveCol>
    </Pane>
   </Panes>
   <ProtectObjects>False</ProtectObjects>
   <ProtectScenarios>False</ProtectScenarios>
  </WorksheetOptions>
 </Worksheet>
</Workbook>

I am receiving this file via sftp and i am reading it as a string, obtaining it's content. I tried using Apache POI thinking i would be able to read it as an xls file and map it to a class.

This how I am trying to read it:

   private void processFileData(String stringData, String filename, String extension)
    {
        InputStream inp = IOUtils.toInputStream(stringData, StandardCharsets.UTF_8);
        List<MULSite> siteList = new ArrayList<MULSite>();

        try {
            inp = FileMagic.prepareToCheckMagic(inp);
            Workbook workbook = WorkbookFactory.create(inp);
            Iterator<Row> iterator = workbook.getSheet("SITES_INFO").iterator();
            while (iterator.hasNext())
            {
                MULSite site = new MULSite();
                site.parse(iterator.next(),"xls", filename);
                DtoValidatorResult isValid = site.validate();
                if(isValid.getValid())
                {
                    splittableDtoMap.put(site.getSiteNumber(), site.toString());
                }
                else {
                    throw new RetrieverException("Issue encountered at " + filename);
                }
            }
        } catch (IOException | RetrieverException | JAXBException e) {
            e.printStackTrace();
        }
    }

However when it reaches: Workbook workbook = WorkbookFactory.create(inp); it throws the following error: java.lang.IllegalArgumentException: Your InputStream was neither an OLE2 stream, nor an OOXML stream

I am a bit stuck on how to treat this type of file, i tried changing the extension from .xml to .xls, Excel can open it perfectly both ways but Apache POI wont accept it, I am not sure if the problem is with the file or how I read it with InputStream.

ACristian24
  • 343
  • 2
  • 5
  • 14
  • 1
    Not sure, but I think [this](https://stackoverflow.com/questions/29236294/getting-error-your-inputstream-was-neither-an-ole2-stream-nor-an-ooxml-stream) answer your question – Laurent Morissette Mar 10 '21 at 13:36

1 Answers1

2

Sadly this is not possible, Apache POI does not support Office XML, it was never implemented in the library as this was deprecated by Microsoft rather fast.

I have given up trying to do this with Apache POI and will pars it with an XML parsers as DOM.

ACristian24
  • 343
  • 2
  • 5
  • 14
  • [Apache Tika however does have a parser, which turns these into XHTML](https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/xml/SpreadsheetMLParser.java) which could be a good basis – Gagravarr Mar 11 '21 at 18:07
  • I will give it a try and see how it works with Apache Tika, for now i resumed at using DOM Xml parsing as it was the fastest way to make a working prototype. – ACristian24 Mar 12 '21 at 12:18