-1

I am using OpenXML in order to export some translation data to an Excel spreadsheet and then importing the same spreadsheet back into my program.

Now this works perfectly fine if I just export and import straight away, however, if I open op the spreadsheet in excel and edit it or just save it, my program is unable to read the data as if it was corrupted. However, Excel views it just fine if I open it again, so the file isn't actually corrupt.

Exporting the data

using (SpreadsheetDocument document = SpreadsheetDocument.Create(saveLocation, SpreadsheetDocumentType.Workbook))
...

//Add a WorkbookPart to the document
WorkbookPart workbookPart = document.AddWorkbookPart();
workbookPart.Workbook = new Workbook();

//Add a WorksheetPart to the WorkbookPart
WorksheetPart worksheetPart = workbookPart.AddNewPart<WorksheetPart>();
worksheetPart.Worksheet = new Worksheet();

//Add style
WorkbookStylesPart stylePart = workbookPart.AddNewPart<WorkbookStylesPart>();
stylePart.Stylesheet = GenerateStylesheet();
stylePart.Stylesheet.Save();
...

SheetData sheetData = worksheetPart.Worksheet.AppendChild(new SheetData());
//Append some rows
...

worksheetPart.Worksheet.Save();

Reading the spreadsheet

using (var fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    {
        using (SpreadsheetDocument document = SpreadsheetDocument.Open(fileStream, false))
        {
            WorkbookPart workbookPart = document.WorkbookPart;
            WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
            SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();

            var headerRow = sheetData.Elements<Row>().FirstOrDefault();
            ...

            //All data here are numbers

So initially I thought maybe more worksheets are created by excel and I am simply wrong by assuming that .First() would yield me the correct one, but viewing the object while debugging indicates that the data is somehow correct. So obviously something is wrong here, so I decided to view the archived content of the .xlsx file.

This is right after my program has exported and created the file. You can see that there are some actual data (strings) in the xml file.

enter image description here

And this is after I opened the spreadsheet in Excel and just saved it without making any edits at all.

enter image description here

I don't know why this happens of if it's intended behaviour somehow. I found OpenXML to be full of weird design choices, but after making my way through that maze this is the first thing that really stopped my progress.

I am building on Netcore 2.0 on macOS and obviously Excel is also running on macOS.

If anyone could help me solve this I would be very happy.

Ezzy
  • 1,423
  • 2
  • 15
  • 32
  • 2
    Excel favors using a [SharedStringTable](https://learn.microsoft.com/en-us/office/open-xml/working-with-the-shared-string-table) instead of storing each string in a cell. This is by design and it will rewrite the file in its preferred format on save. – rene Sep 18 '18 at 18:50

1 Answers1

0

There are two ways of storing strings in an Excel file; using an inline string as you have done in your creation or, as rene pointed out in the comments, using a separate XML file called the SharedStringTable. Excel happens to use the latter approach so insted of each cell containing the string directly they instead contain an index to an entry in the SharedStringTable.

Fortunately you can read the SharedStringTable using OpenXML much like you can any other part of the document. For example:

using (var fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    using (SpreadsheetDocument document = SpreadsheetDocument.Open(fileStream, false))
    {
        WorkbookPart workbookPart = document.WorkbookPart;
        WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
        SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();

        SharedStringTablePart stringTable = workbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();

        var headerRow = sheetData.Elements<Row>().FirstOrDefault();

        foreach (Cell c in headerRow.Elements<Cell>())
        {
            string cellText;

            if (c.DataType == CellValues.SharedString)
            {
                //the value will be a number which is an index into the shared strings table
                int index = int.Parse(c.CellValue.InnerText);
                cellText = stringTable.SharedStringTable.ElementAt(index).InnerText;
            }
            else
            {
                //just take the value from the cell (note this won't work for some types e.g. dates)
                cellText = c.CellValue.InnerText;
            }

            Console.WriteLine(cellText);
        }

    }
}

For a large file you might want to consider caching the shared strings as discussed in this question and answer.

How can I improve the performance of retrieving values from SharedStringTable in OpenXml Excel spreadsheet tools?

petelids
  • 12,305
  • 3
  • 47
  • 57