0

What I do: populate & format an Excel file using a mix of Interop and ClosedXML.

First, the file is populated via Interop, then saved, closed, then I format the cells' RichText using ClosedXML.

Unfortunately, this formatting causes Excel to view my file as "corrupt" and needs to repair it. This is the relevant part:

var workbook = new XLWorkbook(xlsPath);
var sheet = workbook.Worksheet("Error Log");
for (var rownum = 2; rownum <= 10000; rownum++)
{
    var oldcell = sheet.Cell("C" + rownum);
    var newcell = sheet.Cell("D" + rownum);
    var oldtext = oldcell.GetFormattedString();
    if(string.IsNullOrEmpty(oldtext.Trim()))
        break;
    XlHelper.ColorCellText(oldcell, "del", System.Drawing.Color.Red);
    XlHelper.ColorCellText(newcell, "add", System.Drawing.Color.Green);
}
workbook.Save();

And the colouring method:

public static void ColorCellText(IXLCell cel, string tagName, System.Drawing.Color col)
{
    var rex = new Regex("\\<g\\sid\\=[\\sa-z0-9\\.\\:\\=\\\"]+?\\>");
    var txt = cel.GetFormattedString();
    var mc = rex.Matches(txt);
    var xlcol = XLColor.FromColor(col);

    foreach (Match m in mc)
    {
        txt = txt.Replace(m.Value, "");
        txt = txt.Replace("</g>", "");
    }

    var startTag = string.Format("[{0}]", tagName);
    var endTag = string.Format("[/{0}]", tagName);

    var crt = cel.RichText;
    crt.ClearText();
    while (txt.Contains(startTag) || txt.Contains(endTag))
    {
        var pos1 = txt.IndexOf(startTag);
        if (pos1 == -1)
            pos1 = 0;
        var pos2 = txt.IndexOf(endTag);
        if (pos2 == -1)
            pos2 = txt.Length - 1;

        var txtLen = pos2 - pos1 - 5;
        crt.AddText(txt.Substring(0, pos1));
        crt.AddText(txt.Substring(pos1 + 5, txtLen)).SetFontColor(xlcol);
        txt = txt.Substring(pos2 + 6);
    }
    if (!string.IsNullOrEmpty(txt))
        crt.AddText(txt);
}

Error in file myfile.xlsx
The following repairs were performed: _x000d__x000a__x000d__x000a_
Repaired records:
string properties of /xl/sharedStrings.xml-Part (strings)

I've been through all the xmls looking for clues. In the affected sheet, in comparison view of Productivity Tool, some blocks appear as inserted in the repaired file and deleted in the corrupt one, although nothing significant seemed changed - except for one thing: the style attribute of that cell. Here an example:

   <x:c r="AA2" s="59">
    <x:f>
     (IFERROR(VLOOKUP(G2,Legende!$A$42:$B$45,2,FALSE),0))
    </x:f>
   </x:c>

I have checked the styles.xml for style 59, but there is none. In the repaired file, this style has been changed to 14, which in my styles.xml is listed as a number format.

Unfortunately, a global search/replace of these invalid style indexes did not resolve the issue. Seeing the things going on here with corrupt indexes, renamed xmls, invalid named ranges etc., I took a different route: not to use interop at all, maybe the corruption was caused by Excel in the first place and the coloring was only the last straw.

Using ClosedXml only:

Wow. Just wow. This makes it even worse. I commented out the colouring part since without that, Interop produced a readable file without errors, so that's what I expect of ClosedXml too.

This is how I open the file and address the worksheet with ClosedXml:

var wb= new XLWorkbook(xlsPath);
var errors = wb.Worksheet("Error Log");

This is how I write the values into the file:

errors.Cell(zeile, 1).SetValue(fname);

With zeile being a simple int counter.

I then dare to set a column width:

errors.Column(2).Width = 50;
errors.Column(3).Width = 50;
errors.Column(4).Width = 50;

As well as setting some values in another sheet in exactly the same fashion before saving with validation.

wb.Save(true);
wb.Dispose();

Lo and behold: The validation throws errors:

Attribute 'name' should have unique value. Its current value 'Legende duplicates with others.
Attribute 'sheetId' should have unique value. Its current value '4' duplicates with others.

A couple more errors like attribute 'top' having invalid value '11.425781'.

Excel cannot open the file directly, must repair it. My Sheet "Legende" is now empty and the first sheet instead of third, and I get an additional fourth sheet "Restored_Table1" which contains my original "Legende" contents.

What the hell is going on with this file??

New attempt: re-create the Excel template from scratch - in LibreOffice.

I now think that the issue is entirely misleading. If I use the newly created file from LibreOffice, the validation causes a System.OutOfMemory exception due to too many validation errors. Opening in Excel requires repair, gives additional sheet and so forth.

Creating in LibreOffice, then opening in Excel, saving, then using that file as template produces a much better result albeit not perfect yet. Since I copied parts over from the old Excel file into LO while creating the new file, I assume some corrupt remnant got copied over.

I cannot shake the feeling that this is the file itself after all and has nothing to do with how I edit it!

Will post updaate tomorrow.

LocEngineer
  • 2,847
  • 1
  • 16
  • 28
  • 3
    Maybe you take the problem the wrong way. Instead of repairing the file, why not correcting the openxml part that create this corruption ? I assume your 'XlHelper.ColorCellText()' creates a not well formatted openxml file. After applying your function, rename it ".zip" and open the document file to see the result of your modification and what you should add / remove to correct the openxml – Maxime Porté Dec 01 '16 at 09:54
  • @MaximePorté Been there done that got the t-shirt. Open for suggestions, but doubt there is much fancy stuff going on. – LocEngineer Dec 01 '16 at 12:03
  • @MaximePorté Chanched to EPPlus, doesn't change a darned thing. Code posted. – LocEngineer Dec 04 '16 at 15:35
  • Are you able to provide a copy of the sharedStrings.xml file or better still the entire Excel file that has the errors? – petelids Dec 08 '16 at 22:32
  • @petelids Unfortunately not. Confidential contents. Added an anonymized excerpt of the sharedStrings.xml to my post. – LocEngineer Dec 09 '16 at 11:15
  • 1
    Have you tried letting excel repair the file, and then use the "Open XML SDK 2.5 Productivity Tool" to compare the damaged and the repaired file? Working with XLSX files is tricky, it is a pretty complicated format, and just minor errors result in the file beeing "corrupted". I see you are changing the text contents of cells in your `ColorCellText()` method. Have you ensured that these cells are not referring to a specific index in the shared strings table? – bassfader Dec 09 '16 at 11:55
  • @bassfader Thanks for that tip! Updated my post with findings from Productivity Tool. Now I'm even more confused than before. It found 33 errors in the corrupt file, all "invalid attribute errors" for width, footer etc., while in the repaired file it found 34 errors, all of the exact same nature. So I think the real culprit is in one of the things I posted above. – LocEngineer Dec 10 '16 at 15:49
  • Why aren't you using only one library to populate and format the file? Mixing interop, OpenXML ClosedXML and EPPlus is bound to make some hidden error pop out. If you reproduce the entire process in ClosedXML and still find an error, please log an issue on ClosedXML 's github repo. – Francois Botha Dec 11 '16 at 07:55
  • @FrancoisBotha I am not mixing OpenXML, ClosedXML and EPPlus. Only Interop + 1 of the others. Cannot use Interop exclusively because of limits of Interop with cell lengths. Not using any of the others exclusively because a) they are a pain in the neck and b) I don't fully trust them yet because of exactly such problems as you see above. I don't think I am overtaxing ClosedXML (latest approach) with coloring cell text, yet that alone produces corruption. Go figure. – LocEngineer Dec 11 '16 at 12:33
  • If you report the issue with the code to reproduce and attach the post interop file, I'll have a look at it. By the way, I assume you're using the latest version of ClosedXML. – Francois Botha Dec 11 '16 at 16:17
  • @FrancoisBotha Sorry mate, no can do. This is a company file. I'll go as far as possible but this would be too far. – LocEngineer Dec 12 '16 at 10:05
  • Found a few invalid style indexes in sheet2.xml. No cigar yet. – LocEngineer Dec 12 '16 at 13:14
  • @LocEngineer Try to reproduce the problem with another file, with dummy data then. I can't help you without a reproducible test case. – Francois Botha Dec 12 '16 at 13:32

1 Answers1

1

OK. Stuff this. I created a completely fresh file with LibreOffice, making sure not to copy over anything at all from the original file, and I ditched Interop in favour of ClosedXml.

=> This produced a corrupt file in which my first sheet was cleared and its contents move to a "Restored_Table1".

After I opened my fresh new template with Excel via Open/Repair and saved it, the resulting, uncoloured file was NOT corrupt.

=> Colouring it produces the "original" corruption, all sheets intact.

ClosedXml seems to be marginally slower than Interop but at this point I couldn't care less. I guess we will have to live with the "corrupt" message and just get on with it.

I hate xlsx.

LocEngineer
  • 2,847
  • 1
  • 16
  • 28