1

My main goal is to parse emails from a spreadsheet. What I need is to detect if a cell is containing anything other than an Email (wrong format, boolean, number...) to add this cell to an array (assume invalids)

What I did:

using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
        {
            WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
            WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
            Cell cell = new Cell();
            OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);

            Sheet sheet = workbookPart.Workbook.Descendants<Sheet>().Where(s => s.Name == sheetName).FirstOrDefault();

            worksheetPart = (WorksheetPart)(workbookPart.GetPartById(sheet.Id));

            while (reader.Read())
            {
                if (reader.ElementType == typeof(CellValue))
                {
                    cell = worksheetPart.Worksheet.Descendants<Cell>().Where(c => c.CellReference == String.Format("A{0}", reader.GetText())).FirstOrDefault();

                    if (cell != null && cell.InnerText.Length > 0)
                    {
                        var value = cell.InnerText;

                        if (cell.DataType != null)
                        {
                            var stringTable = workbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();

                            if (cell.DataType.Value == CellValues.SharedString)
                            {
                                value = stringTable.SharedStringTable.ElementAt(int.Parse(value)).InnerText;
                                values.Add(value);
                            }
                            else
                            {
                                // add the invalid to the array
                            }
                        }
                    }
                }
            }
            spreadsheetDocument.Close();
        }

Forget about email validation, my problem is that I want to extract non-string values. I have tried lots of things but the code is not reaching the else where it suppose to extract the numbers while there is booleans and numbers in the spreadsheet. Where is the problem? How to extract those non-string values?

  • 3
    Honestly use EEPlus or something similar, someone has done the hardwork for you. It's straightforward to use and takes the hassle out of what you are attempting to do. I have used EEPlus a number of times, it takes out the grunt work. You can use Linq to query the worksheet. Have a look at https://stackoverflow.com/questions/56312166/find-and-replace-all-string-using-epplus to get an idea. – Richard Jan 09 '20 at 09:50
  • @Richard does it work on linux? Im deploying on an ubuntu machine – Gwasti Grondo Jan 09 '20 at 10:19
  • There is Epplus.Core, EPPlus.Core -Version 1.5.4. Just checked on Github it should: https://github.com/JanKallman/EPPlus/issues/224 – Richard Jan 09 '20 at 10:46
  • thank you! much appreciated – Gwasti Grondo Jan 09 '20 at 10:58
  • It might help to look at the internal Excel Open XML, using the Open XML SDK Productivity Tool. Perhaps all the content has been stored as a "Shared String" so that you'd need to check whether what's coming from there can be converted to a number, or not... – Cindy Meister Jan 09 '20 at 13:38

1 Answers1

0

I used EPPLUS Core which is a good wrapper for OpenXML