1

I am currently using LinqToExcel to parse through a valid CSV file, but when I reach a particular column (which is not a date), it gets automatically casted into a datetime variable. Example:

2/010/114

will get casted to

cc = "2/10/0114 12:00:00 AM"

This is the code I am using:

var csv = new ExcelQueryFactory(filepath);

var records = from c in csv.Worksheet(0)
              let rows = new 
              {
                  cc = c[9].ToString(),
              }
              select rows;

I have also tried grabbing the results in the most raw format with the same issue:

var rawrecords = from c in csv.Worksheet()
                 select c;

Is this by design? Or am I doing something wrong? How can I ensure the original string is preserved? Seems like a bit of a weird design..

Dan
  • 231
  • 1
  • 12
  • strings can be formatted in many ways to a DateTime, and yes there is a default and yes you can set it to a different varient. – Edward Jun 15 '17 at 03:12
  • I don't want to format it to a DateTime at all. It is not a date. I just want to preserve the string the same way it looks in the CSV file. – Dan Jun 15 '17 at 03:40
  • Then why not specify the type instead of making it anonymous? New string { //code } – Edward Jun 15 '17 at 03:43
  • I tried that too. But with no result. I believe the issue lies with LinqToExcel making assumptions about the datatype – Dan Jun 15 '17 at 03:48
  • Sorry no way to test, but I image c is all columns. Why not get rid of the lets and select and break at that line to see what datatypes are actually showing for each column in records – Edward Jun 15 '17 at 03:52

1 Answers1

1

I've not used Linq2Excel before, but based on the example usage info on their GitHub I'd say the following ought to work:

var csv = new ExcelQueryFactory(filepath);

var records = from c in csv.Worksheet(0)
              let rows = new 
              {
                  cc = c[9].Cast<string>(),
              }
              select rows;

Update

After trying this code in LinqPad, I can confirm that it doesn't work, and that's because there doesn't appear to be any way of forcing it to interpret that cell as string, it looks like a DateTime so it's interpreting it that way.

The only way I can see to get it to do what you want is to en-quote those values, e.g.:

A,B
12,"2/010/114"

This does then force it to read in correctly.

On a simpler note, I'd wonder whether you really need to complexity of LinqToExcel and could just perhaps get away with reading the file yourself manually?


Example Reader

I've put together the following very simple and hacky reader:

void Main()
{
    var reader = new CsvReader();
    reader.Read(@"C:\users\clint\desktop\test.csv", 10, 5);
    reader.GetDataAtPosition(1,1).Dump();
    reader.GetDataAtPosition(2,2).Dump();
    reader.GetDataAtPosition(2,2, s => s.Split('/')).Dump();
}

// Define other methods and classes here
public class CsvReader
{
    private string[,] _data;
    
    // Take a file, and estimated col and row counts (over-inflate these if needed to ensure the file can be read)
    public void Read(string file, int cols, int rows)
    {
        _data = new string[rows,cols];
        GC.Collect(2);
        var line = 0;
        var col = 0;
        using (var stream = new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            using (var reader = new StreamReader(stream))
            {
                while (!reader.EndOfStream)
                {
                    var lineIn = reader.ReadLine();
                    var inQuotes = false;
                    var thisCellRaw = "";
                    foreach (var ch in lineIn.TrimStart().TrimEnd())
                    {
                        if (ch == '"')
                        {
                            inQuotes = !inQuotes;
                            continue;
                        }

                        if (ch == ',' && !inQuotes)
                        {
                            _data[line, col] = thisCellRaw;
                            thisCellRaw = "";
                            col++;
                            continue;
                        }
                        
                        thisCellRaw += ch;
                    }
                    if (!string.IsNullOrEmpty(thisCellRaw))
                    {
                        _data[line, col] = thisCellRaw;
                    }
                    line++;
                    col = 0;
                }
            }
        }
    }

    public string GetDataAtPosition(int row, int col)
    {
        return GetDataAtPosition<string>(row,col);
    }
    
    public T GetDataAtPosition<T>(int row, int col, Func<string,T> transform = null)
    {
        row = row - 1;
        col = col - 1;
        var item = _data[row,col];
        if (item == null) throw new KeyNotFoundException("No data at that position");
        return (transform ?? ((s) => (T)Convert.ChangeType(item, typeof(T))))(item);
    }
}

It's not the most efficient, and shouldn't be used in production code without some serious cleanup and error handling, but this should help get you there; you could even use some form of inference e.g. int.TryParse to test whether something should be seen as an int etc.

Community
  • 1
  • 1
Clint
  • 6,133
  • 2
  • 27
  • 48
  • @Dan you've tried wrapping the column's value in two double quotes? I tried that and it came in fine for me. – Clint Jun 15 '17 at 03:40
  • I thought about reading it out myself, but there are some issues that might come up when there are valid commas in the records and such; I didn't want that headache. I didn't try that! But that solution won't work for me in the long run because I won't have the ability to manipulate the file.. – Dan Jun 15 '17 at 03:41
  • @Dan yeah, that'll happen. I ended up having to do it once and have a tokeniser that would know when it was in an enquoted block and ignore commas. I'd just say enquote the data you want interpreted literally as strings in the CSV file itself. With the time that takes (and whether there are more files / they're generated elsewhere), I'd say write your own thing. – Clint Jun 15 '17 at 03:43
  • What I did note, that when the file is an .xlsx, it works as expected. So weird. – Dan Jun 15 '17 at 03:44
  • @Dan I'd imagine that's because the xlsx reader has some sort of assumption built in that it should be a string / some smarts to tell it's likely not a date. – Clint Jun 15 '17 at 03:45
  • Hmm yeah, I'll write my own as a last resort - really hate reinventing the shoddy wheel. I'll see whether it's feasible to keep the files in XLSX format and parse it like that. Thanks for the help man! – Dan Jun 15 '17 at 03:46