Using spreadsheetgear, is there any way to get the "probable" data type for a column, exclusive of the header row (if one exists) and reasonably tolerant of sparse population without having to do a sample myself...is there already a way to do this?
so for example if I had an excel row like
| Customers | Sales Item | Sale Date | Contact | Quantity |
| IBM | Keyboard | 28-10-2011 | | 2 |
| MS | Mouse | 27-09-2011 | joe | 5 |
I would expect to see
String, String, DateTime, String, Numeric
EDIT
So I ended up having to sample like @Tim Anderson suggested, but I needed to handle the case of sparse data, and default to string when conflicting types in the col. (this is called in a loop that walks the cols, I can't post that as it contains some IP) DataValueType is just a local enum and rowcount is the number of rows to sample and because I am already sampling I simply ignore row 0 in case it's a header row.
private DataType GetDataTypeFromColRange(IRange range, int rowcount, int col)
{
var dtlist = GetValueTypes(range, rowcount, col).Distinct();
// If conflicting types for the col default to string.
if (dtlist.Count() != 1)
{
return new DataType(DataTypeValue.String);
}
else
{
return new DataType(dtlist.First());
}
}
private IEnumerable<DataTypeValue> GetValueTypes(IRange range, int rowcount, int col)
{
for (int i = 1; i < rowcount; i++)
{
switch (range[i, col].ValueType)
{
case SpreadsheetGear.ValueType.Text:
yield return DataTypeValue.String;
break;
case SpreadsheetGear.ValueType.Number:
if (range[i, col].NumberFormatType == NumberFormatType.Date || range[i, col].NumberFormatType == NumberFormatType.DateTime)
{
yield return DataTypeValue.Date;
}
else
{
yield return DataTypeValue.Numeric;
}
break;
case SpreadsheetGear.ValueType.Logical:
yield return DataTypeValue.Bool;
break;
default: // ignore empty or errored cells.
continue;
}
}
}
I am sure this can be further improved so please feel free to post improvements, but this does what I need for now.