0

Using spreadsheetgear, is there any way to get the "probable" data type for a column, exclusive of the header row (if one exists) and reasonably tolerant of sparse population without having to do a sample myself...is there already a way to do this?

so for example if I had an excel row like

| Customers | Sales Item | Sale Date  | Contact | Quantity |
| IBM       | Keyboard   | 28-10-2011 |         | 2        |
| MS        | Mouse      | 27-09-2011 | joe     | 5        |

I would expect to see

String, String, DateTime, String, Numeric

EDIT

So I ended up having to sample like @Tim Anderson suggested, but I needed to handle the case of sparse data, and default to string when conflicting types in the col. (this is called in a loop that walks the cols, I can't post that as it contains some IP) DataValueType is just a local enum and rowcount is the number of rows to sample and because I am already sampling I simply ignore row 0 in case it's a header row.

private DataType GetDataTypeFromColRange(IRange range, int rowcount, int col)
{
    var dtlist = GetValueTypes(range, rowcount, col).Distinct();
    // If conflicting types for the col default to string.
    if (dtlist.Count() != 1)
    {
        return new DataType(DataTypeValue.String);
    }
    else
    {
        return new DataType(dtlist.First());
    }
}

private IEnumerable<DataTypeValue> GetValueTypes(IRange range, int rowcount, int col)
{
    for (int i = 1; i < rowcount; i++)
    {
        switch (range[i, col].ValueType)
        {
            case SpreadsheetGear.ValueType.Text:
                yield return DataTypeValue.String;
                break;
            case SpreadsheetGear.ValueType.Number:
                if (range[i, col].NumberFormatType == NumberFormatType.Date || range[i, col].NumberFormatType ==  NumberFormatType.DateTime)
                {
                    yield return DataTypeValue.Date;
                }
                else
                {
                    yield return DataTypeValue.Numeric;
                }
                break;
            case SpreadsheetGear.ValueType.Logical:
                yield return DataTypeValue.Bool;
                break;
            default: // ignore empty or errored cells.
                continue;
        }
    }
}

I am sure this can be further improved so please feel free to post improvements, but this does what I need for now.

Tim Jarvis
  • 18,465
  • 9
  • 55
  • 92

3 Answers3

1

In never use spreadsheetgear, but in Excel i use this UDF

Function GetType(rg As Range) As String

If IsNumeric(rg.Value) Then
    GetType = "Numeric"
ElseIf IsDate(rg.Value) Then
    GetType = "Date Time"
Else
    GetType = "String"
End If

End Function

I believe it is possible to adapt

[]'s

Bruno Leite
  • 1,403
  • 13
  • 17
  • Thanks for answering @Bruno, but no this is not what I need. In spreadsheet gear the actual value itself (range.Value) appears to always be a string, there is a ValueType property of the range, however this always only gives me the type of the top left cell in the range even if the range is "EntireColumn", which means that using this mechanism I would have to sample to get the "probable" type. I was hoping that there is some helper method or class that already does this. – Tim Jarvis Sep 29 '11 at 00:57
1

No helper method or other API exists in SpreadsheetGear to automatically return the “probable data type” for a column of values. It would not be very difficult to implement something like this to meet your own particular requirements, although there’s no way to do this without “sampling” the data. Below is a very simple method that accepts the range to check and a Boolean that indicates whether or not the range includes a header row. All it does is check the first row of data to determine the type; you might want to build something a little more robust:

private SpreadsheetGear.ValueType[] GetColumnTypes(IRange range, bool hasHeader)
{
    SpreadsheetGear.ValueType[] columnTypes = new SpreadsheetGear.ValueType[range.ColumnCount];
    for (int i = 0; i < range.ColumnCount; i++)
    {
        columnTypes[i] = range[hasHeader ? 1 : 0, i].ValueType;
    }
    return columnTypes;
}

One thing you should be aware of, however, is that SpreadsheetGear uses the same basic internal data types as Excel and will return these types when checking IRange.ValueType (these include Empty, Error, Logical, Number, Text). Note there is no DateTime. In your example this would impact the value type returned on the “Sale Date” column because dates/times are actually stored in Excel and SpreadsheetGear as a doubles representing a date/time serial number. So this type of value would return Number, not something like DateTime. The fact that they show up as a “date” in the cell is simply a function of the NumberFormat of the cell.

Tim Andersen
  • 3,014
  • 1
  • 15
  • 11
  • Thanks @Tim Anderson. In fact I ended up doing something similar, though I needed to sample a bunch of rows as well to handle sparse data. I'll edit my post with what I did in case someone else needs to do similar. – Tim Jarvis Sep 29 '11 at 22:34
0

Here is another try based on error handler and VBA type conversion:

Function probableType(vInput As Variant)
Dim vResult As Variant

'set error handler to resume (the procedure will check the error number)
On Error Resume Next

'check if it is an integer
vResult = CInt(vInput)
If Err.Number = 0 Then
    probableType = "Integer"
    Exit Function
End If
Err.Clear

'check if it is a date
vResult = CDate(vInput)
If Err.Number = 0 Then
    probableType = "Date"
    Exit Function
End If
Err.Clear

'else this is probably a string
probableType = "String"
End Function

Can be tested with this sub:

Sub uniTest()
MsgBox probableType("12/12/12")
MsgBox probableType("12")
MsgBox probableType("myTest")
End Sub

You could generalize this with all the conversion functions of Excel VBA (see this link on ozgrid)

JMax
  • 26,109
  • 12
  • 69
  • 88