6

I am writing a C# library to read in Excel files (both xls and xlsx) and I'm coming across an issue.

Exactly the same as what was expressed in this question, if my Excel file has a column that has string values, but has a numeric value in the first row, the OLEDB provider assumes that column to be numeric and returns NULL for the values in that column that are not numeric.

I am aware that, as in the answer provided, I can make a change in the registry, but since this is a library I plan to use on many machines and don't want to change every user's registry values, I was wondering if there is a better solution.

Maybe a DB provider other than ACE.OLEDB (and it seems JET is no longer supported well enough to be considered)?

Also, since this needs to work on XLS / XLSX, options such as EPPlus / XML readers won't work for the xls version.

Community
  • 1
  • 1
John Bustos
  • 19,036
  • 17
  • 89
  • 151
  • The way I get around this issue currently is by using the Excel COM interface and some C#, to open the file in Excel and re-save it as a `.csv` file. I also have a macro which is loaded and run to tidy stuff up but this way removes all the formatting and allows an easier import. – AeroX Jul 28 '15 at 16:11
  • Thanks, @AeroX, I was thinking about that, but am trying to avoid Interop if at all possible since that opens up dozens of other cans of worms :) ... I'm hoping someone else has another solution, but thanks so much for sharing your method!! – John Bustos Jul 28 '15 at 16:27
  • Use OpenXml Sdk instead. – andrei.ciprian Jul 29 '15 at 09:32
  • @andrei.ciprian, that has the issue with only working on xlsx, not xls, no? – John Bustos Jul 29 '15 at 12:42

1 Answers1

1

Your connection string should look like this

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=c:\myFolder\myExcelfile.xlsx;Extended Properties="Excel 12.0 Xml;HDR=YES;IMEX=1";

IMEX=1 in the connection string is the part that you need to treat the column as mixed datatype. This should work fine without the need to edit the registry.

HDR=Yes is simply to mark the first row as column headers and is not needed in your particular problem, however I've included it anyways.

To always use IMEX=1 is a safer way to retrieve data for mixed data columns.

Source: https://www.connectionstrings.com/excel/

Edit:

Here is the data I'm using:

data

Here is the output:

enter image description here

This is the exact code I used:

string connString = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\test.xlsx;Extended Properties=""Excel 12.0 Xml;HDR=YES;IMEX=1""";

using (DbClass db = new DbClass(connString))
{
    var x = db.dataReader("SELECT * FROM [Sheet1$]");
    while (x.Read())
    {
        for (int i = 0; i < x.FieldCount; i++)
            Console.Write(x[i] + "\t");
        Console.WriteLine("");
    }
}

The DbClass is a simple wrapper I made in order to make life easier. It can be found here:

http://tech.reboot.pro/showthread.php?tid=4713

Cory
  • 1,794
  • 12
  • 21
  • FYI, Sadly, Kobe, I found this online and it seems to make sense: `set "IMEX=1" in the Extended Properties section of the connection string. This enforces the ImportMixedTypes=Text registry setting. You can change the enforcement of type by changing [HKLM\Software\Microsoft\Jet\4.0\Engines\Excel\ImportMixedTypes] to numeric as well. ` - Basically IMEX=1 reinforces the registry setting, but is not good enough in-and-of itself. – John Bustos Jul 30 '15 at 16:45
  • I added the exact code I used to test. Can you provide a link to where you found that? Also, I do not have those registry settings, for either JET or ACE. So, I do not see how this can be so. – Cory Jul 30 '15 at 16:47
  • It's not on MSDN or anything, but it seems to verify what I've been seeing to be true - http://www.codeproject.com/Articles/37055/Working-with-MS-Excel-xls-xlsx-Using-MDAC-and-Oled - about 1/4 way down the page.... – John Bustos Jul 30 '15 at 17:14
  • Thanks for the update @KoBe - Are you sure you don't have the regstry keys set? I'm trying, but stil lnot getting the data returned.... You FOR SURE have a column beginning with (and most consisting of) numbers and with strings in there too and you're getting the string values back too?? – John Bustos Jul 30 '15 at 17:16
  • 1
    Updated the post again. I'm sure. I've checked the Wow6432Node. What is the path you're looking at? Are you x64 or x86? – Cory Jul 30 '15 at 17:43
  • This drives me crazy - I'm obviously doing something wrong, but have no clue where, then!!!! - In reply to your question, it's a x64 machine. Regardless, than you so much for your help and perseverance.... – John Bustos Jul 30 '15 at 22:52
  • 2
    Ha, computers are crazy. You may not be doing anything wrong, as well as there could be some wild issue that's causing either your computer to not work, or mine to function without the keys. Regardless, I hope you figure out a solution. Perhaps pulling the Excel automation dlls into your project and going down that path. – Cory Jul 31 '15 at 16:31