0

I have an Excel file which has a million rows and 25 columns.

When I transfer the Excel data to a datatable only 8000 to 9000 rows are being transferred to the datatable. But the file has million rows.

I tried the below code:


<connectionStrings>  
    <!--Connect excel for bulk upload-->
    <add name="Excel03ConString" connectionString="Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties='Excel 8.0;HDR=YES'"/>
    <add name="Excel07+ConString" connectionString="Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties='Excel 8.0;HDR=YES'"/>
  </connectionStrings>

connString = String.Format(connString, excelPath)
        Using excel_con As New OleDbConnection(connString)
            excel_con.Open()
            Dim sheet1 As String = excel_con.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, Nothing).Rows(0)("TABLE_NAME").ToString()
            Dim dtExcelData As New DataTable()
            dtExcelData.Columns.AddRange(New DataColumn(3) {New DataColumn("Document Type", GetType(String)), _
                                                            New DataColumn("Posting period", GetType(Decimal)), _
                                                            New DataColumn("Profit Center", GetType(String))                      })
            Using oda As New OleDbDataAdapter("SELECT * FROM [" + sheet1 + "]", excel_con)
                oda.Fill(dtExcelData)
            End Using
            excel_con.Close()
        End Using

How do I solve this?

Mary
  • 14,926
  • 3
  • 18
  • 27
  • I notice that you seem to only want three columns from the source data - does it work any better if you SELECT them by name instead of all the columns with `*` ? – Andrew Morton May 30 '19 at 12:38
  • @AndrewMorton No it doesn't improve. But if i covert the .xlsx to .xls format, then the whole 65536 xls limit are copied to the `datatable`,which was only 8456 for xlsx. Also if I run the build directly from VisualStudio the entire million records are copied to `datatable`. But the publish only copies 8456. Do i have to make any changes in the **IIS**? – user2248733 May 30 '19 at 13:02
  • How about changing the Extended Properties part in the connection string to `Extended Properties=""Excel 12.0 Xml;HDR=YES""`? (From [connectionstrings.com](https://www.connectionstrings.com/excel/).) – Andrew Morton May 30 '19 at 13:09
  • And if it still doesn't work, you could try the third-party alternatives to ACE mentioned in [How to import large excel file to datatable?](https://stackoverflow.com/q/34532452/1115360) – Andrew Morton May 30 '19 at 13:16
  • @AndrewMorton `Properties=""Excel 12.0 Xml;HDR=YES""` didn't make an change in the result – user2248733 May 30 '19 at 13:18
  • First of all, the limit for Excel rows is 1M, not 65536. The format supported by Excel since 2006 is `xlsx`, not `xls`. There's nothing wrong with it. You'll have to inspect the file to find out what's wrong though. Is an error thrown when loading? A cell that can't be translated as a decimal, forcing the driver to stop? Is there something else that would make the driver think the "table" it's reading from finished, like an empty row or merged cells near row `8456 ` ? Does the sheet contain multiple named ranges or tables perhaps? – Panagiotis Kanavos May 30 '19 at 13:56
  • @user2248733 try loading the file into SQL Server using SSIS or the `Import Data Wizard` in SSMS (it uses SSIS behind the scenes). The wizard uses the OLEDB provider to load the data as well. Does that work or do you get an error? – Panagiotis Kanavos May 30 '19 at 14:00
  • @user2248733 another possibility is that you're loading from the wrong sheet. The first sheet in the workbook could be a hidden sheet with lookup data for example. You could try loading all sheets in the workbook in a loop – Panagiotis Kanavos May 30 '19 at 14:03

1 Answers1

-2

Don't use the Excel.Application object, read the data from the excel file directly through xml operations. I use the free and MIT licensed .NET library SpreadsheetLight (spreadsheetlight.com) for such purposes. It is incredibly fast compared to the Excel object.

Christoph
  • 3,322
  • 2
  • 19
  • 28
  • I would be quite surprised if Microsoft.Jet.OLEDB.4.0 and Microsoft.ACE.OLEDB.12.0 would implement a separat logic to read excel files! I rather believe they ask the Excel-Object to return the data to them. In its own speed... – Christoph May 30 '19 at 13:08
  • @christoph thankyou. I checked the **spreadsheetlight** that you specified. It looks like a exporting libraray, but i want to import excel to sql So i use the process as below 1. Upload excel to server 2. Copy excel sheet data to datatable using oledb 3. copy datatable to sql table with sqlbulkcopy – user2248733 May 30 '19 at 13:14
  • And who produces the excel file? Can it also produce something else? I don't think the performance bottleneck lies in bulk-adding data to the SQLServer, especially if that are only INSERT statements. I fear it's reading the excel file that is slow. --- It's true I only used spreadsheetlight to export yet, but would have expected it to have full read-interfaces as well as it can also be used to modify data... – Christoph May 30 '19 at 13:27
  • 1
    @Christoph no, the driver is completely separate, which is why it's used in SSIS and web applications since the 2000s (OK, 1990s as well). That's why it's a separate download too. It's also *fast* since it's a native implementation – Panagiotis Kanavos May 30 '19 at 13:49
  • @user2248733. The question then is where the speed is lost? DataTable is also pretty speed optimized. Is it somehow databound to a user interface? – Christoph May 31 '19 at 09:36