
I am combining multiple large Excel files whose columns differ in both name and number.

Before I start combining, I want to collect all the header rows so that I can build a DataTable that already contains every column.

I know that C# has a method, `DataTable.Merge`, which can add missing columns while combining.

However, there are too many large Excel files, and a sheet in Excel can hold only about 1 million rows. So when I reach that limit, I must save the part combined so far to Excel, clear the table, and then continue combining. As a result, the parts saved early in the process will not have the same schema as the final one.

This is why I must collect all the headers in advance.

As far as I know, C# libraries such as EPPlus or ExcelDataReader load the entire content of an Excel file, which takes a very long time. I don't need to load all the content at once.

Does anybody here know how to load only the header row of an Excel file?

Thank you so much.

user3819222
  • A worksheet function like `='C:\Users\[UserName]\Desktop\[STO 200122.xlsm]Sheet1'!A1` will return the value of the specified range from a closed workbook. I don't believe that the workbook is loaded but I suggest you try it with the large files you have. – Variatus Jan 24 '20 at 04:05
  • I doubt it will do things any differently, but have you tried OLE? It uses Jet in the background, and my guess is it also loads everything into memory. – Hambone Jan 25 '20 at 02:37
  • An .xlsx file is a .zip archive of XML content. There's no way to read part of that without opening the archive and reading through the .xml files, even if you only want to retrieve 1 row of data from one sheet. EPPlus will be significantly faster than calls to the Excel application, though. – CaM Jan 28 '20 at 15:35
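
The point in the last comment, that an .xlsx file is a zip archive of XML parts, can be sketched with only the .NET base class library. This is a minimal sketch under stated assumptions, not a definitive implementation: the names `XlsxHeaderReader`, `ReadHeader` and `sheetEntry` are hypothetical, the first sheet is assumed to live at `xl/worksheets/sheet1.xml`, and the header is assumed to be the first row stored in that sheet. The archive still has to be opened, but the worksheet XML is streamed and parsing stops after the first `<row>` element.

```csharp
using System;
using System.Collections.Generic;
using System.IO.Compression;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class XlsxHeaderReader
{
    static readonly XNamespace Ns =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // sheetEntry is an assumption: the first sheet is usually stored under this
    // name, but the authoritative mapping is in xl/_rels/workbook.xml.rels.
    public static List<string> ReadHeader(
        string path, string sheetEntry = "xl/worksheets/sheet1.xml")
    {
        using (ZipArchive zip = ZipFile.OpenRead(path))
        {
            ZipArchiveEntry sheet = zip.GetEntry(sheetEntry);
            if (sheet == null)
                throw new InvalidOperationException("Sheet part not found: " + sheetEntry);

            XElement firstRow = null;
            using (var stream = sheet.Open())
            using (var reader = XmlReader.Create(stream))
            {
                // Stop at the first <row>; the rest of the sheet is never parsed.
                if (reader.ReadToFollowing("row", Ns.NamespaceName))
                    firstRow = (XElement)XNode.ReadFrom(reader);
            }
            if (firstRow == null) return new List<string>();

            // Empty cells are not stored in the XML, so they are skipped here.
            List<XElement> cells = firstRow.Elements(Ns + "c").ToList();

            // Indexes of the shared strings that the header cells point to.
            var needed = new HashSet<int>(cells
                .Where(c => (string)c.Attribute("t") == "s")
                .Select(c => (int)c.Element(Ns + "v")));
            Dictionary<int, string> shared = ReadSharedStrings(zip, needed);

            return cells.Select(c =>
            {
                string type = (string)c.Attribute("t");
                if (type == "s") return shared[(int)c.Element(Ns + "v")];
                if (type == "inlineStr") return TextOf(c.Element(Ns + "is"));
                return (string)c.Element(Ns + "v") ?? "";
            }).ToList();
        }
    }

    // Streams xl/sharedStrings.xml and stops once the highest needed index is read.
    static Dictionary<int, string> ReadSharedStrings(ZipArchive zip, HashSet<int> needed)
    {
        var result = new Dictionary<int, string>();
        ZipArchiveEntry entry = zip.GetEntry("xl/sharedStrings.xml");
        if (entry == null || needed.Count == 0) return result;

        int last = needed.Max(), index = 0;
        using (var stream = entry.Open())
        using (var reader = XmlReader.Create(stream))
        {
            reader.MoveToContent();
            while (!reader.EOF && index <= last)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "si")
                {
                    // ReadFrom consumes the whole <si> and moves to the next node.
                    var si = (XElement)XNode.ReadFrom(reader);
                    if (needed.Contains(index)) result[index] = TextOf(si);
                    index++;
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return result;
    }

    // Joins the <t> runs of a string item, ignoring phonetic (<rPh>) runs.
    static string TextOf(XElement e) =>
        e == null ? "" : string.Concat(e.Descendants(Ns + "t")
            .Where(t => t.Parent.Name != Ns + "rPh")
            .Select(t => t.Value));
}
```

Calling `XlsxHeaderReader.ReadHeader(path)` once per file and taking the union of the returned names would give the full column set before any data rows are merged. How much of the shared string table has to be read depends on where the header strings sit in it; they are often near the start because Excel typically adds strings in order of first use, but that is not guaranteed.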

0 Answers