Importing from CSV file, how to skip first X rows

Question

I am writing an application to process and upload a csv file to a remote server. The application needs to be fairly flexible, allowing custom mappings of columns.

The first big hurdle I've come to is with reading the file, many of the files have the data starting at row 5+ with some junk data at the top, example below:

Account: 123
----------
Date: 15/12/2011
----------
Type: Full
----------
Column1,Column2,Column3,Column4
Data1,Data2,Data3,Data4
Data1,Data2,Data3,Data4

Ideally I'd like a way to pick up the first line of data but this might be tricky.

I'm connecting to the file using Jet with the connection string:

If (_extension = ".csv") Then

                _excelconn.ConnectionString = String.Format( _
                    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0}" & _
                    "; Extended Properties=""text;HDR=No;FMT=Delimited""", _directory)

End If

Then getting data via the code:

If ValidateAll() Then
            Try

                ConnectExcel() ' Opens connection to csv/excel file

                _excelcommand.CommandText = "SELECT * FROM [" & _excelsheets(_selsheet) & "]" ' Currently selected sheet, returns SELECT * FROM [1234#csv]
                _excelcommand.Connection = _excelconn

                _reader = _excelcommand.ExecuteReader()

                While _reader.Read()

                    'MsgBox(_reader(0).ToString)
                    'Dostuff

                End While

            Catch ex As Exception
                _errors.Add(ex.ToString)
                retval = -1
            End Try


        End If

This is falling over with an exception as I can't access reader(2) for the first lines.

Is there any way in the connection string or in the SELECT statement to tell Jet to skip the first X rows of the sheet? If not is there a way to only process the reader.Read() if the row number is higher than X?

I know it's probably possible with reading the entire file to an array and parsing from there or by copying the file contents to a new csv file but if there is a way to avoid that and do it through jet/ado then I'd much prefer that!

(tagged sql as there might be an sql solution in the ado query)

There are a few examples in search that may be of interest: http://stackoverflow.com/search?q=skip+first+rows+read — Fionnuala, Aug 03 '12 at 19:32

score 0 · Accepted Answer · answered Sep 03 '12 at 16:03

Found the answer after a lot of experiments and searches, moved to a table adaptor, code below:

Private _filepath As String
Private _directory As String
Private _filename As String
Private _extension As String
Private _selsheet As Integer
Private _skiprows As Integer
Private _data As New DataTable

Private _excelconn As New OleDbConnection
//Private _reader As OleDbDataReader
Private _excelcommand As New OleDbCommand
Private _excelta As New OleDbDataAdapter

_excelcommand.CommandText = "SELECT * FROM [" & _excelsheets(_selsheet) & "]"
_excelcommand.Connection = _excelconn

_excelta.SelectCommand = _excelcommand
_data = New DataTable
_excelta.Fill(_skiprows, Int32.MaxValue, _data)


For i = 0 To _data.Columns.Count - 1
    If _data.Rows(0).Item(i).ToString <> "" And IsNothing(_data.Columns(_data.Rows(0).Item(i).ToString)) Then
          _data.Columns.Item(i).ColumnName = _data.Rows(0).Item(i).ToString
    End If
Next

_data.Rows.RemoveAt(0) //removes first data row containing labels

Importing from CSV file, how to skip first X rows

1 Answers1