I have a table full of stock price data. Each row has a unique combination of ticker symbol and date. I load new data all the time from CSV files that contain daily price data for every ticker. I know that the CSV files contain rows that are already in my table. I only want to add the data that is not already there. What is the quickest way to do this?
Should I try to add every row and catch each exception? Or should I check each row against my data table first to see whether it already exists? Or is there another alternative?
Additional Info
This is what I have been doing. For each line in the CSV file I read my data table to see if it already exists.
Dim strURL As String = "http://ichart.yahoo.com/table.csv?s=" & tickerValue
Dim strBuffer As String = RequestWebData(strURL)

' Read every CSV line into a list, then drop the header row.
Dim lines As New List(Of String)
Using sReader As New StringReader(strBuffer)
    Do While sReader.Peek() >= 0
        lines.Add(sReader.ReadLine())
    Loop
End Using
lines.RemoveAt(0)

For Each line In lines
    ' Split once per line instead of once per column.
    Dim fields As String() = line.Split(","c)
    Dim checkDate = fields(0).Trim()

    ' Check whether this ticker/date pair is already in the table.
    Dim rowExists As Boolean
    Using cmd2 As New OleDbCommand("SELECT * FROM " & tblName & " WHERE Ticker = ? AND [Date] = ?", con)
        cmd2.Parameters.AddWithValue("?", tickerValue)
        cmd2.Parameters.AddWithValue("?", checkDate)
        Using dr As OleDbDataReader = cmd2.ExecuteReader()
            rowExists = dr.Read()
        End Using
    End Using

    If Not rowExists Then
        Using cmd3 As New OleDbCommand("INSERT INTO " & tblName & " (Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) VALUES (?, ?, ?, ?, ?, ?, ?, ?)", con)
            cmd3.Parameters.Add("@Ticker", OleDbType.VarChar).Value = tickerValue
            cmd3.Parameters.Add("@[Date]", OleDbType.VarChar).Value = checkDate
            cmd3.Parameters.Add("@[Open]", OleDbType.VarChar).Value = fields(1).Trim()
            cmd3.Parameters.Add("@High", OleDbType.VarChar).Value = fields(2).Trim()
            cmd3.Parameters.Add("@Low", OleDbType.VarChar).Value = fields(3).Trim()
            cmd3.Parameters.Add("@[Close]", OleDbType.VarChar).Value = fields(4).Trim()
            cmd3.Parameters.Add("@Volume", OleDbType.VarChar).Value = fields(5).Trim()
            cmd3.Parameters.Add("@Adj_Close", OleDbType.VarChar).Value = fields(6).Trim()
            cmd3.ExecuteNonQuery()
        End Using
    End If
Next
This is what I have switched to. It inserts every row without checking first, and it throws this exception: "The changes you requested to the table were not successful because they would create duplicate values in the index, primary key, or relationship. Change the data in the field or fields that contain duplicate data, remove the index, or redefine the index to permit duplicate entries and try again."
I could catch this exception every time and ignore it until I hit a line that is new.
Dim strURL As String = "http://ichart.yahoo.com/table.csv?s=" & tickerValue
Debug.WriteLine(strURL)
Dim strBuffer As String = RequestWebData(strURL)
Using streamReader = New StringReader(strBuffer)
    Using reader = New CsvReader(streamReader)
        reader.ReadHeaderRecord()
        While reader.HasMoreRecords
            Dim dataRecord As DataRecord = reader.ReadDataRecord()
            Dim cmd3 As OleDbCommand = New OleDbCommand("INSERT INTO " & tblName & " (Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) VALUES (?, ?, ?, ?, ?, ?, ?, ?)", con)
            cmd3.Parameters.Add("@Ticker", OleDbType.VarChar).Value = tickerValue
            cmd3.Parameters.Add("@[Date]", OleDbType.VarChar).Value = dataRecord.Item("Date")
            cmd3.Parameters.Add("@[Open]", OleDbType.VarChar).Value = dataRecord.Item("Open")
            cmd3.Parameters.Add("@High", OleDbType.VarChar).Value = dataRecord.Item("High")
            cmd3.Parameters.Add("@Low", OleDbType.VarChar).Value = dataRecord.Item("Low")
            cmd3.Parameters.Add("@[Close]", OleDbType.VarChar).Value = dataRecord.Item("Close")
            cmd3.Parameters.Add("@Volume", OleDbType.VarChar).Value = dataRecord.Item("Volume")
            cmd3.Parameters.Add("@Adj_Close", OleDbType.VarChar).Value = dataRecord.Item("Adj Close")
            cmd3.ExecuteNonQuery()
        End While
    End Using
End Using
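For reference, catching and ignoring the duplicate exception could look roughly like the sketch below. It is only a sketch under assumptions: it assumes the Access (Jet/ACE) provider reports the duplicate-index error as Jet error 3022 in the SQLState of the first OleDbError, which should be verified against the actual provider, and IsDuplicateKeyError and TryInsertRow are hypothetical helper names that are not part of the code above.

```vbnet
Imports System.Data.OleDb

Module DuplicateTolerantInsert
    ' Hypothetical helper: True when the exception looks like the Jet/ACE
    ' "would create duplicate values in the index" error.
    ' Assumption: the provider exposes Jet error 3022 through SQLState.
    Function IsDuplicateKeyError(ex As OleDbException) As Boolean
        Return ex.Errors.Count > 0 AndAlso ex.Errors(0).SQLState = "3022"
    End Function

    ' Hypothetical helper: runs an already-prepared INSERT command and
    ' returns False instead of throwing when the row already exists.
    ' Any other OleDbException is still propagated to the caller.
    Function TryInsertRow(cmd As OleDbCommand) As Boolean
        Try
            cmd.ExecuteNonQuery()
            Return True
        Catch ex As OleDbException When IsDuplicateKeyError(ex)
            Return False ' duplicate Ticker/Date pair, skip this row
        End Try
    End Function
End Module
```

Inside the While loop, cmd3.ExecuteNonQuery() would become TryInsertRow(cmd3). Throwing and catching an exception is relatively expensive in .NET, so this may not be the quickest option when most rows in the file are duplicates.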
I just want to use the most efficient method.
Update
Per the answers below, this is the code I have so far:
Dim strURL As String = "http://ichart.yahoo.com/table.csv?s=" & tickerValue
Dim strBuffer As String = RequestWebData(strURL)
Using streamReader = New StringReader(strBuffer)
    Using reader = New CsvReader(streamReader)
        ' the CSV file has a header record, so we read that first
        reader.ReadHeaderRecord()
        While reader.HasMoreRecords
            Dim dataRecord As DataRecord = reader.ReadDataRecord()
            Dim cmd3 As OleDbCommand = New OleDbCommand("INSERT INTO " & tblName & "(Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) " & "SELECT ?, ?, ?, ?, ?, ?, ?, ? " & "FROM DUAL " & "WHERE NOT EXISTS (SELECT 1 FROM " & tblName & " WHERE Ticker = ? AND [Date] = ?)", con)
            cmd3.Parameters.Add("@Ticker", OleDbType.VarChar).Value = tickerValue
            cmd3.Parameters.Add("@[Date]", OleDbType.VarChar).Value = dataRecord.Item("Date")
            cmd3.Parameters.Add("@[Open]", OleDbType.VarChar).Value = dataRecord.Item("Open")
            cmd3.Parameters.Add("@High", OleDbType.VarChar).Value = dataRecord.Item("High")
            cmd3.Parameters.Add("@Low", OleDbType.VarChar).Value = dataRecord.Item("Low")
            cmd3.Parameters.Add("@[Close]", OleDbType.VarChar).Value = dataRecord.Item("Close")
            cmd3.Parameters.Add("@Volume", OleDbType.VarChar).Value = dataRecord.Item("Volume")
            cmd3.Parameters.Add("@Adj_Close", OleDbType.VarChar).Value = dataRecord.Item("Adj Close")
            cmd3.Parameters.Add("@Ticker", OleDbType.VarChar).Value = tickerValue
            cmd3.Parameters.Add("@[Date]", OleDbType.VarChar).Value = dataRecord.Item("Date")
            cmd3.ExecuteNonQuery()
        End While
    End Using
End Using
It gives me this error: "Data type mismatch in criteria expression."
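One possible cause worth ruling out is that every parameter is sent as VarChar, including the [Date] value that is compared in the WHERE clause. The sketch below shows what typed parameters might look like. It rests on assumptions that are not confirmed anywhere above: that [Date] is a Date/Time column, that the price columns are numeric, and that the CSV dates are formatted as yyyy-MM-dd. AddKeyParameters and AddNumberParameter are hypothetical helper names.

```vbnet
Imports System.Data.OleDb
Imports System.Globalization

Module TypedParameters
    ' Hypothetical helper: adds Ticker and [Date] as typed parameters.
    ' Assumptions: [Date] is a Date/Time column and the CSV date text
    ' uses the yyyy-MM-dd format.
    Sub AddKeyParameters(cmd As OleDbCommand, ticker As String, dateText As String)
        Dim tradeDate As Date = Date.ParseExact(dateText, "yyyy-MM-dd", CultureInfo.InvariantCulture)
        cmd.Parameters.Add("@Ticker", OleDbType.VarChar).Value = ticker
        cmd.Parameters.Add("@Date", OleDbType.Date).Value = tradeDate
    End Sub

    ' Hypothetical helper: adds one numeric column value as a Double.
    ' Assumption: the target column is a numeric (Double) column.
    Sub AddNumberParameter(cmd As OleDbCommand, name As String, valueText As String)
        Dim number As Double = Double.Parse(valueText, CultureInfo.InvariantCulture)
        cmd.Parameters.Add(name, OleDbType.Double).Value = number
    End Sub
End Module
```

OleDb binds parameters by position rather than by name, so the calls would have to follow the order of the ? placeholders: the key pair first, then the six numeric columns, then the key pair again for the WHERE clause.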