0

This is a real life example, please take a look at my sqlfiddle,

http://sqlfiddle.com/#!3/d8902/1

I have 3 tables, TickData where all data should go to and 2 source tables with data that have some overlapping data, [ES 09-11_Part01] and [ES 09-11_Part02].

Please note, although in the sqlfiddle I've only populated the 2 tables, ES 09-11_Part0X with a very small amount of data, I am in fact dealing with over 4 million rows each source table.

I need to Insert rows into TickData from the 2 tables and need to avoid duplicates, however 2 caveats.

Firstly, TickData.LastDateTime is a composite of [ES 09-11_Part0X].LastDate and [ES 09-11_Part0X].LastTime.

Secondly, duplicates are identified only if from [ES 09-11_Part0X], CQGTickId and LastDate with LastTime are equal to TickId and TickData.LastDateTime respectively, CQGTickId or TickId by itself is not unique, it must be combined with the date field.

How would I avoid inserting any duplicates into TickData, if the record already exists in TickData, do not insert into TickData from source table and if there is duplicate data in the source table, [ES 09-11_Part0X] it should be inserted one time only into TickData.

In my sqlfiddle, note that [ES 09-11_Part02] has 1 record which already exists in [ES 09-11_Part01]

('2011-04-05','14:12:07.4758734',12267,1,70,12267,12269,163377)

also [ES 09-11_Part02] contains another duplicate within its own table that occurs 3 times

('2011-04-06','00:47:31.2748398',12272,1,12,12269,12272,194813)

Therefore the final output of TickData after the 2 inserts from the source table should not include rows 6, 9 and 10. Only 1 occurrence of the data should exist.

This is a real life example for Stock Ticks data so it needs to be accurate and reliable please. I will also have to continually update TickData with additional source tables which may have again duplicate and overlapping values within.

Thanks in advance to any TSQL geniuses. :D

(original link:http://sqlfiddle.com/#!3/c504b/1)

DeveloperDavid
  • 157
  • 2
  • 7
  • 16

2 Answers2

0

Try with this: http://sqlfiddle.com/#!3/c504b/12

WHERE NOT EXISTS ( SELECT 1 FROM [TickData] t 
    Where t.LASTDATETIME = cast(p.[LastDate] as datetime) + cast(p.[LastTime] as datetime)
    And t.LAST = p.Last
    And t.BID = p.Bid
    And t.ASK = p.Ask
    And t.TICKID = p.CQGTickId )
thepirat000
  • 12,362
  • 4
  • 46
  • 72
  • Hi, thanks for that, duplicates that exists in the second source table, ('2011-04-06','00:47:31.2770000',12272,1,12,12269,12272,194813) still occurs 3 times in the output TickData table. It should only be inserted once into TickData – DeveloperDavid Mar 12 '14 at 02:22
  • Add DISTINCT to your SELECTs, check the updated [SqlFiddle](http://sqlfiddle.com/#!3/c504b/12) – thepirat000 Mar 12 '14 at 02:28
  • Thanks for the Distinct, however, rows are only considered duplicate only if CQGTickId and LastDate as a combination are the same or equal to TickId and LastDateTime (only date part) in TickData table. It is possible that other values, such as Last, LastSize, Bid and or Ask are different, however if CQGTickId and LastDate in combination are the same as previously mentioned, then it would be considered as erroneous or duplicated data. Any other suggestions? – DeveloperDavid Mar 12 '14 at 02:31
  • In that case you'll have to use Group By, but which of the non-unique data will you preserve on your final table? Can you provide a SqlFiddle with the case you mention? – thepirat000 Mar 12 '14 at 02:39
  • Hi, I didn't understand as to which non-unique data will I preserve on my final table, so I edited the sqlFiddle and in the schema I've made a few comments and changed some of the values, I hope this clarifies things :D http://sqlfiddle.com/#!3/cc722/1 – DeveloperDavid Mar 12 '14 at 02:57
  • For the last two records, what LAST, LASTSIZE and TOTALVOLUME you want to get inserted into TickData table? – thepirat000 Mar 12 '14 at 03:27
  • For the last 2 records, they should not be inserted at all. Like any records that are deemed duplicates, they should be ignored. I just added a little more comment in the sqlfiddle, http://sqlfiddle.com/#!3/d8902/1 – DeveloperDavid Mar 12 '14 at 03:56
0
WITH CTE_TABLE 
AS
(SELECT * FROM TABLE1
UNION 
SELECT * FROM TABLE 2)
INSERT INTO YOUR_TABLE
SELECT * FROM CTE_TABLE
Simon
  • 43
  • 10
  • I will have further tables with source data to add later on, plus, I am not sure this will avoid duplicates in my output table, TickData. I don't see a check to see if that data exists in TickData. Plus, nothing was given for caveat 1, LastDateTime – DeveloperDavid Mar 12 '14 at 02:25