6

I am checking website entrys that are recorded in a database

columns: browser, click_type_id, referrer, and datetime

if multiple rows have the same browser, click_type_id, and referrer and are timestamped (occur within 1 minute of one another) they are considered a duplicate.

I need a sql statement that can query for these duplicates based on the above criteria.

Any help is appreciated.

Keng
  • 52,011
  • 32
  • 81
  • 111
JD.
  • 61
  • 1
  • 2
  • delete? select all? test for and prevent insert? – gbn Dec 15 '09 at 20:51
  • What kind of dialect? Ever heard of group by? – Paco Dec 15 '09 at 20:56
  • 5
    An interesting question is what do you consider within 1 minute of another? If there's a row at 12:30:05, one at 12:30:45 and one at 12:31:10, the first two are within a minute of each other, the last two are as well, but the first and last are not. You would have to figure out how to address that as part of your solution. – BBlake Dec 15 '09 at 20:57
  • Do you want to return the duplicates or return distinct values (without duplicates)? – Gabriel McAdams Dec 15 '09 at 21:16

2 Answers2

15
SELECT
     T1.browser,
     T1.click_type,
     T1.referrer,
     T1.datetime,
     T2.datetime
FROM
     My_Table T1
INNER JOIN My_Table T2 ON
     T2.browser = T1.browser AND
     T2.click_type = T1.click_type AND
     T2.referrrer = T1.referrer AND
     T2.datetime > T1.datetime AND
     T2.datetime <= DATEADD(mi, 1, T1.datetime)
Tom H
  • 46,766
  • 14
  • 87
  • 128
  • 1
    You could also just take off the t2 browser, click_type, and referrer references in the select since they will always equal t1's. – ryanulit Dec 15 '09 at 21:12
  • Thanks for the answer. However, any idea how can I count number of occurrence within 1 minute. Using `count` and `groupby` directly will not yield accurate result. It gave "number of success self join", instead of number of left table before join. – Cheok Yan Cheng Feb 06 '17 at 10:31
  • You'd have to clarify specifically what you want. A string of 20 records each within 5 seconds of each other could be 12 within one minute or 8 within another or they could be 4-12-4 or something else. Probably worth starting your own question rather than trying to sort through it in the comments here. – Tom H Feb 07 '17 at 13:50
1

To prevent inserts

INSERT MyTable (browser, click_type_id, referrer, [datetime])
SELECT
    @browser, @click_type_id, @referrer, @datetime
WHERE
    NOT EXISTS (SELECT *
        FROM
           MyTable M2
        WHERE
           browser = @browser AND click_type_id = @click_type_id AND referrer = @referrer
           AND
           [datetime] < DATEADD(minute, -1, @datetime))

To find in existing data (relies on smalldatetime accuracy and may help to avoid issues as per comment to question)

SELECT
   browser, click_type_id, referrer, COUNT(*)
FROM
   MyTable
GROUP BY
    browser, click_type_id, referrer, (CAST [datetime] AS smalldatetime)
HAVING
    COUNT(*) > 1
gbn
  • 422,506
  • 82
  • 585
  • 676