I have a tricky SQL question. This is based on SQL server 2008 R2.
From a Log table, I have to combine consecutive records which have the same messages (MSG
), count how many messages are combined (COUNT
), and then delete the duplicate messages. This also needs to be done within a date range, so that any records outside of that range is left alone.
To make this more understandable, here is a small example of the data:
ID DATE MSG COUNT
1 2013-08-17 mail NULL
2 2013-08-17 mail NULL
3 2013-08-17 www NULL
4 2013-08-18 www NULL
5 2013-08-18 www NULL
6 2013-08-18 www NULL
7 2013-08-18 mail NULL
8 2013-08-18 www NULL
9 2013-08-19 mail NULL
10 2013-08-19 mail NULL
11 2013-08-20 mail NULL
12 2013-08-20 mail NULL
13 2013-08-21 www NULL
14 2013-08-22 mail NULL
15 2013-08-22 mail NULL
16 2013-08-23 mail NULL
17 2013-08-23 mail NULL
18 2013-08-23 mail NULL
The result should look like the followng:
ID DATE MSG COUNT
1 2013-08-17 mail NULL
2 2013-08-17 mail NULL
3 2013-08-17 www NULL
6 2013-08-18 www 3
7 2013-08-18 mail 1
8 2013-08-18 www 1
12 2013-08-20 mail 4
13 2013-08-21 www 1
15 2013-08-22 mail 2
16 2013-08-23 mail NULL
17 2013-08-23 mail NULL
18 2013-08-23 mail NULL
So, basically, the query should
- handle data only within a given date range (in this example from
2013-08-18
to2013-08-22
) - combine consecutive rows based on the text of the
MSG
field - count the combined data and set the value in the
COUNT
field - delete the duplicate records (in this example e.g ID 6 stays, but ID 5 and ID 4 should be deleted)
As I am not an expert in SQL, I would really appreciate any help, suggestions or SQL queries.