0

Here's a simple query we do for ad hoc requests from our Marketing department on the leads we received in the last 90 days.

SELECT ID
    ,FIRST_NAME
    ,LAST_NAME
    ,ADDRESS_1
    ,ADDRESS_2
    ,CITY
    ,STATE
    ,ZIP
    ,HOME_PHONE
    ,MOBILE_PHONE
    ,EMAIL_ADDRESS
    ,ROW_ADDED_DTM
FROM WEB_LEADS
WHERE ROW_ADDED_DTM BETWEEN @START AND @END

They are asking for more derived columns to be added that show the number of previous occurences of ADDRESS_1 where the EMAIL_ADDRESS matches. But they want is for different date ranges.

So the derived columns would look like this:

,COUNT_ADDRESS_1_LAST_1_DAYS,
,COUNT_ADDRESS_1_LAST_7_DAYS
,COUNT_ADDRESS_1_LAST_14_DAYS
etc.

I've manually filled these derived columns using update statements when there was just a few. The above query is really just a sample of a much larger query with many more columns. The actual request has blossomed into 6 date ranges for 13 columns. I'm asking if there's a better way then using 78 additional update statements.

Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
  • Having a hard time following the requirements. They want - for every unique e-mail address in the system - a count of distinct address_1 values they've had in the past day, 7 days, 14 days, etc. And also a count of distinct address_2 values they've had in those time frames. And city values, and state values, and zip values, etc.? And you don't want to have to repeat the column names or date ranges (I don't think you meant "update statements")? – Aaron Bertrand Jan 27 '13 at 02:37

1 Answers1

2

I think you will have a hard time writing a query that includes all of these 78 metrics per e-mail address without actually creating a query that hard-codes the different choices. However you can generate such a pivot query with dynamic SQL, which will save you some keystrokes and will adjust dynamically as you add more columns to the table.

The result you want to end up with will look something like this (but of course you won't want to type it):

;WITH y AS
(
  SELECT 
    EMAIL_ADDRESS,

/* aggregation portion */

    [ADDRESS_1] = COUNT(DISTINCT [ADDRESS_1]),
    [ADDRESS_2] = COUNT(DISTINCT [ADDRESS_2]),
    ... other columns

/* end agg portion */

    FROM dbo.WEB_LEADS AS wl 
    WHERE ROW_ADDED_DTM >= /* one of 6 past dates */
    GROUP BY wl.EMAIL_ADDRESS
)
SELECT EMAIL_ADDRESS,

/* pivot portion */

  COUNT_ADDRESS_1_LAST_1_DAYS = *count address 1 from 1 day ago*,
  COUNT_ADDRESS_1_LAST_7_DAYS = *count address 1 from 7 days ago*,
  ... other date ranges ...
  COUNT_ADDRESS_2_LAST_1_DAYS = *count address 2 from 1 day ago*,
  COUNT_ADDRESS_2_LAST_7_DAYS = *count address 2 from 7 days ago*,
  ... other date ranges ...
  ... repeat for 11 more columns ...

/* end pivot portion */
FROM y 
GROUP BY EMAIL_ADDRESS
ORDER BY EMAIL_ADDRESS;

This is a little involved, and it should all be run as one script, but I'm going to break it up into chunks to intersperse comments on how the above portions are populated without typing them. (And before long @Bluefeet will probably come along with a much better PIVOT alternative.) I'll enclose my interspersed comments in /* */ so that you can still copy the bulk of this answer into Management Studio and run it with the comments intact.

Code/comments to copy follows:


/* First, let's build a table of dates that can be used both to derive labels for pivoting and to assist with aggregation. I've added the three ranges you've mentioned and guessed at a fourth, but hopefully it is clear how to add more: */

DECLARE @d DATE = SYSDATETIME();

CREATE TABLE #L(label NVARCHAR(15), d DATE);

INSERT #L(label, d) VALUES
(N'LAST_1_DAYS',  DATEADD(DAY,   -1,  @d)),
(N'LAST_7_DAYS',  DATEADD(DAY,   -8,  @d)),
(N'LAST_14_DAYS', DATEADD(DAY,   -15, @d)),
(N'LAST_MONTH',   DATEADD(MONTH, -1,  @d));

/* Next, let's build the portions of the query that are repeated per column name. First, the aggregation portion is just in the format col = COUNT(DISTINCT col). We're going to go to the catalog views to dynamically derive the list of column names (except ID, EMAIL_ADDRESS and ROW_ADDED_DTM) and stuff them into a #temp table for re-use. */

SELECT name INTO #N FROM sys.columns
WHERE [object_id] = OBJECT_ID(N'dbo.WEB_LEADS')
AND name NOT IN (N'ID', N'EMAIL_ADDRESS', N'ROW_ADDED_DTM');

DECLARE @agg NVARCHAR(MAX) = N'', @piv NVARCHAR(MAX) = N'';

SELECT @agg += ',
  ' + QUOTENAME(name) + ' = COUNT(DISTINCT ' 
  + QUOTENAME(name) + ')' FROM #N;

PRINT @agg;

/* Next we'll build the "pivot" portion (even though I am angling for the poor man's pivot - a bunch of CASE expressions). For each column name we need a conditional against each range, so we can accomplish this by cross joining the list of column names against our labels table. (And we'll use this exact technique again in the query later to make the /* one of past 6 dates */ portion work. */

SELECT @piv += ',
  COUNT_' + n.name + '_' + l.label
  + ' = MAX(CASE WHEN label = N''' + l.label 
  + ''' THEN ' + QUOTENAME(n.name) + ' END)'
FROM #N as n CROSS JOIN #L AS l;

PRINT @piv;

/* Now, with those two portions populated as we'd like them, we can build a dynamic SQL statement that fills out the rest: */

DECLARE @sql NVARCHAR(MAX) = N';WITH y AS
(
    SELECT 
      EMAIL_ADDRESS, l.label' + @agg + '
      FROM dbo.WEB_LEADS AS wl 
      CROSS JOIN #L AS l
      WHERE wl.ROW_ADDED_DTM >= l.d
      GROUP BY wl.EMAIL_ADDRESS, l.label
)
SELECT EMAIL_ADDRESS' + @piv + '
FROM y 
GROUP BY EMAIL_ADDRESS
ORDER BY EMAIL_ADDRESS;';

PRINT @sql;

EXEC sp_executesql @sql;
GO
DROP TABLE #N, #L;

/* Now again, this is a pretty complex piece of code, and perhaps it can be made easier with PIVOT. But I think even @Bluefeet will write a version of PIVOT that uses dynamic SQL because there is just way too much to hard-code here IMHO. */

Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490