1

Need some help, here is my SQL below: I am getting duplicates back and it will not return my LastRespondedDate Field. Any help would be greatly appreciated.

SELECT t.[column1],
       pr.[column1],
       pr.[RespondedDttm],
       t.[column2],
       t.[column3],
       t.[column4]
FROM Table1 t LEFT JOIN
(
    SELECT [t.column1], [pr.column2], [RespondedDttm], MAX([RespondedDttm]) AS LastRespondedDate
    FROM Table2 pr
    GROUP BY [column1], RespondedDttm, [pr.column3]) pr
    ON (t.[column1] = pr.[column1])
WHERE t.[column8] IN (value) AND
      (pr.[RespondedDttm] >= '2015-09-01') AND
      (pr.[Response] IS NOT Null)
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • The short explanation for this is that, you can't return all your data with a max date in one pass. You need one pass to identify the record with that date, and a second pass to retrieve the rest of the record. Doing it the other way returns ALL unique records. Which defeats the purpose of only the Max. – durbnpoisn Aug 15 '16 at 17:04
  • Why is t.column1 in the inner select? T should be unknown – xQbert Aug 15 '16 at 17:07
  • what rdbms/database? sql-server oracle mysql? @durbnpoisn with window functions you can certainly get the max of a dataset without group by or multiple passes so it depends on which rdbms is used. With that said if he is trying to elminiate duplicates then the key will be to pick a specific combination of conditions that will only return 1 distinct record/information from table2 pr – Matt Aug 15 '16 at 17:17
  • Kyle do you want the last non null response > 2015-09-01 or the last response if > 2015-09-01 and if response is not null and if column8 has value. The difference is subtle but filtering response data prior to join will yield the latest row that meets your criteria even if that row is not the last row if one exists and filtering after will only return the last row if it exists with those conditions – Matt Aug 15 '16 at 17:22
  • t.[column1] is a job#, and we are sometimes getting multiple response. So my goal is return all job# and the associated last response – Kyle watford Aug 15 '16 at 17:24
  • SQL Server.. And looking for "the last response if > 2015-09-01 and if response is not null and if column8 has value" – Kyle watford Aug 15 '16 at 17:26

1 Answers1

0
SELECT
    t.[column1],
    pr.[RespondedDttm] as LastRespondedDate,
    t.[column2],
    t.[column3],
    t.[column4]
FROM
    Table1 t
    LEFT JOIN
    (
       SELECT
          t2.[Column1]
          ,t2.[column2]
          ,[RespondedDttm]
          ,RowNum = ROW_NUMBER() OVER (PARTITION BY ColWithDups ORDER BY [RespondedDttm])
          ,WithTiesRowNum = RANK() OVER (PARTITION BY ORDER BY [RespondedDttm] DESC) 
       FROM
          Table2 t2
       WHERE
          t2.[RespondedDttm] >= '2015-09-01'
          AND t2.[Response] IS NOT Null
    ) pr
    ON (t.[column1] = pr.[column1])
    AND pr.RowNum = 1
    --AND pr.WithTiesRowNum = 1 --use this line if you want ties
WHERE
    t.[column8] IN (value)

You can use window functions and ROW_NUMBER if you want only 1 row or RANK() if you want all ties.

In case you wanted the other direction of the nuance which I think you comment suggests:

SELECT
    t.[column1],
    pr.[RespondedDttm] as LastRespondedDate,
    t.[column2],
    t.[column3],
    t.[column4]
FROM
    Table1 t
    LEFT JOIN
    (
       SELECT
          t2.[Column1]
          ,t2.[column2]
          ,[RespondedDttm]
          ,RowNum = ROW_NUMBER() OVER (PARTITION BY ColWithDups ORDER BY [RespondedDttm])
          ,WithTiesRowNum = RANK() OVER (PARTITION BY ORDER BY [RespondedDttm] DESC) 
       FROM
          Table2 t2
    ) pr
    ON (t.[column1] = pr.[column1])
    AND pr.[RespondedDttm] >= '2015-09-01'
    AND pr.[Response] IS NOT Null
    AND pr.RowNum = 1
    --AND pr.WithTiesRowNum = 1 --use this line if you want ties
WHERE
    t.[column8] IN (value)
Matt
  • 13,833
  • 2
  • 16
  • 28