2

learning sql here and I ran into a challenge.

I have the following table:

tbl <- data.frame(
   id_name = c("a", "a", "b", "c", "d", "f", "b", "c", "d", "f"),
   value = c(1, -1, 1, 1, 1, 1, -1, -1, -1, -1),
   score = c(1, 0, 1, 2, 3, 4, 3, 2, 1, 0),
   date = as.Date(c("2001-1-1", "2002-1-1", "2003-1-1", "2005-1-1", 
                    "2005-1-1", "2007-1-1", "2008-1-1", "2010-1-1", 
                    "2011-1-1", "2012-1-1"), "%Y-%m-%d")
                   )


+---------+-------+-------+-----------+
| id_name | value | score |   date    |
+---------+-------+-------+-----------+
| a       |     1 |     1 |  2001-1-1 |
|  a      |    -1 |     0 |  2002-1-1 |
|  b      |     1 |     1 |  2003-1-1 |
|  c      |     1 |     2 |  2005-1-1 |
|  d      |     1 |     3 |  2005-1-1 |
|  f      |     1 |     4 |  2007-1-1 |
|  b      |    -1 |     3 |  2008-1-1 |
|  c      |    -1 |     2 |  2010-1-1 |
|  d      |    -1 |     1 |  2011-1-1 |
|  f      |    -1 |     0 |  2012-1-1 |
+---------+-------+-------+-----------+

My goal is this:

For each id_name, I'd like to get the first date (in case of tie breakers) of maximum score from the tbl between the dates where the current row = id_name (inclusive)

For example, id_name 'a' should return '2001-1-1' since its score is 1 id_name 'b' should return '2007-1-1' since its score is 4:

+---------+----------+
| id_name |   date   |
+---------+----------+
| a       | 2001-1-1 |
| b       | 2007-1-1 |
+---------+----------+

This is what I have thus far,

   sqldf("
  SELECT 
    id_name,
    date,
    score
  FROM
    tbl As d
  WHERE
    score = (
                        SELECT MAX(score)
                        FROM tbl As b
                        WHERE 
                          date >= (
                                        SELECT MIN(date)
                                        FROM tbl
                                        WHERE id_name = b.id_name
                          ) AND
                          date <= (
                                        SELECT MAX(date)
                                        FROM tbl
                                        WHERE id_name = b.id_name
                          )
    )
  ")

Problem is that it is returning the rows with the global max value irrespective of the current row value

Thanks!

JNevill
  • 46,980
  • 4
  • 38
  • 63

1 Answers1

0

I think a correlated subquery in the WHERE clause will fit the bill here:

SELECT id_name, date
FROM tbl as t1
WHERE score = (SELECT max(score) FROM tbl WHERE id_name = t1.id_name)
JNevill
  • 46,980
  • 4
  • 38
  • 63
  • Thanks for the reply JNevill. It gets me closer, but not quite there. For example, for id_name = 'b', it should return '2007-1-1' because this is when the score is greatest and is between the first and last date ('2003-01-01' and '2008-01-01') of id_name = 'b' – sir_chocolate_soup Nov 16 '17 at 21:26
  • I updated question and my attempt above. Let me know if it still isn't clear. thanks. – sir_chocolate_soup Nov 17 '17 at 01:49
  • So you are looking for the date of the max score of any records that occur between the two dates for the given id_name? Will your id_name ever only have 1 record or more than 2 records? – JNevill Nov 17 '17 at 14:33
  • Also, just to clarify is this a sqlite database? I see it in the tags, but just want to confirm since folks VERY VERY often tag the wrong RDBMS if their questions and the RDBMS in use will play a big role here since LAG() and other window functions should make this very doable. – JNevill Nov 17 '17 at 14:34
  • Yes to your first question and I'm doing this in R using the sqldf package so it is sqlite! – sir_chocolate_soup Nov 17 '17 at 15:25