3

I have a table which looks something like this:

+------------+------------+--------------+
| Date       | Name       | Certificates |
+------------+------------+--------------+
| 2021-02-01 | Jason      | 3            |
| 2021-02-01 | Nisha      | 4            |
| 2021-02-01 | Zaid       | 5            |
| 2021-03-25 | Aniket     | 4            |
| 2021-03-25 | Anish      | 2            |
| 2021-03-25 | Nadia      | 0            |
| 2021-05-06 | Aadil      | 7            |
| 2021-05-06 | Ashish     | 1            |
| 2021-05-06 | Rahil      | 9            |
+------------+------------+--------------+

This result is obtained by performing the following SQL query:

SELECT 
    Date, Name, COUNT(Certificates) as Certificates
FROM Students.data
GROUP BY Date, Name
ORDER BY Date, Name;

After receiving this result, ideally, I would like only the first entry from each date now (that would basically be the first name for each date), which should be something like this:

+------------+------------+--------------+
| Date       | Name       | Certificates |
+------------+------------+--------------+
| 2021-02-01 | Jason      | 3            |
| 2021-03-25 | Aniket     | 4            |
| 2021-05-06 | Aadil      | 7            |
+------------+------------+--------------+

Is there a way I can modify the above group by query to obtain the result, or do I need to pass the result of this query to some other query, if so, what would that query be. Thanks.

Also, the database I am using is Clickhouse.

NOTE: Please let me know if there is any issue with the question, can clarify that.

Shaikh Abuzar
  • 101
  • 1
  • 9

4 Answers4

2

You consider your result an intermediate result of which you want to pick one row per date. You can use ROW_NUMBER for this to number the rows per date by name and only keep a date's first row (those rows numbered 1).

SELECT date, name, certificates
FROM
(
  SELECT 
    date, name, COUNT(Certificates) AS certificates,
    ROW_NUMBER() OVER (PARTITION BY date ORDER BY name) AS rn
  FROM students.data
  GROUP BY date, name
) numbered
WHERE rn = 1
ORDER BY date;

Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=93c3682bda72cb4fe53fbbe8053a8acb (using MySQL 8 here, because dbfiddle.uk doesn't feature clickhouse, but the query is standard SQL compliant, so we can use about every modern RDBMS for the demonstration).

Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
  • The question was worded terribly...leaving me to think that maybe the interpretation of the OP's approach was also overthought. This looks correct +1. – Tim Biegeleisen Aug 27 '21 at 06:37
  • doing count() is not necessary if you are filtering by rank = 1 – trillion Aug 27 '21 at 06:45
  • @trillion: You are wrong. Look at the request again. The table shown there is the result of aggregating the data table by date and name and counting their rows. Without `COUNT` we don't get the number of certificates. – Thorsten Kettner Aug 27 '21 at 06:49
  • @ThorstenKettner your code returns 1 only for ceritificate column, is that expected ? because the output that he showed above has 3,4,7 in the certificate column – trillion Aug 27 '21 at 07:04
  • @ThorstenKettner here is the fiddle for your query: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=186354fcb853631b7bd1991987972ee7, I believe what he needs is this: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=a5a8bf3f6934f18b19d331d3ba43570a – trillion Aug 27 '21 at 07:07
  • @ThorstenKettner any comments ? – trillion Aug 27 '21 at 07:16
  • @trillion: This is because you are counting counts. You are treating the OP's query result as the original table and then invoke their query again on that data. This is wrong. The original table has multiple rows per student and date (of which some or all have a certificate). The query then counts the certificates per date and name. Only then do we get the counts 3, 4, 5, 4, etc., and of these we want to keep one row per date. – Thorsten Kettner Aug 27 '21 at 07:35
  • i think it's not clear what the user wants then since your query only returns 1 and the expected output by the person is 3,47 – trillion Aug 27 '21 at 07:39
  • @trillion: No. *Please* read the request. The OP shows a table. You know what a table is? A table can be a stored table or a result of a query. The OP tells us the table they are showing is a query result ("This result is obtained by performing the following SQL query"), but you use this query result and apply about the same query again. If you want to make a fiddle, then create some table data for which OP's query results in the table they are showing. – Thorsten Kettner Aug 27 '21 at 07:44
  • Well, I added a demo fiddle to my answer so as to dispel any doubts :-) – Thorsten Kettner Aug 27 '21 at 07:58
0

While looking at your output, I assumed that you wanted the sole entry for the day to be the one with alphabetically ASC on Name column.

In such case, you can use ROW_NUMBER() function if this SQL server

SELECT Date,Name, Certificates
FROM
(
SELECT 
    Date, Name, 
    Certificates=COUNT(Certificates) OVER (PARTITION BY Date,Name) 
    RowNumber = ROW_NUMBER() OVER (PARTITION BY Date
     ORDER BY Name ASC) 
FROM Students.data
) T 
WHERE RowNumber =1 
ORDER BY Date ASC
;
DhruvJoshi
  • 17,041
  • 6
  • 41
  • 60
0
  • use CTE instead of Subquery
  • rank the data => each row will have same data with an increasing rank --> ROW_NUMBER
  • filter the rank_ by by 1 to get one entry per date
  • the order by is on name in alphabetical order assuming that what you need

If you don't have the count already then use code 1 fiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=a5a8bf3f6934f18b19d331d3ba43570a

WITH ranked_data AS ( SELECT date_, name, count(certificates_) over(partition by date_,name) as certificates, row_number() OVER(PARTITION BY date_ order by name) as rank_ FROM students ) SELECT date_, name, certificates FROM ranked_data WHERE rank_ = 1

if you have the count then use code 2


WITH ranked_data AS (
SELECT date_, name, certificates_,
row_number() OVER(PARTITION BY date_ order by name) as rank_
FROM students
)
SELECT 
  date_, name, certificates_ FROM ranked_data WHERE rank_ = 1

trillion
  • 1,207
  • 1
  • 5
  • 15
0
  • straightforward way
SELECT Date, untuple(groupArray(tuple(Name, Certificates))[1])
FROM (
    SELECT *
    FROM  (
        /* Emulate the test dataset. */
        SELECT toDate(data.1) AS Date, data.2 AS Name, data.3 AS Certificates
        FROM (
            SELECT arrayJoin([
                ('2021-02-01', 'Jason ', 3),
                ('2021-02-01', 'Nisha ', 4),
                ('2021-02-01', 'Zaid  ', 5),
                ('2021-03-25', 'Aniket', 4),
                ('2021-03-25', 'Anish ', 2),
                ('2021-03-25', 'Nadia ', 0),
                ('2021-05-06', 'Aadil ', 7),
                ('2021-05-06', 'Ashish', 1),
                ('2021-05-06', 'Rahil ', 9)]) AS data
            )
        )
    ORDER BY Date, Name
    )
GROUP BY Date

/*
┌───────Date─┬─Name───┬─Certificates─┐
│ 2021-02-01 │ Jason  │            3 │
│ 2021-03-25 │ Aniket │            4 │
│ 2021-05-06 │ Aadil  │            7 │
└────────────┴────────┴──────────────┘
*/
  • way based on window-function

Starting from version 21.4 added the full support of window-functions. At this moment it was marked as an experimental feature.

SELECT DISTINCT
    Date,
    FIRST_VALUE(Name) OVER w AS FirstName,
    FIRST_VALUE(Certificates) OVER w AS FirstCertificates
FROM 
(
    /* Emulate the test dataset. */
    SELECT toDate(data.1) AS Date, data.2 AS Name, data.3 AS Certificates
    FROM (
        SELECT arrayJoin([
            ('2021-02-01', 'Jason ', 3),
            ('2021-02-01', 'Nisha ', 4),
            ('2021-02-01', 'Zaid  ', 5),
            ('2021-03-25', 'Aniket', 4),
            ('2021-03-25', 'Anish ', 2),
            ('2021-03-25', 'Nadia ', 0),
            ('2021-05-06', 'Aadil ', 7),
            ('2021-05-06', 'Ashish', 1),
            ('2021-05-06', 'Rahil ', 9)]) AS data
        )
)
WINDOW w AS (PARTITION BY Date ORDER BY Name ASC)
SETTINGS allow_experimental_window_functions = 1

/*
┌───────Date─┬─FirstName─┬─FirstCertificates─┐
│ 2021-02-01 │ Jason     │                 3 │
│ 2021-03-25 │ Aniket    │                 4 │
│ 2021-05-06 │ Aadil     │                 7 │
└────────────┴───────────┴───────────────────┘
*/

See https://altinity.com/blog/clickhouse-window-functions-current-state-of-the-art.

vladimir
  • 13,428
  • 2
  • 44
  • 70