1

I have a table with duplicate entries and the objective is to get the distinct entries based on the latest time stamp.

In my case 'serial_no' will have duplicate entries but I select unique entries based on the latest time stamp.

Below query is giving me the unique results with the latest time stamp. But my concern is I need to get the total of unique entries.

For example assume my table has 40 entries overall. With the below query I am able to get 20 unique rows based on the serial number. But the 'total' is returned as 40 instead of 20. Any help on this pls?

  SELECT 
  * 
  FROM 
  (
    SELECT 
      DISTINCT ON (serial_no) id, 
      serial_no, 
      name, 
      timestamp,
      COUNT(*) OVER() as total 
    FROM 
      product_info 
      INNER JOIN my.account ON id = accountid 
    WHERE 
      lower(name) = 'hello' 
    ORDER BY 
      serial_no, 
      timestamp DESC OFFSET 0 
    LIMIT 
      10
  ) AS my_info 
 ORDER BY 
   serial_no asc

enter image description here

product_info table intially has this data  

serial_no           name         timestamp                              

11212               pulp12      2018-06-01 20:00:01             
11213               mango       2018-06-01 17:00:01             
11214               grapes      2018-06-02 04:00:01             
11215               orange      2018-06-02 07:05:30             
11212               pulp12      2018-06-03 14:00:01             
11213               mango       2018-06-03 13:00:00             



After the distict query I got all unique results based on the latest 
timestamp:

serial_no       name        timestamp                   total

11212           pulp12     2018-06-03 14:00:01            6
11213           mango      2018-06-03 13:00:00            6
11214           grapes     2018-06-02 04:00:01            6
11215           orange     2018-06-02 07:05:30            6


But total is appearing as 6 . I wanted the total to be 4 since it has 
only 4 unique entries.

I am not sure how to modify my existing query to get this desired 
result.
User850309
  • 61
  • 2
  • 8

3 Answers3

3

Postgres supports COUNT(DISTINCT column_name), so if I have understood your request, using that instead of COUNT(*) will work, and you can drop the OVER.

Andrew Lazarus
  • 18,205
  • 3
  • 35
  • 53
2

What you could do is move the window function to a higher level select statement. This is because window function is evaluated before distinct on and limit clauses are applied. Also, you can not include DISTINCT keyword within window functions - it has not been implemented yet (as of Postgres 9.6).

 SELECT 
  *,
  COUNT(*) OVER() as total -- here
 FROM 
  (
    SELECT 
      DISTINCT ON (serial_no) id, 
      serial_no, 
      name, 
      timestamp
    FROM 
      product_info 
      INNER JOIN my.account ON id = accountid 
    WHERE 
      lower(name) = 'hello' 
    ORDER BY 
      serial_no, 
      timestamp DESC
    LIMIT 
      10
  ) AS my_info

Additionally, offset is not required there and one more sorting is also superfluous. I've removed these.

Another way would be to include a computed column in the select clause but this would not be as fast as it would require one more scan of the table. This is obviously assuming that your total is strictly connected to your resultset and not what's beyond that being stored in the table, but gets filtered out.

Kamil Gosciminski
  • 16,547
  • 8
  • 49
  • 72
  • Thank you. This did the trick if my limit is 10. And as you pointed out, I need the solution for "This is obviously assuming that your total is strictly connected to your result set and not what's beyond that being stored in the table, but gets filtered out" **That is** , I also have a requirement to show total unique entries available in the whole table. Though I just show 10 entries by limiting it to 10. – User850309 Jun 04 '18 at 05:48
0
select count(*), serial_no from product_info group by serial_no

will give you the number of duplicates for each serial number

The most mindless way of incorporating that information would be to join in a sub query

  SELECT 
  * 
  FROM 
  (
    SELECT 
      DISTINCT ON (serial_no) id, 
      serial_no, 
      name, 
      timestamp,
      COUNT(*) OVER() as total 
    FROM 
      product_info 
      INNER JOIN my.account ON id = accountid 
    WHERE 
      lower(name) = 'hello' 
    ORDER BY 
      serial_no, 
      timestamp DESC OFFSET 0 
    LIMIT 
      10
  ) AS my_info
  join (select count(*) as counts, serial_no from product_info group by serial_no) as X
  on X.serial_no = my_info.serial_no
 ORDER BY 
   serial_no asc
Eric Yang
  • 2,678
  • 1
  • 12
  • 18
  • Is there any way to edit my original query to get the total of unique serial_no. – User850309 Jun 01 '18 at 20:11
  • Thanks. I am able to get the count of each row of unique serial number. The count is for the 10 rows which I set as part of limit. Anyway to get the count of all unique serial number that exists in the table. – User850309 Jun 01 '18 at 20:55
  • As shown in the pic, 'total' has the overall count of the serial_no (including duplicates). So is there a possibility to get overall count of distinct serial_no in the column 'counts'. – User850309 Jun 01 '18 at 21:05