0

I have table users with the following values:

  id   |     created_at
-------+---------------------
 20127 | 2015-01-31 04:23:46
 21468 | 2015-02-04 07:50:34
 21571 | 2015-02-04 08:23:50
 20730 | 2015-03-12 10:20:16
 19955 | 2015-03-30 07:44:35
 20148 | 2015-04-17 13:03:26
 21552 | 2015-05-07 19:00:00
 20145 | 2015-06-02 03:12:46
 21467 | 2015-06-03 13:21:51
 21074 | 2015-07-03 19:00:00

I want to:

  • find the cumulative sum for number of users over time (return count of users for every day in the date range, not just for the days that exist in the database)
  • be able to filter that sum by date, so if I put the date that is after some row, that row should be included in the cumulative sum (everything before the range specified should be included in the first sum, it shouldn't start counting from 0 at the beginning of the range specified)
  • return results grouped by each day in epoch format

I'm trying to achieve this with the following SQL:

    SELECT extract(epoch from created_at)::bigint, 
           sum(count(id)::integer) OVER (ORDER BY created_at)
    FROM data_users
    WHERE created_at IS NOT NULL
    GROUP BY created_at

But it's not working as expected since I can't add filtering by date here, without excluding records from the cumulative sum. Also it doesn't take into account days that have been missed (those for which the users don't exist).

Any help greatly appreciated.

shime
  • 8,746
  • 1
  • 30
  • 51

1 Answers1

1

As far as I understand your question a simple query with GROUP BY should be enough. You can use a left outer join with GENERATE_SERIES() to get all dates in the range. If you have the start and end date of the range, you can use this:

SELECT EXTRACT(EPOCH FROM d)::BIGINT, COALESCE(COUNT(u.id), 0)
FROM GENERATE_SERIES(start, end, '1 DAY'::INTERVAL) d 
    LEFT OUTER JOIN data_users u ON u.created_at::DATE = d
GROUP BY 1 ORDER BY 1

You can determine start and end from your table, too:

SELECT EXTRACT(EPOCH FROM d.date)::BIGINT, COALESCE(COUNT(u.id), 0)
FROM
    (SELECT GENERATE_SERIES(MIN(created_at)::DATE, MAX(created_at)::DATE, '1 DAY'::INTERVAL) AS date
    FROM data_users) d
    LEFT OUTER JOIN data_users u ON u.created_at::DATE = d.date::DATE
GROUP BY 1 ORDER BY 1;

This returns:

 date_part  | coalesce 
------------+----------
 1422662400 |        1
 1422748800 |        0
 1422835200 |        0
 1422921600 |        0
 1423008000 |        2
 1423094400 |        0
 1423180800 |        0
...
 1435536000 |        0
 1435622400 |        0
 1435708800 |        0
 1435795200 |        0
 1435881600 |        1

With this query you can get the cumulative sum for the rows before a start date:

SELECT EXTRACT(EPOCH FROM GREATEST(d.date, start))::BIGINT, COALESCE(COUNT(u.id), 0)
FROM
    (SELECT GENERATE_SERIES(MIN(created_at)::DATE, MAX(created_at)::DATE, '1 DAY'::INTERVAL) AS date
    FROM data_users) d
    LEFT OUTER JOIN data_users u ON u.created_at::DATE = d.date::DATE
GROUP BY 1 ORDER BY 1;
clemens
  • 16,716
  • 11
  • 50
  • 65
  • The second query throws an error `operator does not exist: date = record` for `u.created_at::DATE = d`. The first query helps, thanks. – shime Dec 22 '17 at 17:13
  • Sorry, I could not test the query. I've fixed it, and now it works. – clemens Dec 22 '17 at 17:52
  • Hmm, I'm still not sure how to join these dates with my query, I need a cumulative sum, not count. – shime Dec 22 '17 at 18:38
  • Figured it out, thanks! Now the only remaining question is getting cumulative sum for all rows that were before the start date for the filter. – shime Dec 22 '17 at 18:40
  • I've added a third query for that. – clemens Dec 22 '17 at 21:15
  • Thanks, accepted. Is there a way the interval could be changed to say one month and still preserve the cumulative sum from all the days? – shime Dec 23 '17 at 10:51
  • Figured it out, I should just use `DATE_TRUNC`. Thank you, this is great! :D – shime Dec 23 '17 at 11:00