0

I am trying to bin my data to see how it is distributed. Some bins contain no data. So, when I query my bins, I only get the bins containing data. I would like the query to also show empty bins.

I was trying to create a table containing all the bins to latter do a LEFT JOIN with the query containing data from my table. But I still only get bins with data. The query does not show empty bins. This is the code I was using:

WITH tb AS 
(
  SELECT 'A [+0 min-1 min]' AS bins UNION ALL
  SELECT 'B [+1 min-113 min]'       UNION ALL
  SELECT 'C [+113 min-223 min]'     UNION ALL
  SELECT 'D [+223 min-335 min]'     UNION ALL
  SELECT 'E [+335 min-447 min]'     UNION ALL
  SELECT 'F [+447 min-559 min]'     UNION ALL
  SELECT 'G [+559 min-671 min]'     UNION ALL
  SELECT 'H [+671 min-783 min]'     UNION ALL
  SELECT 'I [+783 min-895 min]'     UNION ALL
  SELECT 'J [+895 min-1007 min]'    UNION ALL
  SELECT 'K [+1007 min-1119 min]'   UNION ALL
  SELECT 'L [+1119 min-1231 min]'   UNION ALL
  SELECT 'M [+1231 min-1343 min]'   UNION ALL
  SELECT 'N [+1343 min-1440 min]'   UNION ALL
  SELECT 'O [+1140 min]'
), aa AS 
(
  SELECT CASE
         WHEN ride_length BETWEEN '00:00:00' AND '00:01:00' THEN
          'A [+0 min-1 min]'
         WHEN ride_length BETWEEN '00:01:01' AND '01:53:00' THEN
          'B [+1 min-113 min]'
         WHEN ride_length BETWEEN '01:53:01' AND '03:43:00' THEN
          'C [+113 min-223 min]'
         WHEN ride_length BETWEEN '03:43:01' AND '05:35:00' THEN
          'D [+223 min-335 min]'
         WHEN ride_length BETWEEN '05:35:01' AND '07:27:00' THEN
          'E [+335 min-447 min]'
         WHEN ride_length BETWEEN '07:27:01' AND '09:19:00' THEN
          'F [+447 min-559 min]'
         WHEN ride_length BETWEEN '09:19:01' AND '11:11:00' THEN
          'G [+559 min-671 min]'
         WHEN ride_length BETWEEN '11:11:01' AND '13:03:00' THEN
          'H [+671 min-783 min]'
         WHEN ride_length BETWEEN '13:03:01' AND '14:55:00' THEN
          'I [+783 min-895 min]'
         WHEN ride_length BETWEEN '14:55:01' AND '16:47:00' THEN
          'J [+895 min-1007 min]'
         WHEN ride_length BETWEEN '16:47:01' AND '18:39:00' THEN
          'K [+1007 min-1119 min]'
         WHEN ride_length BETWEEN '18:39:01' AND '20:31:00' THEN
          'L [+1119 min-1231 min]'
         WHEN ride_length BETWEEN '20:31:01' AND '22:23:00' THEN
          'M [+1231 min-1343 min]'
         WHEN ride_length BETWEEN '22:33:01' AND '24:00:00' THEN
          'N [+1343 min-1440 min]'
         ELSE
          'O [+1140 min]'
         END AS bins,
         member_casual,
         rideable_type
    FROM (SELECT TIMEDIFF(ended_at, started_at) AS ride_length,
                 member_casual,
                 rideable_type
            FROM cyclistic) a
)
SELECT tb.bins,
       aa.member_casual,
       aa.rideable_type,
       IFNULL(COUNT(member_casual), 0)
  FROM tb
  LEFT JOIN aa
    ON tb.bins = aa.bins
 GROUP BY tb.bins, aa.member_casual, aa.rideable_type;

For some reason, the output I get is the following:

+------------------------+---------------+---------------+--------------------------------+
| bins                   | member_casual | rideable_type | IFNULL(COUNT(member_casual),0) |
+------------------------+---------------+---------------+--------------------------------+
       | classic_bike  |                          25999 |
       | electric_bike |                          47882 |
       | classic_bike  |                          12892 |
       | electric_bike |                          33882 |
       | electric_bike |                        1213471 |
       | electric_bike |                        1584591 |
       | classic_bike  |                        1679971 |
       | classic_bike  |                         857810 |
       | docked_bike   |                         158858 |
       | classic_bike  |                          15025 |
       | electric_bike |                           2426 |
       | docked_bike   |                          11824 |
       | classic_bike  |                           1661 |
       | electric_bike |                            635 |
       | classic_bike  |                            400 |
       | electric_bike |                            184 |
       | classic_bike  |                             84 |
       | classic_bike  |                             38 |
       | docked_bike   |                            195 |
       | docked_bike   |                           1554 |
       | classic_bike  |                           1213 |
       | electric_bike |                            323 |
       | docked_bike   |                            139 |
       | classic_bike  |                             25 |
       | classic_bike  |                             42 |
       | docked_bike   |                            263 |
       | classic_bike  |                            111 |
       | classic_bike  |                            213 |
       | docked_bike   |                            192 |
       | classic_bike  |                            159 |
       | classic_bike  |                            113 |
       | electric_bike |                           5263 |
       | classic_bike  |                            176 |
       | classic_bike  |                            400 |
       | electric_bike |                             64 |
       | docked_bike   |                            443 |
       | docked_bike   |                            169 |
       | classic_bike  |                            131 |
       | classic_bike  |                            194 |
       | classic_bike  |                            275 |
       | docked_bike   |                            207 |
       | classic_bike  |                            129 |
       | electric_bike |                              1 |
       | docked_bike   |                            152 |
       | classic_bike  |                             80 |
       | classic_bike  |                            126 |
       | classic_bike  |                           2606 |
       | docked_bike   |                           1507 |
       | classic_bike  |                            716 |
       | docked_bike   |                            201 |
       | classic_bike  |                             94 |
       | classic_bike  |                             47 |
       | docked_bike   |                           1528 |
       | electric_bike |                            168 |
       | docked_bike   |                            229 |
       | classic_bike  |                            137 |
       | classic_bike  |                            287 |
       | electric_bike |                             51 |
+------------------------+---------------+---------------+--------------------------------+

I also tried COALESCE, but I got the same output.

Barbaros Özhan
  • 59,113
  • 10
  • 31
  • 55

1 Answers1

0

Just define your bins in one place and then use that to LEFT JOIN to cyclistic:

WITH bins (bin, start, end) AS (
  SELECT 'A [+0 min-1 min]',       '00:00:00', '00:01:00' UNION ALL
  SELECT 'B [+1 min-113 min]',     '00:01:01', '01:53:00' UNION ALL
  SELECT 'C [+113 min-223 min]',   '01:53:01', '03:43:00' UNION ALL
  SELECT 'D [+223 min-335 min]',   '03:43:01', '05:35:00' UNION ALL
  SELECT 'E [+335 min-447 min]',   '05:35:01', '07:27:00' UNION ALL
  SELECT 'F [+447 min-559 min]',   '07:27:01', '09:19:00' UNION ALL
  SELECT 'G [+559 min-671 min]',   '09:19:01', '11:11:00' UNION ALL
  SELECT 'H [+671 min-783 min]',   '11:11:01', '13:03:00' UNION ALL
  SELECT 'I [+783 min-895 min]',   '13:03:01', '14:55:00' UNION ALL
  SELECT 'J [+895 min-1007 min]',  '14:55:01', '16:47:00' UNION ALL
  SELECT 'K [+1007 min-1119 min]', '16:47:01', '18:39:00' UNION ALL
  SELECT 'L [+1119 min-1231 min]', '18:39:01', '20:31:00' UNION ALL
  SELECT 'M [+1231 min-1343 min]', '20:31:01', '22:23:00' UNION ALL
  SELECT 'N [+1343 min-1440 min]', '22:23:01', '24:00:00' UNION ALL
  SELECT 'O [+1140 min]',          '24:00:01', NULL
)
SELECT bins.bin,
       c.member_casual,
       c.rideable_type,
       COUNT(c.id)
  FROM bins
  LEFT JOIN cyclistic c
    ON TIMEDIFF(c.ended_at, c.started_at) BETWEEN bins.start AND bins.end
    OR (bins.end IS NULL AND TIMEDIFF(ended_at, started_at) >= bins.start)
 GROUP BY bins.bin, c.member_casual, c.rideable_type;

With the following CTE to represent your cyclistic table:

WITH cyclistic (id, member_casual, rideable_type, started_at, ended_at) AS (
    SELECT 1, 1, 'classic_bike', '2023-02-11 00:00:00', '2023-02-11 00:00:01' UNION ALL
    SELECT 2, 2, 'classic_bike', '2023-02-11 00:00:00', '2023-02-11 00:00:01' UNION ALL
    SELECT 3, 1, 'classic_bike', '2023-02-11 00:00:00', '2023-02-12 00:00:01'
)

the query returns:

bin member_casual rideable_type COUNT(c.id)
A [+0 min-1 min] 2 classic_bike 1
A [+0 min-1 min] 1 classic_bike 1
B [+1 min-113 min] 0
C [+113 min-223 min] 0
D [+223 min-335 min] 0
E [+335 min-447 min] 0
F [+447 min-559 min] 0
G [+559 min-671 min] 0
H [+671 min-783 min] 0
I [+783 min-895 min] 0
J [+895 min-1007 min] 0
K [+1007 min-1119 min] 0
L [+1119 min-1231 min] 0
M [+1231 min-1343 min] 0
N [+1343 min-1440 min] 0
O [+1140 min] 1 classic_bike 1

Obviously, this is a very inefficient query and you should be cautious if running this against a live OLTP database.

user1191247
  • 10,808
  • 2
  • 22
  • 32