1

I'm looking for a way to group results by ranges.

For example:

Data:

ID | Value
1  |     5
2  |    10 
3  |    30
4  |    44
5  |    71 

Result:

|From  | To  | Count(*) |
|    0 |  20 |        2 |
|   20 |  40 |        1 |
|   40 |  60 |        0 |
|   60 |  80 |        1 |

I know I could easily do this with a case or where query however I have several tables with an enormous variation in ranges (which also include decimal values) so I was hoping for a solution which would automatically create ranges of 20 in between the minimum and maximum values.

Two solutions which I have found but were unable to adapt:

group by price range (Very similar but it creates ranges from values so there will never be 0 ranges and doesn't do whole numbers)

Group rows by Year Band Interval (I'm not familiar with the use of floor but this is what I'm trying to accomplish, but I couldn't get it to accurately count the rows)

Thank you so much for your help

James

Community
  • 1
  • 1
JamesK
  • 13
  • 4
  • So the range might just as easily run from 0 to 20, or from 1000 to 6,000,000,000, say? (This would be much simpler in application-level code) – Strawberry Jun 03 '14 at 12:36
  • Thanks strawberry unfortunately I'm not running on an application level, the data is POS derived data from a very large data source and I'm only running data analysis on it. That range wont be that large however I have 148 tables and I have to do this for each all with varying ranges (more like -500 to 100,000 (MAX) so I could do it with a case query but I thought if a dynamic answer was easy enough I would ask. PS. Thanks for the formatting assistance DalmTo! – JamesK Jun 03 '14 at 13:10

2 Answers2

1

It is possible to do, although I make no positive guarantees about performance.

You need to find the upper and lower boundaries. This can be done by getting the max and min values, dividing each by 20, using floor / ceil to round them then multiplying by 20. Then cross join that with a sub query to generate a range of numbers (in 20s). Make sure this sub query gets a big enough range (in the example below it is generating 1000 ranges, so from 0 to 20000 max)

This will get you all the ranges. You can left join the actual data to count the actual number of values in each of those ranges.

Note that your example ranges have gone 1 to 20 and then 20 to 40. Thus any value of 20 would appear in 2 ranges. I have assumed 0 up to but not including 20 . If the values are integers you could generate ranges of 0-19, 20-39, etc.

SELECT sub1.lower_boundary, sub1.upper_boundary, COUNT(value)
FROM
(
    SELECT FLOOR(MIN(value) / 20) * 20 AS min_boundary, CEIL(MAX(value) / 20) * 20 AS max_boundary
    FROM sometable
) sub0
CROSS JOIN
(
    SELECT (units.i + 10 * tens.i + 100 * hundreds.i) * 20 AS lower_boundary,(1 + units.i + 10 * tens.i + 100 * hundreds.i) * 20 AS upper_boundary
    FROM
    (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units,
    (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens,
    (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) hundreds
) sub1
LEFT OUTER JOIN sometable a
ON a.Value >= sub1.lower_boundary
AND a.Value < sub1.upper_boundary
WHERE sub1.lower_boundary < max_boundary
AND  sub1.upper_boundary > min_boundary
GROUP BY sub1.lower_boundary, sub1.upper_boundary

SQL fiddle here:-

http://www.sqlfiddle.com/#!2/cd535/3

Kickstart
  • 21,403
  • 2
  • 21
  • 33
1

Well, just thinking out loud...

SELECT * FROM ints;
+---+
| i |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+---+

DROP TABLE IF EXISTS my_table;

CREATE TABLE my_table
(ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,Value INT NOT NULL
);

INSERT INTO my_table VALUES
(1  ,5),
(2  ,10),
(3  ,30),
(4  ,44),
(5  ,81),
(6  ,200),
(7, -200); 

SELECT a.*
     , COUNT(DISTINCT c.id) total
  FROM
     ( SELECT (i4.i*1000 + i3.i*100 + i2.i*10 + i1.i - 1000) * 20 `from`
            , ((i4.i*1000 + i3.i*100 + i2.i*10 + i1.i -1000) * 20) + 20 `to`
         FROM ints i1,ints i2, ints i3, ints i4
     ) a
  JOIN 
     ( SELECT MIN(value) min_val,MAX(value) max_val FROM my_table ) b
    ON b.max_val > a.from
   AND b.min_val < a.to
  LEFT
  JOIN my_table c
    ON c.value BETWEEN a.from AND a.to
 GROUP 
    BY a.from;
    +------+------+-------+
| from | to   | total |
+------+------+-------+
| -200 | -180 |     1 |
| -180 | -160 |     0 |
| -160 | -140 |     0 |
| -140 | -120 |     0 |
| -120 | -100 |     0 |
| -100 |  -80 |     0 |
|  -80 |  -60 |     0 |
|  -60 |  -40 |     0 |
|  -40 |  -20 |     0 |
|  -20 |    0 |     0 |
|    0 |   20 |     2 |
|   20 |   40 |     1 |
|   40 |   60 |     1 |
|   60 |   80 |     0 |
|   80 |  100 |     1 |
|  100 |  120 |     0 |
|  120 |  140 |     0 |
|  140 |  160 |     0 |
|  160 |  180 |     0 |
|  180 |  200 |     1 |
    +------+------+-------+
Strawberry
  • 33,750
  • 13
  • 40
  • 57