5

I want to count how many times each user has rows within '5' of eachother.

For example, Don - 501 and Don - 504 should be counted, while Don - 501 and Don - 1600 should not be counted.

Start:

Name        value
_________   ______________
Don         1235
Don         6012
Don         6014
Don         6300
James       9000
James       9502
James       9600
Sarah       1110
Sarah       1111
Sarah       1112
Sarah       1500
Becca       0500
Becca       0508
Becca       0709

Finish:

Name            difference_5
__________      _____________
Don             1
James           0
Sarah           2
Becca           0
Don P
  • 60,113
  • 114
  • 300
  • 432
  • 3
    maybe its my eyes but your data doesn't seem to match the text description.... _Don - 501 and Don - 504 should be counted_ but I don't see these values. – Taryn Aug 01 '12 at 16:20
  • Could you explain why Sarah's count is not 3? 1110 to 1111 is 1, 1111 to 1112 is 2, 1110 to 1112 is 3, right? Or is that not the way you are doing it? – mikeY Aug 01 '12 at 16:29

4 Answers4

2

Use the ABS() function, in conjunction with a self-join in a subquery:

So, something like:

SELECT name, COUNT(*) / 2 AS difference_5
FROM (
  SELECT a.name name, ABS(a.value - b.value) 
  FROM  tbl a JOIN tbl b USING(name)
  WHERE ABS(a.value - b.value) BETWEEN 1 AND 5
) AS t GROUP BY name

edited as per Andreas' comment.

Jeremy Smyth
  • 23,270
  • 2
  • 52
  • 65
  • I think this will include all permutations of a and b, that is, for "Don 6012" and "Don 6014" there would be two rows with difference 2. These would have to be filtered out somehow; maybe dividing by two? – Andreas Aug 01 '12 at 16:27
  • MySQL doesn't like the `difference` in the WHERE clause. (MySQL 5.5) – Holger Brandt Aug 01 '12 at 16:37
  • My bad. Fixed by adding backticks. – Jeremy Smyth Aug 01 '12 at 16:39
  • @JeremySmyth, Adding a backtick won't help. You can't reference an alias from the SELECT clause in the WHERE clause. You just need to move `ABS(a.value - b.value)` down to the WHERE clause and remove it completely from the SELECT clause. – Holger Brandt Aug 01 '12 at 16:47
  • 1
    oops. fixed again! I need a better SQL parser in my head. – Jeremy Smyth Aug 01 '12 at 16:47
1

Assuming that each name -> value pair is unique, this will get you the count of times the value is within 5 per name:

SELECT    a.name, 
          COUNT(b.name) / 2 AS difference_5
FROM      tbl a
LEFT JOIN tbl b ON a.name = b.name AND 
                   a.value <> b.value AND
                   ABS(a.value - b.value) <= 5
GROUP BY  a.name

As you'll notice, we also have to exclude the pairs that are equal to themselves.

But if you wanted to count the number of times each name's values came within 5 of any value in the table, you can use:

SELECT    a.name,
          COUNT(b.name) / 2 AS difference_5
FROM      tbl a
LEFT JOIN tbl b ON NOT (a.name = b.name AND a.value = b.value) AND
                   ABS(a.value - b.value) <= 5
GROUP BY  a.name

See the SQLFiddle Demo for both solutions.

Zane Bien
  • 22,685
  • 6
  • 45
  • 57
0

Because the OP also wants de zero counts, we'll need a self- left join. Extra logic is needed if one person has two exactly the same values, these should also be counted only once.

WITH cnts AS (
        WITH pair AS (
                SELECT t1.zname,t1.zvalue
                FROM ztable t1
                JOIN ztable t2
                ON t1.zname = t2.zname
                WHERE ( t1.zvalue < t2.zvalue
                        AND t1.zvalue >= t2.zvalue - 5 )
                OR (t1.zvalue = t2.zvalue AND t1.ctid < t2.ctid)
                )
        SELECT DISTINCT zname
        , COUNT(*) AS znumber
        FROM pair
        GROUP BY zname
        )
, names AS (
        SELECT distinct zname  AS zname
        FROM ztable
        GROUP BY zname
        )
SELECT n.zname
        , COALESCE(c.znumber,0) AS znumber
FROM names n
LEFT JOIN cnts c ON n.zname = c.zname
        ;

RESULT:

DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 14
 zname | znumber 
-------+---------
 Sarah |       3
 Don   |       1
 Becca |       0
 James |       0
(4 rows)

NOTE: sorry for the CTE, I had not seen th mysql tag,I just liked the problem ;-)

wildplasser
  • 43,142
  • 8
  • 66
  • 109
  • 2
    Does MySql support CTE? I didn't think that it had that functionality. – Taryn Aug 01 '12 at 17:00
  • I just noticed that myself. I still like the solution, though ... (and the problem is more or less general) – wildplasser Aug 01 '12 at 17:01
  • Would be even easier with windowing functions (`lag()` comes to mind) –  Aug 01 '12 at 17:04
  • Yes. The ploblem would even get nicer if you had to detect and count "clusters" (multiple values all with a distance < 5, eg {1,5,9,11}) I'd even go recursive, I think. BTW: it is a gaps-and-islands problem. Should be retagged. (maybe remove the mysql tag ? ;-) looks like homework anyway – wildplasser Aug 01 '12 at 17:08
  • Thanks for the help, it is actual work, not homework :P never heard of gaps-and-islands, maybe if I took CS this would be trivial :) – Don P Aug 01 '12 at 17:52
0
SELECT
    A.Name,
    SUM(CASE WHEN (A.Value < B.Value) AND (A.Value >= B.Value - 5) THEN 1 ELSE 0 END) Difference_5
FROM
    tbl A INNER JOIN
    tbl B USING(Name)
GROUP BY
    A.Name
Jazzy J
  • 302
  • 1
  • 15