0

I've been implementing superset at work, and I like it so far. However, I have such a table:

name,age,gender
John,42,M
Sally,38,F
Patricia,27,F
Steven,29,M
Amanda,51,F

I want to define a new metric against each name, counting the number of people who are younger. My data is in a MySQLdatabase, and I suppose that for one person, I could write the query thus:

SELECT COUNT(DISTINCT name) from users where users.age <= 42;

for, say, John's row. So, how do I do this continuously for the entire table?

TylerH
  • 20,799
  • 66
  • 75
  • 101
stonecharioteer
  • 1,061
  • 11
  • 19
  • How many records do you have and expect to have in this table? The standard approach would be a self join with <= operator on age. On a large table this would result in a huge resultset. – Shadow Dec 21 '17 at 08:11
  • Right now, 7800 rows, and about 80 columns for each row. – stonecharioteer Dec 22 '17 at 05:01

1 Answers1

0

Your query could look something like

select *, 
  (select count(distinct all_users.name) from users all_users where all_users.age <= users.age)
FROM users

To shadow's point - this would get quite expensive to run on a large dataset.

If that were the case, you'd probably want to try putting an index on age, or denormalize that count altogether - the tradeoff being that inserts would become slower.

David Tobiano
  • 1,188
  • 8
  • 10