0

I have annual earnings (loneink) for 2007 for each individial ID (personlopnr), and I want to calculate for each company ID (peorglopnr) how much of the total wage bill was paid to worker born after 1980 (birth year is fodelsear), in aggregate. However, the code below produces a share column which is 0 for a vast majority of the cases and 1 for the rest. (To be clear, the code with WHERE loneink > 0 as below produces only 1s — the zero comeback without that condition and having a NULLIF to make sure I never divide by zero.) While there are many firms with no young workers, it is clearly not the case that all the other firms are young-only.

What is wrong here? This wasn't the way to generate a "young-wage" variable where for older workers earnings are zero, so the sum is only for the young? Or in theory this is OK, but I got the CASE WHEN wrong? Or the SUM/SUM misbehaves with GROUP BY?

What is a better way to do this?

CREATE VIEW sys.over26_2007 (personlopnr,peorglopnr,loneink,below26_loneink) AS (
SELECT personlopnr,peorglopnr,loneink, CASE WHEN fodelsear < 1981 THEN 0 ELSE loneink END AS below26_loneink
FROM sys.ds_chocker_lev_lisaindivid_2007 WHERE loneink > 0
);
SELECT COUNT(*) FROM over26_2007;

CREATE VIEW sys.share26_2007 (peorglopnr,share26_2007) AS (
SELECT peorglopnr, SUM(below26_loneink)/SUM(loneink)
FROM sys.over26_2007
WHERE loneink > 0
GROUP BY peorglopnr
);

My actual use case is in MonetDB, so hopefully we can stick to SQL:2003 solutions only, no mySQL or Oracle extensions.

László
  • 3,914
  • 8
  • 34
  • 49
  • have you tried to list the `SUM(below26_loneink)` and `SUM(loneink)` columns separately? It is maybe a case when the results of `SUM()` need to be casted to decimal type (or whatever it is called in MonetDB) and then divide – cha Jun 12 '14 at 23:06
  • @cha I did check `MIN`,`AVG` and `MAX` for both `loneink` and `below26_loneink` in `over26_2007`. That worked fine. Doesn't it mean that the `SUM` should not be a problem? Or what am I missing? – László Jun 12 '14 at 23:27
  • what column type is your `loneink`? – cha Jun 12 '14 at 23:32
  • @cha You mean in the view? How do I query that? MonetDB has no information_schema, and I cannot `SELECT over26_2007 FROM tables` as it is a view. – László Jun 12 '14 at 23:39
  • Never mind... Try this: `CAST(SUM(below26_loneink) AS double) / CAST(SUM(loneink) AS double)`. BTW, when you create the column, in your CASE statement use 0.0, like this: `CASE WHEN fodelsear < 1981 THEN 0.0 ELSE loneink END` – cha Jun 12 '14 at 23:47
  • @cha Thanks, this solved it! I am happy to accept an answer here if you write up something. – László Jun 12 '14 at 23:53

1 Answers1

1

First of all, when you create the view, you need to use 0.0 in your case statement. This will make sure that the column in the view is created using a correct data type (double in your case):

CREATE VIEW sys.over26_2007 (personlopnr,peorglopnr,loneink,below26_loneink) AS (
SELECT personlopnr,peorglopnr,loneink, 
       CASE WHEN fodelsear < 1981 THEN 0.0 ELSE loneink END AS below26_loneink
FROM sys.ds_chocker_lev_lisaindivid_2007 WHERE loneink > 0.0
);

Next, in your other view, CAST the sum to double as well:

CREATE VIEW sys.share26_2007 (peorglopnr,share26_2007) AS (
SELECT peorglopnr, CAST(SUM(below26_loneink) AS double) / CAST(SUM(loneink) AS double)
FROM sys.over26_2007
WHERE loneink > 0
GROUP BY peorglopnr
);
cha
  • 10,301
  • 1
  • 18
  • 26