0

My data consists of annual tables for many workers (e.g. a file2006,file2007) incl. an employer ID as well as a firm ID. I also made a view to group firms by some calculations (groups below), which seem to complete fine so they are not shown. However, I thought I had the query to calculate the change in the total wage bill for each firm from 2006 to 2007 by these grouping using the code below. Instead of the blazing fast calculation of the previous views, this stalls the system for long, and finally breaks with some error message about mapping problems. What is the right way to do this then?

CREATE VIEW sys.change2007 (firmid,groups,wagebillchange) AS (
SELECT file2006.firmid,MAX(groups.groups), (
    SELECT SUM(file2007.wage)/SUM(file2006.wage) FROM file2006, file2007
    WHERE file2006.firmid = file2007.firmid
    )
FROM file2006, file2007, groups
WHERE file2006.firmid = file2007.firmid AND file2006.peorglopnr = groups.firmid
GROUP BY file2006.firmid
);
László
  • 3,914
  • 8
  • 34
  • 49

1 Answers1

2

You should never use ,s in the from clause. In your case, the problem is because you are doing a cartesian product of the employees in each firm from one year to the next. So, if a firm has 100 employees in each year, then you need to process 10,000 rows for that firm.

The solution is to pre-aggregate the results. You should be able to do this as:

SELECT f006.firmid, f2007.wage / f2006.wage as ratio
FROM (select firmid, sum(wage) as wage
      from file2006
      group by firmid
     ) f2006 join
     (select firmid, sum(wage) as wage
      from file2007
      group by firmid
     ) f2007
     on f2006.firmid = f007.firmid;

I cannot tell what the intention of the groups table is from your query, but you should be able to build on this.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks, but I get an error on this one telling me that the select identifier (firmid) is ambiguous… (And I would be interested in joining in the groups table as well, which is already aggregated to firmids, but I could not get it working, I always got `subquery table needs alias,` even though I was extending your example analogously.) – László Jun 12 '14 at 12:15
  • 1
    The `select` statement was wrong because the aggregation is done in the subqueries. – Gordon Linoff Jun 12 '14 at 12:22
  • This basically answers the original question indeed. Though I get a `division by zero` error if I simply ask for `SELECT AVG(ratio) FROM wagebillchange_2007` on a view created with the select, which I don't understand why it does not happen before (the division is in the view), and why isn't the division robust to producing NAs anyway? Oh, and I still get the `subquery reference table needs a reference` error if I try to join on the `groups` table too, but I can open a new question for that problem… – László Jun 12 '14 at 12:30