3

We are using the following SQL query to produce monthly averages. The statement is heavy used and works pretty well, but it fails with a 'ORA-00979: not a GROUP BY expression' every month or two and we have no idea why.

First about the process:

  • we have raw data every few minutes,
  • => the raw data is getting averaged to hourly, daily, monthly and yearly values

raw -> hourly

  • creates entries in the averages table with average_type 2
  • never problems

hourly => daily / daily => monthly / monthly => yearly

  • Statements are pretty similar
  • averages of a 'lower' type is being averaged to a higher type
  • Average types are: 2 hourly, 3 daily, (4 weekly not used) 5 monthly and 6 yearly

  • The bug only appears with the step "daily => monthly".

Query:

  • We can't reproduce the bug, the next run of the aggregation job usually works without problems.
  • The errors occurs every 50-60 days, without a real pattern
  • Environment: Oracle 10g

Does anybody have an idea what the problem could be?

INSERT INTO averages
SELECT averages_seq.NEXTVAL,
       avg.*
FROM (
  SELECT
      m.city_id,            m.city_name,
      m.state_id,           m.state_name,
      m.district_id,        m.district_name,
      m.country_id,         m.country_name,
      m.currency_id,        m.currency_name,
      m.category_id,        m.category_name,
      5 average_type, -- average_type 5 ==> monthly average
      0 analysis_type,
      TRUNC(m.average_date, 'MM')  average_date,
      AVG(m.value) value,
      SUM(m.sum) sum,
      NULL uncertainty,
      NULL uncertainty_type,
      MIN(m.value_min) value_min,
      MAX(m.value_max) value_max,
      SUM(number_of_measurements) number_of_measurements,
      -- 6 * 24 => measurements per day
      -- (ADD_MONTHS(...)) => days per month 
      100 * SUM(number_of_measurements) / 
           (6 * 24 *
           (ADD_MONTHS(TRUNC(average_date, 'MM'), 1)  - TRUNC(average_date, 'MM'))) coverage_percent,
      SUM(customers) customers,
      NULL dummy_field,
      CURRENT_TIMESTAMP calculation_date,
      CURRENT_TIMESTAMP creation_date,
      'AGGREGATION' creation_user,
      CURRENT_TIMESTAMP modification_date,
      'AGGREGATION' modification_user,
      'n' constant_1,
      3   constant_2,
      -1 average_state
  FROM averages m
  WHERE   m.average_type = 3 -- average type 3 ==> daily average
  AND     m.average_date
      BETWEEN
        TO_TIMESTAMP('2011-06-01T00:00:00Z', 'YYYY-MM-DD"T"HH24:MI:SS"Z"')
        AND
        TO_TIMESTAMP('2011-06-30T23:59:59Z', 'YYYY-MM-DD"T"HH24:MI:SS"Z"')
  AND     m.analysis_type = 0
  GROUP BY
        m.city_id,            m.city_name,
        m.state_id,           m.state_name,
        m.district_id,        m.district_name,
        m.country_id,         m.country_name,
        m.currency_id,        m.currency_name,
        m.category_id,        m.category_name,
        TRUNC(m.average_date, 'MM')
  ) avg
reto
  • 16,189
  • 7
  • 53
  • 67
  • 1
    I deleted my answer, since docs say sysdate always returns same DATE value in a single SQL statement context (http://oracle.su/docs/11g/timesten.112/e13070/ttsql247.htm). – Gerrat Aug 04 '11 at 18:06
  • I'm wondering if there is ever a case where one of the columns in the GROUP BY section could ever be null. That's the only thing I can think of. – Narnian Aug 04 '11 at 18:13
  • @Gerrat: thanks for the idea, and even thank you more for checking that it cant be the problem :)! Narnian: I'm not sure how null could be a problem? – reto Aug 04 '11 at 18:41
  • @reto/@Narnian...I doubt null could be a problem...you can group by a column containing nulls (it's just a separate group) – Gerrat Aug 04 '11 at 19:23
  • The intersting thing about the ORA-00979 error (or rather the GROUP BY expressions) is that they are not checked during compile-time but at run-time if they are part of a PL/SQL procedure. (Try it with SELECT A, B FROM T GROUP BY A). NULLs, Dates or any table contents are hardly the problem as the correctness only depends on the table definition. It looks as if it's either an Oracle bug or your AGGREGATES table is changed (DDL) between runs of the query. Are you rebuilding indices, adding or dropping partitions, enabling or disabling constraints? – Codo Aug 04 '11 at 21:12
  • @CODO: its just plain sql executed over jdbc, nothing fancy. No DDL, at least nothing explicit (dont know if oracle rebuilds indices internally, but I doubt that you meant things like that). – reto Aug 05 '11 at 07:41

2 Answers2

1

I would add a group by:

(ADD_MONTHS(TRUNC(average_date, 'MM'), 1)  - TRUNC(average_date, 'MM'))

I know this can't change without TRUNC(m.average_date, 'MM') changing, but it seems to be the only non-aggregate column not in your GROUP BY.

In addition, you could remove all the non-aggregated, constant, columns outside the inner sql, and explicitly name the columns you're inserting and select these constants at the same time:

eg.

INSERT INTO averages(city_id, city_name, ...average_type, analysis_type, ...)
SELECT averages_seq.NEXTVAL,
avg.city_id, avg.city_name, ...
5, 0, ...

...can't say this part will fix the issue, but it would certainly remove them from suspicion of not being aggregated.

Gerrat
  • 28,863
  • 9
  • 73
  • 101
  • If I could reproduce the issue this would be a no brainer. But i'll try it if nothing else comes up! – reto Aug 04 '11 at 18:43
  • I don't see why this answers gets upvoted. If `TRUNC(average_date, 'MM')` is a group by expression, so are all expressions building upon it without introducing any new non-aggregated or non-grouped columns. The proposed solutions are not just guesswork (which is sometimes needed to find a bug), they are against the logic of SQL's GROUP BY clause. – Codo Aug 05 '11 at 12:12
  • I moved all the non-aggregating parts out of the inner query. We'll se if it fixes it. – reto Aug 05 '11 at 16:52
  • So far the error didnt reappear. Moving all the non-aggregating stuff to the outer query helped. Thanks for the help! – reto Sep 02 '11 at 10:14
  • Okay, by now it s clear, that query rewriting solved 'ORA-00979', the error didn't reappear. – reto Sep 30 '11 at 06:54
0

Judging by the comparison, the average_date is a timestamp with time zone (local time zone ?), but TRUNC works on a date. I'm wondering what happens if there is some oddity where a selected date is 'jumping' from one month to another (eg it occurred in January in one timezone but Feb in another).

Based on that, also consider whether the client is having an effect, (eg maybe it errors when run from a client that is in a different timezone from the database setting).

I'd expand Gerrat's suggestion of specifying the column names so that you can separate out the constants

INSERT INTO averages
  (average_type, analysis_type, uncertainty, uncertainty_type,
  dummy_field, calculation_date, creation_date, creation_user, 
  modification_date, modification_user, constant_1, constant_2,
   ....
SELECT averages_seq.NEXTVAL,
      5 average_type, -- average_type 5 ==> monthly average
      0 analysis_type,
      NULL uncertainty,
      NULL uncertainty_type,
      NULL dummy_field,
      CURRENT_TIMESTAMP calculation_date,
      CURRENT_TIMESTAMP creation_date,
      'AGGREGATION' creation_user,
      CURRENT_TIMESTAMP modification_date,
      'AGGREGATION' modification_user,
      'n' constant_1,
      3   constant_2,
      -1 average_state
       avg.*
FROM (
  SELECT ...
Gary Myers
  • 34,963
  • 3
  • 49
  • 74
  • All the data is utc. There's only one client which sets timezone correctly. The job works without problems when rerun. Thanks for your thoughts! – reto Aug 17 '11 at 08:23