3

I am having some issues with the standard deviation function (stddev_samp in MonetDB specifically). I tried the following queries without success:

    select industry, avg(marketcap) as industryavg, stddev_samp(marketcap) as industrysd from cumulativeview group by industry
    select  stddev_samp(marketcap) as industrysd from cumulativeview group by industry

Each gives me a very weird exception and it seems the stddev function does not work on a group by subset, however using the avg function alone seems to work just fine on a group by subset as in the following query:

    select industry, avg(marketcap) as industryavg  from cumulativeview group by industry

And the standard deviation function works just fine when i use a where clause instead of a group by:

    select  stddev_samp(marketcap) as industrysd from cumulativeview where industry='Diversified Investments'

Is there an alternate way to write a query that would give me the average and standard deviation for each industry all at once rather than having to go through and write a seperate query for each industry? I am very confused as to why the average function works with group by and stddev does not...

Gadesxion
  • 391
  • 2
  • 6
  • 18
  • The error message from the first two queries: Error: TypeException:user.s14_4[656]:'aggr.substdev' undefined in: _996:bat[:any,:dbl] := aggr.substdev(_970:bat[:oid,:dbl], _956:bat[:oid,:oid], r1_956:bat[:oid,:oid], _972:bit) – Gadesxion Jan 20 '15 at 20:31
  • 1
    they said they fixed this bug. https://www.monetdb.org/bugzilla/show_bug.cgi?id=3257 could you please provide a *minimal* reproducible example? thanks! – Anthony Damico Jan 24 '15 at 02:00

1 Answers1

6

Just tested this with the Oct2014 release of MonetDB. From your query, I inferred the following table structure:

CREATE TABLE cumulativeview (industry string, company string, marketcap double);

Some sample data:

INSERT INTO cumulativeview VALUES ('Automotive', 'Daimler', 84784.62), 
('Automotive', 'BMW', 66852.15), ('Automotive', 'VW', 95378.54), ('Chemical', 'BASF', 70438.13), ('Chemical', 'Bayer', 105766.62);

And your query

SELECT industry, avg(marketcap) AS industryavg, stddev_samp(marketcap) AS industrysd FROM cumulativeview GROUP BY industry;

Results in

+------------+--------------------------+--------------------------+
| industry   | industryavg              | industrysd               |
+============+==========================+==========================+
| Automotive |       82338.436666666661 |       14419.659887918069 |
| Chemical   |                88102.375 |       24981.014848081126 |
+------------+--------------------------+--------------------------+

So as Anthony suggested, the bug seems to be fixed.

Hannes Mühleisen
  • 2,542
  • 11
  • 13