2

I'm building an OLAP Analysis with Pentaho's BI Suite (Community Edition). Many of my measures are standard deviations of the variables in my fact tables.

Does anyone have a tip on how to define a Standard Deviation aggregation function in Schema Workbench? Lots of my jobs could benefit from it.

Thanks in advance!

Lucas Soares

3 Answers

2

You could use a MeasureExpression. There is a guide on how to do this in PostgreSQL here; what is your underlying DB?

http://blog.endpoint.com/2009/07/subverting-postgresql-aggregates-for.html

There has long been a request to support custom aggregators, but it has not been implemented yet.

Codek
  • My underlying db is MS SQL Server 2005. I don't know if it supports overloading, I have never seen function overloading in a DB Engine before. – Lucas Soares Apr 11 '13 at 18:30
1

In my case the database has 3 million rows, while the MDX cube has 3124 cells.

So the MDX function would calculate the standard deviation from the 3124 aggregated cell values, whereas a "real" statistician would usually use all 3 million rows.
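To illustrate why the two approaches disagree, here is a minimal Python sketch. The rows, the cell keys, and the choice of a sum as the cell aggregate are all made up for illustration; they are not taken from the actual cube.

```python
import statistics

# Hypothetical fact rows as (cell key, value) pairs; illustrative data only.
rows = [("A", 1.0), ("A", 3.0), ("B", 2.0), ("B", 6.0), ("C", 4.0), ("C", 8.0)]

# A cube cell usually holds one aggregated value per key (assumed here: a sum).
cells = {}
for key, value in rows:
    cells[key] = cells.get(key, 0.0) + value

# Standard deviation over the aggregated cell values (3 numbers).
stddev_of_cells = statistics.pstdev(list(cells.values()))

# Standard deviation over the underlying rows (6 numbers).
stddev_of_rows = statistics.pstdev([value for _, value in rows])

print(stddev_of_cells, stddev_of_rows)
```

The two results differ because aggregation discards the spread inside each cell, so the cell-level figure generally does not match the row-level one.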

To get the statistician's standard deviation, I added a column to the database holding the square of each row's value.

Then in Mondrian I defined a new measure, the std dev, as :

square root of ( (sum of squared values / row count) - (average value * average value) )

This has some consequences for hierarchies, but that is another story.
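As a rough check of the idea, the following Python sketch computes a population standard deviation from three additive aggregates: row count, sum, and sum of squares. The function name and the sample data are hypothetical, and the population form (dividing by n) is an assumption about what the measure is meant to compute.

```python
import math
import statistics

def stddev_from_aggregates(n, total, total_sq):
    """Population std dev from row count, sum, and sum of squared values."""
    mean = total / n
    variance = total_sq / n - mean * mean
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))

# Illustrative row values only.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Aggregates for two arbitrary partitions, as separate cube cells would hold them.
left, right = values[:3], values[3:]
parts = [(len(p), sum(p), sum(v * v for v in p)) for p in (left, right)]

# Count, sum, and sum of squares are additive, so they roll up by plain addition.
n = sum(p[0] for p in parts)
total = sum(p[1] for p in parts)
total_sq = sum(p[2] for p in parts)

result = stddev_from_aggregates(n, total, total_sq)
print(result, statistics.pstdev(values))
```

Because all three inputs are plain sums, they can be aggregated at any level of a hierarchy and combined afterwards. One caveat: subtracting two large, nearly equal numbers can lose precision when the mean is large relative to the spread.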

HKL
0

How I'm calculating standard deviations now:

I created an ID dimension, not for exploring, but only to make sure that Mondrian isn't calculating the standard deviation of already aggregated values.

Then I created a new Calculated Member using the MDX formula: Stddev(Descendants([ID_Dimension.ID_Hierarchy],,Leaves),[Measures].[Measure with values to be aggregated]).

Performance sucks.

The idea came from this very old forum post.

Lucas Soares