How do I define a Standard Deviation function in Pentaho Schema Workbench

Question

I'm building an OLAP Analysis with Pentaho's BI Suite (Community Edition). Many of my measures are standard deviations of the variables in my fact tables.

Does anyone have a tip on how to define a standard deviation aggregation function in Schema Workbench? Many of my jobs could benefit from it.

Thanks in advance!

score 2 · Answer 1 · answered Apr 11 '13 at 06:59


You could use a MeasureExpression. What is your underlying database? There is a guide on how to do this in PostgreSQL here:

http://blog.endpoint.com/2009/07/subverting-postgresql-aggregates-for.html

There has long been a request for Mondrian to support custom aggregators, but it has not been implemented yet.


Codek


My underlying database is MS SQL Server 2005. I don't know whether it supports overloading; I have never seen function overloading in a database engine before. – Lucas Soares Apr 11 '13 at 18:30

score 1 · Answer 2 · answered Dec 01 '16 at 14:45

In my case the database has 3 million rows, while the MDX cube has 3,124 cells.

So the MDX function would calculate the standard deviation from the 3,124 cell values, whereas a "real" statistician would usually use all 3 million rows.
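To illustrate the difference described above, here is a small Python sketch with made-up numbers (the rows, cell keys and values are all hypothetical). It computes the standard deviation once over the raw rows and once over per-cell SUM totals, the way a cube cell aggregates them. The two results are not the same, which is why computing the statistic over aggregated cells does not give the row-level answer.

```python
import statistics

# Hypothetical raw fact rows: (cell key, value). In a real cube each key
# would be one combination of dimension members; several rows share a cell.
fact_rows = [
    ("A", 1.0), ("A", 3.0), ("A", 5.0),
    ("B", 2.0), ("B", 4.0),
    ("C", 10.0),
]

# Population standard deviation over every raw row.
raw_values = [value for _, value in fact_rows]
row_level_sd = statistics.pstdev(raw_values)

# Population standard deviation over the aggregated cells (each cell = SUM).
cell_totals = {}
for key, value in fact_rows:
    cell_totals[key] = cell_totals.get(key, 0.0) + value
cell_level_sd = statistics.pstdev(list(cell_totals.values()))

print(f"row-level std dev:  {row_level_sd:.4f}")
print(f"cell-level std dev: {cell_level_sd:.4f}")
```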

To get the statistician's STDDEV, I added a column to the database containing the square of each row's value.

Then in Mondrian I defined a new measure, the standard deviation, as:

square root of ((sum of squared values / number of rows) - (average value * average value))
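The formula above can be checked with a short Python sketch. It uses hypothetical measure values and mirrors the approach in this answer: only the SUM of the value, the SUM of the pre-computed squared column and the row COUNT are needed, all of which a cube can roll up with ordinary aggregators. The result is the population standard deviation; the last step rescales it by n / (n - 1) to get the sample standard deviation, which, as far as I know, is what Mondrian's MDX Stddev/Stdev function returns.

```python
import math
import statistics

# Hypothetical measure values; in the cube these come from the fact table,
# and x * x plays the role of the extra "squared value" column.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# The three quantities a cube can roll up with plain SUM / COUNT aggregators.
n = len(values)                          # COUNT of rows
total = sum(values)                      # SUM of the value column
total_sq = sum(x * x for x in values)    # SUM of the squared-value column

mean = total / n
# Population variance = (mean of squares) - (square of the mean).
# max() guards against a tiny negative result caused by rounding.
pop_var = max(0.0, total_sq / n - mean * mean)
pop_sd = math.sqrt(pop_var)

# Sample (n - 1) standard deviation, for comparison with MDX Stddev.
sample_sd = math.sqrt(pop_var * n / (n - 1))

print(pop_sd)  # 2.0
print(math.isclose(sample_sd, statistics.stdev(values)))  # True
```

One caveat with this design: when the values are large compared with their spread, the subtraction of two nearly equal numbers loses floating-point precision. A two-pass or Welford-style computation is more stable, but it cannot be expressed with plain SUM aggregators.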

This has some consequences for hierarchies, but that is another story.

score 0 · Accepted Answer · answered Apr 11 '13 at 18:37

How I'm calculating standard deviations now:

I created an ID dimension, not for exploration, but only to make sure that Mondrian doesn't calculate the standard deviation of already-aggregated values.

Then I created a new calculated member using this MDX formula:

Stddev(Descendants([ID_Dimension.ID_Hierarchy],,Leaves),[Measures].[Measure with values to be aggregated])

Performance is very poor, presumably because the calculation iterates over every leaf of the ID dimension, i.e. roughly one member per fact row.

The idea came from this very old forum post.
