1

The setup: I have energy use data from a bunch of buildings built in a bunch of different years. I'd like to analyze the energy use by date constructed in Tableau. My initial problem was that there were not enough buildings in the sample to have a robust set for each year, with the resulting outputs showing a ton of noise. The distribution is skewed right, as there are a number of high outliers, but no outliers close to 0, so I'd like to use a median to reduce the effect of the few (and likely erroneous) high outliers.

Desired solution: Create a 5-year "moving" or "running" median, that includes all the buildings within two years in either direction of the given year, so that each set is centered on the year.

What I tried in Tableau: I wanted to use WINDOW_MEDIAN([ENERGY],-2,2), but it is an aggregate function. So I tried WINDOW_MEDIAN(MEDIAN([ENERGY],-2,2). Unfortunately, this gives me the median of 5 medians (Median-of-Medians?! Boo!). Again, I'd like to have the median of all individual buildings (not aggregated medians) represented in each 5-year window.

Any thoughts of how to do this? Thanks!

Erik Lyon
  • 11
  • 1
  • 3

2 Answers2

1

Allow me to re-define a MOVING MEDIAN. Below is an example from MatLab:

A = [4 8 6 -1 -2 -3 -1 3 4 5]; M = movmedian(A,3)

The result is 10 medians.

To get the same in Tableau, you have 1-2 solutions:

  • MODEL_QUANTILE with a line, if your x-axis is a date.
  • If your x-axis is a measure, you need a dual-axis chart wherein the second visualization, the MOVING MEDIAN is based on the binned x-axis.

Franco

FrancoSwiss
  • 131
  • 3
0

When you define table calculations, you need to specify additional information beyond just the calculation formula itself -- to tell Tableau how to partition the result set, which dimensions to traverse (address) and in what order.

You see some of these choices under the "Compute Using" menu which is often all you need, but more explicitly by editing the table calculation and choosing Advanced for the "Running Along" option.

Your real issue is about defining the level of detail for the calculation you want.

Table calculations by definition operate on the aggregated results returned to Tableau by the data source -- unlike other calculations which are performed by the data source. That means that if you want your table calculation to compute the median considering each row in your data set, then Tableau has to request the data at a fine level of granularity. You can accomplish that by including some combination of dimensions in your view that uniquely identifies each row in your data (think primary key). That will expand the aggregated results given to Tableau to have all the data you need for the median calculation.

At that point, you will need to specify which dimensions to use for pertaining, addressing and ordering for your table calc, probably on the advanced panel mentioned above. There are some good tutorials on the net, but experimentation is a good teacher here too.

You will still need an aggregation function around [ENERGY], but since each block of data rows should have only row, then the aggregation function doesn't have much effect. I usually choose MIN() in those cases, but MAX(), AVG(), MEDIAN() all have the same effect when there is only one row in a partition.

Alex Blakemore
  • 11,301
  • 2
  • 26
  • 49