2

Very new to DAX/PowerPivot, and faced with devilishly tricky question on day one.

I have some data (90,000 rows) I'm trying to use to calculate a cumulative fatigue score for folk working shifts(using PowerPivot/Excel 2016). As per the below screenshot, the dataset is shift data for multiple employees, that has a cumulative count of days worked vs. days off that resets back to 1 whenever they switch from one state to the other, and a 'Score' column that in my production data contains a measure of how fatigued they are.

I would like to cumulatively sum that fatigue score, and reset it whenever they move between the 'Days worked' and 'Days off' states. My desired output is in the 'Desired' column far right, and I've used green highlighting to show days worked vs. days off as well as put a bold border around separate Emp_ID blocks to help demonstrate the data.

enter image description here

There is some similarity between my question and the SO post at DAX running total (or count) across 2 groups except that one of my columns (i.e. the Cumulative Days one) is in a repeating sequence from 1 to x. And Javier Guillén's post would probably make a good starting point if I'd had a couple of months of DAX under my belt, rather than the couple of hours I've gained today.

I can barely begin to conceptualize what the DAX would need to look like, given I'm a DAX newbie (my background is VBA, SQL, and Excel formulas). But lest someone berate me for not even providing a starting point, I tried to tweak the following DAX without really having a clue what I was doing:

 Cumulative:=CALCULATE( 
SUM( Shifts[Score] ) , 
FILTER(Shifts,Shifts[Cumulative Days] <= VALUES(Shifts[Cumulative Days] )) , 
ALLEXCEPT( shifts, Shifts[Workday],Shifts[EMP_ID] ) )

Now I'll be the first to admit that this code is DAX equivelant of the Infinite Monkey Theorem. And alas, I have no bananas today, and my only hope is that someone finds this problem suitably a-peeling.

Community
  • 1
  • 1
jeffreyweir
  • 4,668
  • 1
  • 16
  • 27

1 Answers1

1

The problem with this table is there is no way to determine when stop summing while performing the cumulative total.

I think one way to achive it could be calculating the next first date where continuous workday status changes.

For example the workday status in the first three rows for EMP_ID 70073 are the same, until the fourth row, date 04-May which is the date the workday status changes. My idea is to create a calculated column that find the status change date for each workday serie. That column lets us implement the cumulative sum.

Below is the expression for the calculated column I named Helper.

Helper =
IF (
    ISBLANK (
        CALCULATE (
            MIN ( [Date] ),
            FILTER (
                'Shifts',
                'Shifts'[EMP_ID] = EARLIER ( 'Shifts'[EMP_ID] )
                    && 'Shifts'[Workday] <> EARLIER ( 'Shifts'[Workday] )
                    && [Date] > EARLIER ( 'Shifts'[Date] )
            )
        )
    ),
    CALCULATE (
        MAX ( [Date] ),
        FILTER (
            Shifts,
            Shifts[Date] >= EARLIER ( Shifts[Date] )
                && Shifts[EMP_ID] = EARLIER ( Shifts[EMP_ID] )
        )
    )
        + 1,
    CALCULATE (
        MIN ( [Date] ),
        FILTER (
            'Shifts',
            'Shifts'[EMP_ID] = EARLIER ( 'Shifts'[EMP_ID] )
                && 'Shifts'[Workday] <> EARLIER ( 'Shifts'[Workday] )
                && [Date] > EARLIER ( 'Shifts'[Date] )
        )
    )
)

In short, the expression says if the date calculation for the current workday series change returns a blank use the last date for that EMP_ID ading one date.

Note there is no way to calculate the change date for the last workday serie, in this case 08-May rows, so if the the calculation returns blank it means it is being evaluated in the last serie then my expression should return the max date for that EMP_ID adding one day.

Once the calculated column is in the table you can use the following expression to create a measure for the cumulative value:

Cumulative Score =
CALCULATE (
    SUM ( 'Shifts'[Score] ),
    FILTER ( ALL ( 'Shifts'[Helper] ), [Helper] = MAX ( [Helper] ) ),
    FILTER ( ALL ( 'Shifts'[Date] ), [Date] <= MAX ( [Date] ) )
)

In a table in Power BI (I have no access to PowerPivot at least eight hours) the result is this:

enter image description here

I think there is an easier solution, my first thought was using a variable, but that is only supported in DAX 2015, it is quite possible you are not using Excel 2016.

UPDATE: Leaving only one filter in the measure calculation. FILTER are iterators through the entire table, so using only one filter and logic operators could be more performant.

Cumulative Score =
CALCULATE (
    SUM ( 'Shifts'[Score] ),
    FILTER (
        ALL ( 'Shifts'[Helper], Shifts[Date] ),
        [Helper] = MAX ( [Helper] )
            && [Date] <= MAX ( [Date] )
    )
)

UPDATE 2: Solution for pivot tables (matrix), since previous expression worked only for a tabular visualization. Also measure expression was optimized to implement only one filter.

This should be the final expression for pivot table:

Cumulative Score =
CALCULATE (
    SUM ( 'Shifts'[Score] ),
    FILTER (
        ALLSELECTED ( Shifts ),
        [Helper] = MAX ( [Helper] )
            && [EMP_ID] = MAX ( Shifts[EMP_ID] )
            && [Date] <= MAX ( Shifts[Date] )
    )
)

Note: If you want to ignore filters use ALL instead of ALLSELECTED.

Results in Power BI Matrix:

enter image description here

Results in PowerPivot Pivot Table:

enter image description here

Let me know if this helps.

alejandro zuleta
  • 13,962
  • 3
  • 28
  • 48
  • Hi Alejondro. Thanks for the very well written response. I neglected to say that I'm using Excel 2016, and that my dataset is 90,000 records. The calculated column works perfectly, but perhaps due to the size of the data and the fairly low-spec machine I'm running it on, my Excel instance has been saying "Reading data..." for quite some time when I added the measure to the Pivot. I'll let you know how long it takes, assuming it finishes. And I'll try it on a smaller subset of data. My other option is to calculate the cumulative figures using an Excel Table, and upload to PowerPIvot. – jeffreyweir Oct 31 '16 at 21:37
  • I had to cancel the refresh, and try again on a subset of the data. But with both my production data and this sample data I'm getting a different result than your output: the Cumulative Score field returns the exact same thing as the Score field. So there must be some inconsistency between PowerBI and PowerPivot, unless I've forgotten to tweak some setting. When you next have access to PowerPivot I'd be interested to see if it works for you there, or whether you get the same result in the PP platform that I do. – jeffreyweir Oct 31 '16 at 22:14
  • @jeffreyweir, could you provide sample data in order to see what is causing the problem? Also try the update it should be more performant than the initial measure expression. – alejandro zuleta Oct 31 '16 at 22:22
  • I'm using the sample data in my initial screenshot, and instead of Cumulative I just get a duplicate of the Score column. Weird, because I see it works just fine for you in PowerBI. – jeffreyweir Oct 31 '16 at 22:44
  • Possible date bug? I've had weird stuff before with Pivots and VBA due to date incompatibilities between VBA and Excel given I'm using non-US dates. – jeffreyweir Oct 31 '16 at 22:48
  • @jeffreyweir I am not sure about that. I will try to replicate the issue this night in Excel 2013. By the way, are you using that expression in a calculated column or a measure? – alejandro zuleta Oct 31 '16 at 22:51
  • First expression as calc column, second expression as measure. Is this correct? – jeffreyweir Oct 31 '16 at 22:52
  • I'll add a screenshot showing everything incl output to my original question. – jeffreyweir Oct 31 '16 at 22:55
  • @jeffreyweir, I've recreated the issue. For my surprise I get the same result, cumulative score measure is not behaving as expected. It seems a Power BI table is a tabular visualization while a PowerPivot Pivot table is a matrix-matched visualization and measures behave different in each visualization. I'll try to fix the measure to work in matrix-matched visualizations. – alejandro zuleta Nov 01 '16 at 13:28
  • @jeffreyweir, I've updated the answer give it a try. – alejandro zuleta Nov 01 '16 at 15:21
  • Woohoo! It works! It takes around 28 minutes to calculate on my PC, but it calculates just fine. I'm getting a faster machine and more RAM soon, so that can only get better. Thanks Alejandro...you're a DAX star. – jeffreyweir Nov 02 '16 at 02:53