0

DAX calculated column for INDEX based on two columns: Date and Item.

Hi Comunity!

I'd like to create the two calculated columns in DAX at the right of the following data:

Date Well Oil Water Gas RunningActMonth RunningMonth
01/01/1990 PP-0001 0 0 0 0 0
01/02/1990 PP-0001 0 0 0 0 0
01/03/1990 PP-0001 0 0 0 0 0
01/04/1990 PP-0001 0 0 0 0 0
01/05/1990 PP-0001 0 0 0 0 0
01/06/1990 PP-0001 0 0 0 0 0
01/07/1990 PP-0001 0 0 0 0 0
01/08/1990 PP-0001 0 0 0 0 0
01/09/1990 PP-0001 0 0.2 0 1 1
01/10/1990 PP-0001 0 5 0 2 2
01/11/1990 PP-0001 1 2 50 3 3
01/12/1990 PP-0001 1.5 4 75 4 4
01/01/1991 PP-0001 3 6 150 5 5
01/02/1991 PP-0001 4.5 18 225 6 6
01/03/1991 PP-0001 6 2 300 7 7
01/04/1991 PP-0001 5 1 250 8 8
01/05/1991 PP-0001 2 3 100 9 9
01/06/1991 PP-0001 0 0 0 9 10
01/07/1991 PP-0001 0 0 0 9 11
01/08/1991 PP-0001 0 0 0 9 12
01/09/1991 PP-0001 0 0 0 9 13
01/10/1991 PP-0001 8 2 400 10 14
01/11/1991 PP-0001 4 3 200 11 15
01/12/1991 PP-0001 3 5 150 12 16

Explanation: I have several Items called "Wells". For each one of them, I have monthly data (oil, water, and gas production). For each well I'd like to find the first production data > 0, and from then start to count an Index column.

  • The first column "RunningActMonth" would count every time one of the production columns is greater than zero, but it will repeat the earlier index if the production = 0.
  • The second column "RunningMonth" would keep indexing forever after the First production data.

Your help will be much appreciated. Regards from Argentina.

Luke_0
  • 779
  • 5
  • 20
  • Hey Marcelo. I'm not super familiar with DAX but in SQL you could use a PARTITION statement, and from a quick search I found something that you may find useful in DAX: https://stackoverflow.com/questions/59622685/dax-expression-for-row-number-partition-by-order-by-equivalent#59624441 – TheHiggsBroson May 31 '23 at 16:47

1 Answers1

2

Create a column to output whether there is production.

Production = table[water] > 0 || table[oil] > 0 || table[gas] > 0

We can reference that in our two columns.

RunningActMonth = 
Calculate( 
    distinctcount( table[date] ), 
    Filter(
        allexcept( table, table[well] ), 
        Table[date] <= earlier( Table[date] ) && [production]
    )
)

This one counts the number of distinct dates with any production at the selected well on or prior you the current date.

RunningMonth = 
Max( 0, 
    Datediff(
        Calculate( 
            Firstnonblank( table[date], if( [production], true(), blank ()) ),
            allexcept( table, table[well] )
        ),
        Table[date],
        Month
    ) + 1
)

This computes the first date with a production value greater than zero. It then takes the non-negative month difference between the current month and that month.

---EDIT-------------------------

To make this scalable, we can do the same thing with measures:

Production = IF( SUM ( table[water] ) = 0 || SUM ( table[oil] ) = 0 || SUM ( table[gas] ) = 0, TRUE(), BLANK() )

I updated production to evaluate to BLANK() instead of FALSE() for zero-production intersections to speed up evaluation later.

We will need a Calendar (Date) table linked to the production table for this next bit to work.

RunningActMonth = 
CALCULATE( 
    COUNTROWS( 
        FILTER( 
            DATESBETWEEN( 'Calendar'[Date], BLANK(), LASTDATE( 'Calendar'[Date] ) ),
            [Production]
        )
    ),
    ALLEXCEPT( table, table[well] )
)

RunningMonth = 
Max( 0, 
    DATEDIFF(
        CALCULATE( 
            FIRSTNONBLANK( 'Calendar'[date], [Production] ),
            ALLEXCEPT( table, table[well] )
        ),
        LASTDATE( 'Calendar'[Date] ),
        Month
    ) + 1
)

The only major changes here are referencing the Calendar table, which will speed up evaluation. Essentially, this will iterate over wells & dates. Your model speed will be determined by the number of wells X the number of dates in your Calendar.

Luke_0
  • 779
  • 5
  • 20
  • This works great! Thank you! But now have the problem that my Data has 22 million records, and despite the calculated columns worked well when creating them, it seems that they are too memory demanding when updating the whole dataset. It gives a Memory error and can not update. Can you think of any way to produce this same result but using less memory? Maybe with Measures? – Marcelo Saez Jun 01 '23 at 14:29
  • I love where you are thinking, measures are by far more efficient, just wanted to be cognizant of your original requirement for a column. Let me create an edit real quick with measures (should be pretty similar). – Luke_0 Jun 01 '23 at 14:34
  • @MarceloSaez, did this solve your question? If so, don't forget to mark it as accepted! Thank you! – Luke_0 Jun 04 '23 at 01:25