0

I'm a little stumped.

I need to come up with a query to exclude an entry where :

For the same 'ClientN' value, if the Start Time is after a previous start time AND before a previous End Time, then it's excluded (mentioned below as exclude)

But, if a second time extends beyond the first entry, then the extension of the time is counted (mentioned in the table as 'partial include').

Below is some data that has had this rule manually applied.

I was planning on using PowerQuery for this, but I am open to other ideas.

The premise is to remove instances of 'double counts' against the same project/client. (DT is in d/mm/yyyy HH:mm)

Start Time End Time Duration ClientN ProjectN Decision
6/01/2022 12:30 6/01/2022 13:12 0.00:42:00 Client1 Project6 include
6/01/2022 13:14 6/01/2022 13:44 0.00:30:18 Client1 Project6 include
6/01/2022 13:44 6/01/2022 14:18 0.00:33:36 Client2 Project3 include
6/01/2022 14:18 6/01/2022 15:05 0.00:47:24 Client1 Project6 include
6/01/2022 15:12 6/01/2022 15:39 0.00:26:06 Client2 Project3 include
6/01/2022 21:00 6/01/2022 22:00 0.01:00:00 Client2 Project5 include
7/01/2022 8:30 7/01/2022 9:00 0.00:30:00 Client1 Project1 include
7/01/2022 12:00 7/01/2022 13:00 0.01:00:00 Client1 Project7 include
7/01/2022 13:30 7/01/2022 14:30 0.01:00:00 Client1 Project6 include
7/01/2022 17:10 7/01/2022 17:42 0.00:32:12 Client2 Project8 include
10/01/2022 13:45 10/01/2022 14:45 0.01:00:00 Client1 Project6 include
10/01/2022 13:46 10/01/2022 13:59 0.00:12:53 Client1 Project1 exclude
10/01/2022 13:59 10/01/2022 14:00 0.00:01:20 Client2 Project2 include
10/01/2022 14:00 10/01/2022 14:09 0.00:08:18 Client1 Project1 exclude
10/01/2022 14:09 10/01/2022 14:10 0.00:01:20 Client2 Project2 include
10/01/2022 14:09 10/01/2022 14:11 0.00:02:02 Client1 Project1 exclude
10/01/2022 14:11 10/01/2022 14:12 0.00:00:38 Client2 Project2 include
10/01/2022 14:12 10/01/2022 14:31 0.00:19:36 Client1 Project1 exclude
10/01/2022 14:31 10/01/2022 14:32 0.00:00:59 Client2 Project2 include
10/01/2022 14:32 10/01/2022 14:32 0.00:00:02 Client1 Project1 exclude
10/01/2022 14:32 10/01/2022 14:33 0.00:01:12 Client2 Project2 include
10/01/2022 14:33 10/01/2022 14:34 0.00:00:59 Client1 Project1 exclude
10/01/2022 14:34 10/01/2022 14:39 0.00:04:20 Client2 Project2 include
10/01/2022 14:39 10/01/2022 14:42 0.00:03:14 Client1 Project1 exclude
10/01/2022 14:43 10/01/2022 14:55 0.00:12:14 Client1 Project1 partial include

I was playing around with formulas:

Column =
VAR __Current = [end time]
VAR __PreviousDate = MAXX(FILTER('detailedreportextracts__small',[start Time] < EARLIER([start Time] ) && [ClientN] = earlier([ClientN]) ),[start Time])
var __Previous = MAXX(filter(detailedreportextracts__small,[start Time]= __PreviousDate),[end Time])
return
__Current - __Previous

The issue appears to be that it only looks back one data row.

So, where I see a negative value on the first exclusion (this is correct), I would also think I would see negative values below this line on the lines that I have marked as 'exclude'.

enter image description here

I might have the columns in the wrong spot in the formula. I did swap them around, but then the row calculated value didn't work for me.

After fluffing around for another few hours with tireless googling, I've been trying to play with: https://community.powerbi.com/t5/Community-Blog/Mean-Time-Between-Failure-MTBF-and-Power-BI/ba-p/339586

Pulling apart the DAX queries as they stand, so I can understand them better. I'm not sure if it's the best way to do it, but I'm trying to extract the 2:45pm (circled) into another column, to then check if the [end time] is < the earlier larger end time. But, for whatever reason, the 'earlier' function doesn't seem to look back very many rows earlier.

enter image description here

In my latest attempt: How to Compare the Current Row to the Previous Row by Category Using DAX Again, my results seem to concur that I'm not able to look back further rows to maintain that 'max' value found earlier.

Any help would be greatly appreciated.

Ken White
  • 123,280
  • 14
  • 225
  • 444
cptcherry
  • 43
  • 1
  • 7
  • Your explanation and examples are not clear to me. You write *"remove instances of 'double counts' against the same project/client"* but your example seems as if you are only removing instances against the same client. – Ron Rosenfeld Mar 15 '22 at 18:50

1 Answers1

0

See if this works for you. Assumes Table1 includes everything EXCEPT the decision column. It will basically generate a table of non-overlapping time spent by project/client

Based on aggregate continuous "transitive" overlapping time intervals

function process

(xtable)=>
// for each group, compare each list against all lists in column Custom, and merge those that overlap
let Source= Table.Buffer(xtable),
#"Added Custom"= Table.AddColumn( 
 Source,
"Custom2",
    each let
    begin = [Custom],
    myclient=[ClientN],
    myproject=[ProjectN]

in 
List.Accumulate (
    Table.SelectRows(Source,each [ClientN]=myclient and [ProjectN]=myproject)[Custom],
     begin,
    (state,current)=> if List.ContainsAny(state,current) then List.Distinct(List.Combine({current,state})) else state
)
),
// count the number of changes made from original version. If this is not zero, we will recurse the changes
x= List.Sum(List.Transform(List.Positions(#"Added Custom"[Custom]), each if #"Added Custom"[Custom]{_} = #"Added Custom"[Custom2]{_} then 0 else 1)),
RemovePrioCustom= Table.RemoveColumns(#"Added Custom",{"Custom"}),
AddNewCustom= Table.RenameColumns(RemovePrioCustom,{{"Custom2", "Custom"}}),
recursive = if x=0 then AddNewCustom else @process( AddNewCustom)
in recursive

code

let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Start Time", type datetime}, {"End Time", type datetime}, {"Duration", type duration}, {"ClientN", type text}, {"ProjectN", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(#"Added Index", "Custom", 
 (i)=>Table.SelectRows(#"Added Index", each [ClientN]=i[ClientN] and [ProjectN]=i[ProjectN] and
([Start Time]>=i[Start Time] and [End Time]<=i[End Time] or
[Start Time]<=i[Start Time] and [End Time]>=i[End Time] or
[Start Time]<=i[Start Time] and [End Time]<=i[End Time] and [End Time]>=i[Start Time]or
[Start Time]>=i[Start Time] and [End Time]>=i[End Time] and [Start Time] <=i[End Time])
 )[Index]
 ),
MergeOverlap=  process(#"Added Custom"),
#"Added Custom1" = Table.AddColumn(MergeOverlap, "StartMin", each List.Min(List.Transform([Custom], each MergeOverlap[Start Time]{_})),type datetime),
#"Added Custom2" = Table.AddColumn(#"Added Custom1", "EndMax", each List.Max(List.Transform([Custom], each #"Added Custom1"[End Time]{_})), type datetime),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom2",{"Start Time", "End Time" ,"Duration","Index", "Custom"}),
#"Removed Duplicates" = Table.Distinct(#"Removed Columns", {"ClientN", "ProjectN","StartMin", "EndMax"}),
  #"Sorted Rows" = Table.Sort(#"Removed Duplicates",{{"ClientN", Order.Ascending}, {"ProjectN", Order.Ascending}, {"StartMin", Order.Ascending}})
in #"Sorted Rows"
horseyride
  • 17,007
  • 2
  • 11
  • 22
  • Oh wow, this, and the other link you included, in theory, looks exactly on the money! I'll test it asap, but thank you so much. Aside from column pivoting and having a whole number of columns to determine if the start/end dates were changing etc for each client etc and making a big mess in excel, I didn't really have any idea how this could be implemented. – cptcherry Mar 16 '22 at 12:18
  • I modified my link to produce code above that resembles what I think you want. Let me know how it goes – horseyride Mar 16 '22 at 12:22