
Looking for a follow-up to Max Zelensky's solution here. Assuming the original example had a [Date] field, I'm trying to go one level deeper and add a column that shows the prior [Date] value.

I've also looked here and here and here.

  1. Per Max, I've created the table of tables:

    • AddedCustom = Table.AddColumn(GroupedRows, "Custom", each Table.AddIndexColumn([tmp], "Occurrence", 1, 1), type table)
  2. Created a second index:

    • SecondIndex = Table.AddColumn(AddedCustom, "Custom.2", each Table.AddIndexColumn([Custom], "Occurrence.2", 0, 1), type table)
  3. I've successfully added a column that references the current row's [Date] value:

    • CurrentDate = Table.AddColumn(SecondIndex, "Date.2", each Table.AddColumn([Custom.2], "Date.2", each [Date]), type table)
  4. But when I try to reference either index column (even just hard-coding {0}), the new field errors out. I'm fairly certain I'm missing something in the syntax for referencing rows within a table nested inside a column of tables, but I'm just not sure how to get there (see the sketch after this list). A few examples I've tried without success:

    • PriorDate = Table.AddColumn(SecondIndex, "PriorDate", each Table.AddColumn([Custom.2], "Prior Date", each {0}[Date]), type table) -- just to see if I could return the value from the first row

    • PriorDate = Table.AddColumn(SecondIndex, "PriorDate", each Table.AddColumn([Custom.2], "Prior Date", each {[Occurrence.2]}[Date]), type table) -- doesn't work for [Occurrence] or [Occurrence.2]

    • PriorDate = Table.AddColumn(SecondIndex, "PriorDate", each Table.AddColumn([Custom.2], "Prior Date", each {[Occurrence]-1}[Date]), type table)

    • PriorDate = Table.AddColumn(SecondIndex, "PriorDate", each Table.AddColumn([Custom.2], "Prior Date", each [Custom.2]{0}[Date]), type table)

    • PriorDate = Table.AddColumn(SecondIndex, "PriorDate", each Table.AddColumn([Custom.2], "Prior Date", each Table.SelectColumns([Custom.2], {"Date"})), type table)

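For what it's worth, the mechanic these attempts run into is that the inner each introduces a new row context: inside it, [Date] means the inner row's Date field, and [Custom.2] no longer refers to the outer column (the inner rows have no such field). A bare {0} is also just a list containing the number 0; positional row access needs a table value on the left, as in someTable{0}. A minimal sketch of a working nested lookup (my own illustration building on steps 1 and 2 above, not from the original thread, and untested against this data) binds the nested table to a variable before the inner each:

WithPrior = Table.AddColumn(SecondIndex, "Custom.3", each
    let
        inner = [Custom.2]  // evaluated in the outer row context, so Custom.2 is still in scope
    in
        Table.AddColumn(inner, "Prior Date", each
            // the 0-based Occurrence.2 index minus 1 points at the previous row;
            // try/otherwise turns the out-of-range first row into null
            try inner{[Occurrence.2] - 1}[Date] otherwise null),
    type table)

Expanding "Custom.3" afterward should produce the desired Prior Date column, though the self-merge in the answer below is likely the faster route.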
Also, can anyone point me to a good reference on the syntax and mechanics of working with #Tables, {Lists}, [Records], etc.? I would appreciate it (I've read chapter 20 of Ken Puls' book a few times, but it hasn't quite stuck yet). Thanks in advance! (A short illustration of this container syntax follows the example table below.)

| Name | Date     | Occurrence | Prior Date (Desired) |
|------|----------|------------|----------------------|
| A    | 1/1/2019 | 1          | null/error           |
| A    | 3/1/2019 | 2          | 1/1/2019             |
| B    | 2/1/2019 | 1          | null/error           |
| A    | 4/1/2019 | 3          | 3/1/2019             |
| B    | 5/1/2019 | 2          | 2/1/2019             |
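
Since the question asks about container syntax in general, here is a minimal illustration of how the three structured values are navigated in M (my own summary for reference, not part of the original thread):

let
    // [Records]: access a field with [FieldName]
    rec = [Name = "A", Date = #date(2019, 1, 1)],
    fieldValue = rec[Date],        // #date(2019, 1, 1)

    // {Lists}: access an item positionally with {i} (0-based)
    lst = {10, 20, 30},
    item = lst{1},                 // 20

    // #tables: {i} returns row i as a record; [Column] returns the column as a list
    tbl = #table({"Name", "Date"}, {{"A", #date(2019, 1, 1)}, {"A", #date(2019, 3, 1)}}),
    cell = tbl{1}[Date],           // drill table -> row record -> field
    column = tbl[Date]             // {#date(2019, 1, 1), #date(2019, 3, 1)}
in
    cell

The accessors chain, so tbl{1}[Date] drills from a table to a row (a record) to a field; a concrete table value on the left of {i} is exactly what the failed attempts above were missing.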

1 Answer


Similar to my answer here: instead of adding just one index, you can add two, one starting from 0 and one starting from 1, and use them to look up the previous row by performing a self-merge.

let
    Source = Table.FromRows({{"A",#date(2019,1,1)},{"A",#date(2019,1,3)},{"B",#date(2019,1,2)},{"A",#date(2019,1,4)},{"B",#date(2019,1,5)}}, {"Name", "Date"}),
    ChangeTypes = Table.TransformColumnTypes(Source, {{"Name", type text}, {"Date", type date}}),
    GroupByName = Table.Group(ChangeTypes, {"Name"}, {{"tmp", each _, type table}}),
    // Two indices per group: "Occurrence" starts at 1 and "Prev" starts at 0,
    // so row n's Prev equals row (n-1)'s Occurrence.
    AddIndices = Table.AddColumn(GroupByName, "Custom", each Table.AddIndexColumn(Table.AddIndexColumn([tmp], "Occurrence", 1, 1), "Prev", 0, 1)),
    ExpandTables = Table.ExpandTableColumn(AddIndices, "Custom", {"Date", "Occurrence", "Prev"}, {"Date", "Occurrence", "Prev"}),
    // Self-merge: match each row's Prev to the previous row's Occurrence within the same Name.
    SelfMerge = Table.NestedJoin(ExpandTables, {"Name", "Prev"}, ExpandTables, {"Name", "Occurrence"}, "Expanded Custom", JoinKind.LeftOuter),
    ExpandPriorDate = Table.ExpandTableColumn(SelfMerge, "Expanded Custom", {"Date"}, {"Prior Date"}),
    RemoveExtraColumns = Table.RemoveColumns(ExpandPriorDate, {"Prev", "tmp"})
in
    RemoveExtraColumns
  • This definitely works. The reason I was tinkering with adding the columns within the nested table is that I was trying to avoid a left join, which doubles the number of rows pulled. I have a few other merges that aggregate "Total Occurrence" (Filter Occurrence 1 of 3, 2 of 3, etc.), "Date of First Occurrence", etc., that have already taken a ~200K row sample query to 1M+ rows. Not a major issue now, but I'm worried about how much exponentially larger (and slower) this may get when I apply this to all 40M rows in my data. – stevewing Feb 21 '19 at 21:47
  • I'm pretty sure the self-merge method is going to be more efficient than any kind of index shifting or lookup methods since merges are usually highly optimized operations in most query languages. Also, what I've suggested does not involve creating any extra rows. – Alexis Olson Feb 21 '19 at 22:17
  • I've been tinkering with the self-merge, and my baseline load time before adding this column is ~40 seconds for 210K rows. Even though the final output is still only 210K rows, the merge doubles the rows referenced and doubles the load time -- it essentially has to look at the data twice to merge it. I have another merge that does something similar, which quadruples my load times and number of rows referenced in my data (output is still just the original 210K rows). I haven't run this solution against my full data set yet, but I'm concerned I'll be dealing with load times of hours. – stevewing Feb 22 '19 at 18:13
  • I can't guarantee you that this will perform as well as you like (that might not be possible), but I'm pretty confident the self-merge will have significantly better performance than doing index lookups. The latter tends not to be optimized well by the query engine. – Alexis Olson Feb 22 '19 at 18:24
  • Fair -- I wonder if this would be better handled with a DAX calculated column in the data model instead. It just seems that the merge in Power Query causes too many performance issues with an otherwise fast load. Thanks for the help! – stevewing Feb 22 '19 at 18:38