I have a workbook that imports data via Power Query from 2 large files (QueryA = 11,000 rows and QueryB = 130,000 rows).
When I join the QueryA[ID] with QueryB[ID] I get back around 11,000 rows relatively quickly. Unfortunately I found duplicate values in QueryB for [ID] which creates duplicate rows in QueryA so I need to filter QueryB based on the max [Date] such that only 1 row of data (the row for the Max Date for each given ID) in QueryB when the join to QueryA occurs.
I inserted the following statement in QueryB.
FilterOutDups = Table.Select(PriorStep, each ([Date] = List.Max(let CurID = [ID] in Table.SelectRows(PriorStep, each [ID] = CurID)[Date])))
This step gives me back the results as expected (1 row/ID with the values related to the Max Date for that ID).
However, when I join the result of QueryB (with the above statement included) to QueryA the result never finishes in an hour and the connections window is showing 125GB and counting when I kill Excel to stop it.
I'd greatly appreciate anyone who can identify why this one statement adds soooo much time to QueryA and how I can alter the code to get the results needed.
Thanks.