
i have the following problem:

In Power Query I can combine multiple CSV files using the load-from-folder function. Unfortunately, the time-series data we get is overlapping, with changed values across the entire series. For example, every Monday we get a file covering the last 30 days. The values within those 30 days change, so the values in last week's report may no longer be up to date. My goal is to overwrite the old data from last week with the new data from this week.

Example:

File A

Date        Item    Hits   Solved
01.01.2018  A       100    50
01.01.2018  B       138    65
02.01.2018  A       124    70
...
07.01.2018  A       143    50
07.01.2018  B       147    57
...
30.01.2018  A       250    124

File B

Date        Item    Hits   Solved
07.01.2018  A       143    80
07.01.2018  B       147    95
...
06.02.2018  A       341    148

The data from File A should be updated with the new information from File B, keeping only the values from (in this case) 01.01.2018 to 06.01.2018 from File A. The rest should come from File B.

So far I have tried numbering my files (the newer, the higher, so File A -> 1, File B -> 2), creating a new key column from Date+Item, sorting descending by file number, and then removing duplicates in the key column. This sometimes works (keeping only the most recent entries), but sometimes it does not. Removing duplicates from the Date column does not work for me either: it just removes the newest entries and keeps the old ones.

Is there a better way to solve this problem?

Thanks a lot!

Arthur Pennt

2 Answers


Go to the step where you sort the data by file number and wrap it in Table.Buffer(), so it looks like

= Table.Buffer(Table.Sort(BLAH BLAH TABLE SORTING CODE))

I don't remember the technical reasons, but for our purposes here this pins the "internal" row order PQ uses when removing duplicates to the visible sort order of the table.

Also, it's not necessary to create a key column: you can just select your Item and Date columns and remove duplicates, which accomplishes the same thing.
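The whole pattern can be sketched in M (the step and column names, like Combined and FileNumber, are illustrative and assume you already added a file-number column when combining the files):

```m
// Combined = the appended table from the folder load, with a FileNumber
// column where a higher number means a newer file.
SortedBuffered = Table.Buffer(
    Table.Sort(Combined, {{"FileNumber", Order.Descending}})
),
// With the buffered sort, Table.Distinct keeps the first row it sees for
// each Date/Item pair, i.e. the row from the newest file.
Deduplicated = Table.Distinct(SortedBuffered, {"Date", "Item"})
```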

Wedge
  • This works great. Thank you! Do I understand correctly that PQ internally does not change/recalculate the order if you do not use Table.Buffer()? – Arthur Pennt Aug 13 '18 at 09:06
  • 1
    I believe when you use remove duplicates the function removes them based on the original order the data was loaded in. Table.Buffer() reloads data from whatever you referenced, so the "load" order remove duplicates uses is reset as well. I'm assuming there is an internal key for data you don't see that PQ is using on the backend. – Wedge Aug 13 '18 at 16:14

I would suggest starting with FileA and doing a left anti merge with FileB on the Date and Item columns. This will remove any rows from FileA that have matching Date and Item rows in FileB.

Then delete the new column that this merge created, append File B, and you should have the result you want.
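A minimal M sketch of those two steps (the names FileA, FileB, and "NewData" are illustrative):

```m
let
    // Left anti join: keep only the FileA rows that have no matching
    // Date + Item combination in FileB
    OldRowsOnly = Table.NestedJoin(
        FileA, {"Date", "Item"},
        FileB, {"Date", "Item"},
        "NewData", JoinKind.LeftAnti
    ),
    // Drop the nested table column the merge created
    Cleaned = Table.RemoveColumns(OldRowsOnly, {"NewData"}),
    // Append the new data
    Result = Table.Combine({Cleaned, FileB})
in
    Result
```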

Alexis Olson
  • Thank you for your comment. This seems to work, but from my limited knowledge you cannot merge files from a folder, can you? I thought you could only append files!? – Arthur Pennt Aug 13 '18 at 09:08
  • 1
    There's no group merge, as far as I know, but you should still be able to use this approach by doing two folder loads. One where you filter out the most recent file (Query A) and one where you just pick the most recent file (Query B) and then treat Query A and Query B like `FileA` and `FileB` in my answer. – Alexis Olson Aug 13 '18 at 13:49
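That split could look roughly like this in M (the folder path is a placeholder; "Date modified" is one of the columns Folder.Files returns):

```m
let
    Files  = Folder.Files("C:\YourFolder"),
    Sorted = Table.Sort(Files, {{"Date modified", Order.Descending}}),
    Newest = Table.FirstN(Sorted, 1),  // expand this into Query B ("FileB")
    Older  = Table.Skip(Sorted, 1)     // expand this into Query A ("FileA")
in
    Older
```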