Combine multiple rows into single row in Google Data Prep

Question

I have a table which has multiple payload values in separate rows. I want to combine those rows into a single row to have all the data together. Table looks something like this.

+------------+--------------+------+----+----+----+----+
|    Date    |     Time     | User | D1 | D2 | D3 | D4 |
+------------+--------------+------+----+----+----+----+
| 2020-04-15 | 05:39:45 UTC | A    |  2 |    |    |    |
| 2020-04-15 | 05:39:45 UTC | A    |    |  5 |    |    |
| 2020-04-15 | 05:39:45 UTC | A    |    |    |  8 |    |
| 2020-04-15 | 05:39:45 UTC | A    |    |    |    |  7 |
+------------+--------------+------+----+----+----+----+

And I want to convert it to something like this.

+------------+--------------+------+----+----+----+----+
|    Date    |     Time     | User | D1 | D2 | D3 | D4 |
+------------+--------------+------+----+----+----+----+
| 2020-04-15 | 05:39:45 UTC | A    |  2 |  5 |  8 |  7 |
+------------+--------------+------+----+----+----+----+

I tried "set" and "aggregate" but they didn't work as I wanted them to and I am not sure how to go forward.

Any help would be appreciated. Thanks.

score 2 · Accepted Answer · answered Apr 15 '20 at 16:23

tl;dr:

use fill() function to fill all empty values within each d1-d4 columns in the wanted group (AKA - the columns date+time+user) then dedup\aggregate to your heart's content.

long version

So the quickest way to do this is by using a window-function called "fill()". What this function does for each given field in a column, it tells it: "Look down. look up. find the closest non-empty value, and copy it!" you can ofcourse limit it's sight (look only 3 rows above, for example) but for this example, don't need the limitation. so your fill function will look like this: FILL($col, -1, -1) So the "$col" will reference all the chosen columns. the "-1" says "unlimited sight". finally, the "~" says "from column D1 to column D4".

So, function will look like this:

.

Which in turn will make your columns look like this: .

Now you can use the "dedup" transformation to remove any duplications, and only 1 copy of each "group" will remain. Alternatively, if you still want to use "group by", you can do that aswell.

Hope this helps =]

p.s There are more ways to do this - which entails using the "pivot" transformation, and array unnesting. But in the process you'll lose your columns' names, and will need to rename them.

Hey Amit, It seems I can't get my recipe to run at all. I am not able to load a new sample or run it as a job. Every other recipes seems to work properly. Any idea. — VSR, Apr 16 '20 at 13:21
not that I can think of. Could be many reasons. a schema change at the source, for example. I do recommend you visit https://community.trifacta.com/s/, the community site - and ask there. You'll get much more response there =] — Amit Miller, Apr 17 '20 at 08:20

Combine multiple rows into single row in Google Data Prep

1 Answers1