Cloud Dataprep - Multiply rows in one column based on values in other column

Question

I am working in Cloud Dataprep and I have a case like this:

Basically, I need to create new rows in column 2 based on how many rows there are with matching data in column 1.

Is this possible, and if so, how?

score 1 · Answer 1 · answered Jun 14 '18 at 14:32

I understand that you want to obtain all values from column1 that match a value present in column2. There are several things to consider in this scenario that you did not describe, such as: can values in column2 be repeated? If a value in column2 is missing from column1, what should happen? And what should happen the other way around?

However, as a general approach to this issue, I would do the following flow:

With a flow such as this one, you take the input table, which has two columns like this:

In recipes FIRST_COLUMN and SECOND_COLUMN you split the two columns into separate branches and apply the steps needed to clean each column. In column1, I understand nothing needs to be done. In column2, I understand that you will have to remove duplicates (again, this is my guess; it depends on your specific implementation, which you have not completely described) and delete empty values. You can do that by applying the following transforms:
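As a rough illustration of what the cleaning step in the SECOND_COLUMN branch does, here is a minimal sketch in plain Python. It is not Dataprep recipe syntax; the function name `clean_column` and the sample values are made up for the example, and it assumes that "empty" means `None` or a blank string.

```python
def clean_column(values):
    """Drop empty values and duplicates, keeping first-seen order."""
    seen = set()
    cleaned = []
    for value in values:
        # Assumption: None and blank/whitespace-only strings count as empty.
        if value is None or str(value).strip() == "":
            continue
        # Skip values that already appeared earlier in the column.
        if value in seen:
            continue
        seen.add(value)
        cleaned.append(value)
    return cleaned


# Hypothetical contents of column2 before cleaning.
column2 = ["A", "", "B", "A", None, "C"]
cleaned_column2 = clean_column(column2)
print(cleaned_column2)
```

With the sample data above, only the first occurrence of each non-empty value survives.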

Finally, you can join both columns together. Depending on your needs (only values present in both columns should appear, only values present in columnX should appear, etc.) you should apply a different JOIN strategy. You should use a Join key like column1 = column2 (as in the image), and if you choose only the second column in the left-side menu, you will have a single-column result.

Note that in this case I used an inner join, but other JOIN types will produce completely different results. Use the one that best fits your requirements.
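To make the difference between join types concrete, here is a small plain-Python sketch of an inner join and a left join on the key column1 = column2. It only illustrates the logic; it is not how Dataprep implements joins, and the helper names `inner_join` and `left_join` and the sample values are assumptions for the example.

```python
def inner_join(column1, column2):
    """One output row per pair of rows whose values are equal."""
    return [(left, right) for left in column1 for right in column2 if left == right]


def left_join(column1, column2):
    """Like inner_join, but column1 rows without a match are kept with None."""
    rows = []
    for left in column1:
        matches = [right for right in column2 if right == left]
        if matches:
            rows.extend((left, right) for right in matches)
        else:
            rows.append((left, None))
    return rows


# Hypothetical data: column2 has already been cleaned (no duplicates, no empties).
column1 = ["A", "A", "B", "D"]
column2 = ["A", "B", "C"]

inner_rows = inner_join(column1, column2)
left_rows = left_join(column1, column2)

# Keeping only the second column of the inner join gives a single-column result.
second_column_only = [right for _, right in inner_rows]

print(inner_rows)
print(left_rows)
print(second_column_only)
```

In this sample, "A" appears twice in column1, so it appears twice in the joined result; "D" has no match and shows up only in the left join, while "C" is dropped by both because it is missing from column1.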

Just as an additional note, if you want to preserve both columns, feel free to select both in the last step. Additionally, in case you actually want to *multiply* the values, as you mentioned in the title of the question, you should not remove duplicates. This flow is just the base for your scenario, but you should be able to adapt it to your specific needs, playing around with the available options. Hope it helps! — dsesto, Jun 14 '18 at 14:34
