I am working in Cloud Dataprep and i have a case like this:
Basically I need to create new rows in column 2 based on how many rows there is with matching data in column 1.
Is it possible and how?
I am working in Cloud Dataprep and i have a case like this:
Basically I need to create new rows in column 2 based on how many rows there is with matching data in column 1.
Is it possible and how?
I understand that the scenario you want to have is: obtain all values from column1
that match a value present in column2
. There are many things to consider in this scenario, which you did not describe, such as: can values in column2
be repeated? or if there is a value in column2
missing in column1
, what should happen? or what happens the other way around?
However, as a general approach to this issue, I would do the following flow:
With a flow such as this one, you take the input table, which as two columns like this:
In recipes FIRST_COLUMN
and SECOND_COLUMN
you split both columns into different branches, and do the necessary steps to clean each column. In column1
, I understand nothing is needed to be done. In column2
, I understand that you will have to remove duplicates (again, this is my guessing, but it would depend on your specific implementation, which you have not completely described) and delete empty values. You can do that applying the following transforms:
Finally, you can join both columns together. Depending on your needs (only values present in both columns should appear, only values present in columnX
should appear, etc.) you should apply a different JOIN strategy. You should use a Join key like column1 = column2
(as in the image), and if you choose only the second column in the left-side menu, you will have a single-column result.
Note that in this case I used an Inner-join, but using other JOIN types will provide completely different results. Use the one that fits your requirements better.