I had a similar requirement for de-duplicating address strings. I created a new column (say COMPLETE_ADDRESS) that concatenates the STREET, CITY, PROVINCE, COUNTRY and ZIPCODE fields, using the GREL expression below:
cells["STREET"].value + " " + cells["CITY"].value + " " + cells["PROVINCE"].value + " " + cells["COUNTRY"].value + " " + cells["ZIPCODE"].value
Then I did the following:
- Clustered the new COMPLETE_ADDRESS column with the default algorithm
- Merged the values in each cluster (now the values are perfect duplicates)
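- Sorted the column and made the sort order permanent
- Did a "blank down" operation on it, so that only the first row of each run of identical values keeps its value
- Finally, kept only the rows with a non-blank value in COMPLETE_ADDRESS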
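Outside OpenRefine, the steps above can be sketched roughly in Python. This is only an illustration: it assumes the default clustering is key collision with fingerprint keying, the `fingerprint` function is a simplified approximation of that keying, and the names `fingerprint` and `dedupe_addresses` as well as the choice of the first value as each cluster's merged value are my own assumptions.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Simplified fingerprint key (an approximation, not OpenRefine's exact code)."""
    # Trim, lowercase, and decompose accented characters.
    text = unicodedata.normalize("NFKD", value.strip().lower())
    # Drop the combining accent marks left by the decomposition.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, de-duplicate the tokens, sort them, and rejoin.
    return " ".join(sorted(set(text.split())))


def dedupe_addresses(addresses):
    """Cluster by fingerprint, merge each cluster, then sort and drop repeats."""
    clusters = defaultdict(list)
    for address in addresses:
        clusters[fingerprint(address)].append(address)

    # "Merge": replace every value with the first value seen in its cluster
    # (an assumption; any representative could be chosen instead).
    merged = [clusters[fingerprint(address)][0] for address in addresses]

    # "Sort permanently", then "blank down" and keep only non-blank rows:
    # a value survives only if it differs from the value just before it.
    merged.sort()
    unique = []
    previous = None
    for value in merged:
        if value != previous:
            unique.append(value)
        previous = value
    return unique
```

Calling `dedupe_addresses` on a list of concatenated address strings returns one representative string per cluster, in sorted order.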
Having said that, as of this writing, there is no feature to merge the independent columns directly. The only way to do that is to split COMPLETE_ADDRESS back into separate columns afterwards. For that to work, the fields have to be joined with a better separator than a space, such as the pipe symbol "|" (that is, " " replaced by "|" in the expression above), which will not conflict with the existing values.
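To show why the separator matters, here is a minimal Python sketch of the join and split round trip. The field names come from the columns above; the helper names `join_address` and `split_address` are hypothetical, and the check that no field already contains the separator is my own addition.

```python
SEPARATOR = "|"
FIELDS = ["STREET", "CITY", "PROVINCE", "COUNTRY", "ZIPCODE"]


def join_address(row: dict) -> str:
    """Concatenate the address fields with a separator that must not occur in the data."""
    for name in FIELDS:
        if SEPARATOR in row[name]:
            raise ValueError(f"{name} already contains {SEPARATOR!r}")
    return SEPARATOR.join(row[name] for name in FIELDS)


def split_address(complete_address: str) -> dict:
    """Split a joined address back into its original fields."""
    parts = complete_address.split(SEPARATOR)
    if len(parts) != len(FIELDS):
        raise ValueError("unexpected number of fields")
    return dict(zip(FIELDS, parts))
```

With a space as the separator the split would be ambiguous, because a value such as "12 Main St" contains spaces itself; with "|" the round trip returns the original fields.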