3

I'm cleaning data with OpenRefine (was Google Refine) from multiple sources. I have files from different sources which contain companies, column definitions are identical i.e.

UNID  | Name      | Street    | City    | Country   | Phone | ...
sg52d | Company a | A street  | a city  | c country | 12345
sg52d | Company a | A street  | a city  | c country | 0099835
dfnsd | Company B | B Street  | City B  | c country | 33445
dfnsd | Company B | Different | Another | c country | 33445
xxbb3 | Company C | C Street  | City B  | Country A | 1111
xxbb3 | Company C | C Street  | City B  | Country A | 1111

What I want is this result (only the last Company is merged, all columns were identical)

UNID  | Name      | Street    | City    | Country   | Phone | ...
sg52d | Company a | A street  | a city  | c country | 12345
sg52d | Company a | A street  | a city  | c country | 0099835
dfnsd | Company B | B Street  | City B  | c country | 33445
dfnsd | Company B | Different | Another | c country | 33445
xxbb3 | Company C | C Street  | City B  | Country A | 1111

Is there a simple way to do this?

I understand that I can concatenate all columns into a new column, but this is a little PITA, because of the number of columns.

Perhaps there is a way for the new column definition to loop through all other columns and merge it?

pnuts
  • 58,317
  • 11
  • 87
  • 139
Christian Waidner
  • 1,324
  • 1
  • 13
  • 22
  • Similar question: https://stackoverflow.com/questions/58677751/how-to-merge-rows-in-openrefine – Vanuan Nov 03 '19 at 16:28

2 Answers2

6

It is a strange approach but this should work: http://googlerefine.blogspot.com/2011/08/remove-duplicate.html Make sure you make the sort change permanent.

Sonicthoughts
  • 548
  • 1
  • 4
  • 16
1

You could create new column with an expression like:

forEach(["UNID", "Name", "Street", "City", "..." ],x,cells[x].value).join("")
Chris Schoon
  • 2,145
  • 1
  • 17
  • 11
  • Thank you for this solution. That's also the best I came up with but it means that I have to list every single column and this is quite a lot to type if I have a big worksheet. I hoped for someting like forEach(startcolumn, endcolumn).join() but no luck. – Christian Waidner Mar 06 '14 at 15:11