I have a column whose values contain duplicates, e.g.

VMS5796,VMS5650,VMS5650,CSL,VMA5216,CSL,VMA5113

I'm applying a transform using Jython that removes the duplicates ("On error" is set to "keep original"). Here's the code:

return list(set(value.split(",")))

This works in the preview, but it isn't getting applied to the column. What am I doing wrong?

pnuts
Phil

2 Answers

1

The map function is powerful and underused in Python / Jython. It may be unclear what this code does internally: it converts each item of the list built from your cell's value to a string type, then joins the items with a separator such as a comma ', '. It is also fast, even when processing millions of values.

deduped_list = list(set(value.split(",")))
return ', '.join(map(str, deduped_list))

There are probably other, even slightly faster variations than this, but this should get you going in the right direction.
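
One thing to be aware of: set() does not keep the original order of the items. Below is a minimal sketch of an order-preserving variant, in case order matters to you. The function name dedupe_cell is made up for illustration; in OpenRefine you would use only the body, with value being the cell's value.

```python
def dedupe_cell(value, sep=","):
    """Remove duplicate items from a delimited string, keeping first-seen order."""
    seen = set()
    unique = []
    for item in value.split(sep):
        item = item.strip()  # tolerate spaces around the separator
        if item and item not in seen:
            seen.add(item)
            unique.append(item)
    return sep.join(unique)  # a single string, so it can be stored in a cell

print(dedupe_cell("VMS5796,VMS5650,VMS5650,CSL,VMA5216,CSL,VMA5113"))
# VMS5796,VMS5650,CSL,VMA5216,VMA5113
```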

Interestingly, you can also use repr(object) to get the 'printable representation' of each value, which is acceptable to an eval like OpenRefine's and can be useful for seeing how your values are represented. I only found out about this while researching this answer in more depth for you.

deduped_list = list(set(value.split(",")))
return ', '.join(map(repr, deduped_list)) 
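
To make the difference between the two variants concrete, here is a small standalone sketch comparing str and repr on the same items. The names items, as_str and as_repr are only for illustration. The output shown is for plain strings; under Jython 2.x, unicode values would additionally carry a u prefix in their repr (whether your cells arrive as unicode is an assumption worth checking).

```python
items = ["CSL", "VMS5650"]

as_str = ", ".join(map(str, items))    # plain text of each item
as_repr = ", ".join(map(repr, items))  # quoted, eval-style form of each item

print(as_str)   # CSL, VMS5650
print(as_repr)  # 'CSL', 'VMS5650'
```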
Thad Guidry
  • `",".join(set(value.split(",")))` should work if `value` is a string. – jfs Mar 14 '13 at 23:19
  • repr wasn't working as I wanted so I used a lambda function to encode the string as ascii which is what I wanted. Your solution worked and introduced me to map(), thanks. – Phil Mar 15 '13 at 01:22
0

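Preview implicitly formats things for display. Your expression returns an array (which can't be stored in a cell), so if you'd like to get it in string form, join it into a single string. In GREL that means tacking a .join(',') on the end; in Jython the equivalent is ','.join(...) wrapped around the list.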
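
As a rough illustration of the point above, this standalone sketch shows that the original expression yields a list while the joined version yields a string. The sample value and the variable names are made up, and sorted() is used here only to make the printed output predictable.

```python
value = "VMS5650,CSL,VMS5650"

as_list = list(set(value.split(",")))              # what the original expression returns
as_text = ",".join(sorted(set(value.split(","))))  # one string; sorted for stable output

print(type(as_list).__name__)  # list
print(as_text)                 # CSL,VMS5650
```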

Tom Morris