1

Trying to remove duplicate numbers in the same cell in alteryx using regex replace due to applying to a large amount of columns. Below is my current formula. It is working for all rows except the last where the decimals are numbers are being removed, instead of just removing duplicate numbers between commas in the same cell. Any suggestions? I have been playing with the formula in regex101.com and cannot seem to figure it out.

TRIM(REGEX_Replace([CurrentField], '(-?\d+(?:[.,]\d+)?),(?=.*\1)', ''))

ID Description Price CurrentOUTPUT Desired Output
0 red 0,0,67,67,0 0,67 0,67
1 blue -30,-30 -30 -30
2 green 0.03,0.045,0.03 0.03,0.045 0.03,0.045
2 green 1.17,0.37,0.36,0.36,0.37 1.10.36,0.37 1.17,0.37,0.36
arisophia
  • 61
  • 4

1 Answers1

1

You can use

TRIM(REGEX_Replace([_CurrentField_], ',?(-?\d+(?:[.,]\d+)?)(?=.*\1)', ''), ',')

See the regex demo.

You will need to trim commas after regex replacement, hence TRIM is used.

Details:

  • ,? - an optional comma
  • (-?\d+(?:[.,]\d+)?) - Group 1: an optional minus, one or more digits, then an optional sequence of . or , and one or more digits
  • (?=.*\1) - a positive lookahead that matches a location immediately followed with any zero or more chars other than line break chars as many as possible and then Group 1 value.
Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • 1
    Hey Wiktor, good answer as always. If you add `,?` directly after the first capture group can you drop the Trim() function? – Chris Maurer Oct 25 '21 at 01:14
  • Thanks! this almost fully worked, looks like the trim isn't working like it should. Below is what I am seeing, some cells not all commas are removed. I tried cleansing to remove any white spaces and reran but still getting duplicate commas. Initial input: 67,67,67,67,67,67,0,0,0,0,0,0,67,67,67,67,67,67. Output I am getting: 0,,,,67 – arisophia Oct 25 '21 at 01:23
  • 1
    Wiktor is European and uses the comma symbol as a decimal point. If you don't want that behavior, then remove comma from inside the square brackets. – Chris Maurer Oct 25 '21 at 01:27
  • 1
    See my first comment. If you add `,?` after first group, it will fix the problem and you won't need the Trim() function either. – Chris Maurer Oct 25 '21 at 01:34
  • 1
    @ChrisMaurer Thank you for support. Yes, an optional comma pattern at the start or end of the initially suggested pattern fixes the problem with commas inside the string. – Wiktor Stribiżew Oct 25 '21 at 07:00
  • 1
    @arisophia Chris's suggestion is correct, the optional comma pattern removes excess commas in between unique values. – Wiktor Stribiżew Oct 25 '21 at 07:00
  • @WiktorStribiżew Updated the question as I had to tweak the formula due to changing dataset. That regex101 site has been very useful and I have been trying to figure out how to update the formula using that to get the new desired output but no luck. Any suggestions? – arisophia Feb 04 '22 at 16:41
  • @arisophia Try `,?(-?\d+(?:\.\d+)?+)(?=.*(?<!\d)\1\b)`, see [demo](https://regex101.com/r/dviOtE/3). – Wiktor Stribiżew Feb 04 '22 at 20:44