Regex to avoid data duplication in delimited string?

Question

I am trying to validate a string of comma-delimited values. I want to check that no value is repeated within the string.

For example, my values would be:

    data1 = "value1,value2,value3,va-lu4,value 6,value1";//should fail
    data2 = "value1,value2,value3,va-lu4,value 6";//should pass

In the above scenario, data1 should fail because it contains value1 twice, and data2 should pass (match) because it does not contain any repeated value.

This is what I have for matching each value, but I am not sure how to check for repetition.

    ^[-\w\s]+(?:,[-\w\s]*)*$

This matches the values between the delimiters, but I am not sure how to check whether duplicate values exist. Any help would be great.
Note: I know I can do this using string functions and a loop, but I am learning regex and want to see whether it is possible with a regex. In case of confusion feel free to comment, as this is my first question on Stack.

What **language** are you using? Because if the expression isn't compiled correctly it could lead to a lot of catastrophic backtracking. – hwnd, Feb 27 '15 at 06:37
I am currently trying with `Javascript`, but eventually I also want to try strictly typed languages such as `c#/Java`. – user2745246, Feb 27 '15 at 07:57

vks · Accepted Answer · 2015-02-27T06:58:31.837

1

    ^(?!(?:^|.*,)([^,\n]*),.*\1(?:,|$)).*$

Try this. See the demo:

https://regex101.com/r/wU7sQ0/24
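Since the question mentions JavaScript, here is a minimal sketch of how the pattern above might be used there. The variable name `noDuplicates` is only illustrative; the pattern itself is copied unchanged, and the two strings are the ones from the question. The negative lookahead rejects the whole string as soon as some value followed by a comma can be found again later in the string.

```javascript
// Pattern copied unchanged from the answer; `noDuplicates` is an illustrative name.
const noDuplicates = /^(?!(?:^|.*,)([^,\n]*),.*\1(?:,|$)).*$/;

// Sample strings from the question.
const data1 = "value1,value2,value3,va-lu4,value 6,value1";
const data2 = "value1,value2,value3,va-lu4,value 6";

console.log(noDuplicates.test(data1)); // false: value1 appears twice
console.log(noDuplicates.test(data2)); // true: every value is unique
```

The case raised in the comment below, where the repeated values are adjacent (`value1,value2,value2,va-lu4,value 6`), is also rejected by this version of the pattern.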

edited Feb 27 '15 at 06:58

answered Feb 27 '15 at 06:33


What if it was `value1,value2,value2,va-lu4,value 6` – hwnd Feb 27 '15 at 06:36

Matt · Answer 2 · 2015-02-27T08:27:44.200

1

Regular expressions are useful in many cases, but checking for duplicates in a string is easier like this (in C#):

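    // Requires: using System.Linq; (for Select and Distinct)
    bool HasDuplicates(string str)
    {
        // Split on commas and trim surrounding whitespace from each value.
        var list1 = str.Split(',').Select(s => s.Trim()).ToList();
        // Keep a single copy of each value.
        var list2 = list1.Distinct().ToList();
        // Fewer distinct values than total values means a repeat exists.
        return list1.Count > list2.Count;
    }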

How it works: The function converts the string into a list, trims the elements, and then creates a second, distinct list from it. Finally it compares the number of elements in both lists: if the distinct list has fewer elements than the original list, there are duplicates and the function returns true; otherwise it returns false.

Example:

    var result1 = HasDuplicates("Test1, Test1, Test2");
    var result2 = HasDuplicates("Test1, Test2, Test3");

The variable result1 contains true, variable result2 contains false. You can try out the code in DotNetFiddle: https://dotnetfiddle.net/0pRURH
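Because the asker is working in JavaScript, the same split, trim, and compare idea could be sketched there as follows. The function name `hasDuplicates` is a hypothetical counterpart of the C# function, and a `Set` stands in for `Distinct()`.

```javascript
// Hypothetical JavaScript counterpart of the C# HasDuplicates function.
function hasDuplicates(str) {
  // Split on commas and trim surrounding whitespace from each value.
  const values = str.split(",").map((s) => s.trim());
  // A Set keeps one copy of each value, so a smaller size means a repeat.
  return new Set(values).size < values.length;
}

console.log(hasDuplicates("Test1, Test1, Test2")); // true
console.log(hasDuplicates("Test1, Test2, Test3")); // false
```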

edited Feb 27 '15 at 08:27

answered Feb 27 '15 at 08:00


Yes, I know. With string functions it is easy, and it is good practice to use string functions for complex string manipulation. But I was just trying to learn, and while trying I stumbled on this problem. Still, thanks. – user2745246 Feb 27 '15 at 08:33
That's fine, and there are a lot of cases where RegEx has advantages, which are usually when you're looking for complex patterns - such as the IP finder `((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)`. By the way, there is a good 30 minute tutorial [here](http://www.codeproject.com/Articles/9099/The-Minute-Regex-Tutorial), I hope it is useful for you. – Matt Feb 27 '15 at 11:50
Thanks for the tutorial. I will definitely go through it. Thanks again. – user2745246 Feb 27 '15 at 13:47

Bohemian · Answer 3 · 2015-02-27T15:17:04.423

1

This works:

    ^(?!.*(^|,)([^,]+),.*\2(,|$)).*

See demo
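One caveat may be worth checking before relying on this pattern. Because `.*` sits directly in front of the backreference, the second occurrence does not have to start at the beginning of a value, so a string such as `a,ba` is rejected even though `a` and `ba` are different values. The sketch below shows this in JavaScript, along with a stricter variant that forces the repeat to begin right after a comma. The variant and both constant names are assumptions added for illustration and are not part of the original answer.

```javascript
// Pattern copied unchanged from the answer.
const original = /^(?!.*(^|,)([^,]+),.*\2(,|$)).*/;

// Assumed variant (not from the answer): `(?:.*,)?` only lets the
// backreference start immediately after a comma, i.e. at a value boundary.
const stricter = /^(?!.*(^|,)([^,]+),(?:.*,)?\2(,|$)).*/;

console.log(original.test("a,ba"));  // false: "ba" ends with "a", so it is flagged
console.log(stricter.test("a,ba"));  // true: "a" and "ba" are treated as distinct
console.log(stricter.test("a,b,a")); // false: a genuine duplicate is still rejected
```

Both patterns give the same results for the two sample strings in the question; they only differ when one value is a suffix of a later one.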

edited Feb 27 '15 at 15:17

answered Feb 27 '15 at 14:03


Thanks for this. Can you explain which approach is better, this one or @vks's, and why? That would help me understand regex better. Thanks. – user2745246 Feb 28 '15 at 13:30
The main difference is that vks's regex will treat a blank value as matching another blank, e.g. `abc,,def,,ghi` is treated as having a duplicate, because the capture group has a quantifier of `*` (*zero* or more). Vks also uses non-capturing groups `(?:...)`, which offer trivial performance benefits but IMHO make the regex harder to read. Vks's regex also won't treat a newline as part of the value, but I don't think that's very relevant. Other than those points, the two regexes are very similar. – Bohemian Feb 28 '15 at 13:49
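The blank-value difference described in the comment above can be checked with a short JavaScript sketch. Both patterns are copied unchanged from the two answers; the constant names are only illustrative.

```javascript
// Patterns copied unchanged from the two answers; names are illustrative.
const vksPattern = /^(?!(?:^|.*,)([^,\n]*),.*\1(?:,|$)).*$/;
const bohemianPattern = /^(?!.*(^|,)([^,]+),.*\2(,|$)).*/;

// The example from the comment: two blank values.
const withBlanks = "abc,,def,,ghi";

console.log(vksPattern.test(withBlanks));      // false: blanks count as a duplicate
console.log(bohemianPattern.test(withBlanks)); // true: `+` never captures a blank
```

As far as I can tell, with the `*` quantifier a single blank value followed by another field (for example `abc,,def`) is already enough for the first pattern to reject the string, because an empty capture can be "repeated" anywhere.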
