We have filenames that start with one or more product numbers, and we apply processing based on these numbers when adding the files to the system.

I need a regex that matches the following:

70707_70708_70709_display1.jpg
70707_Front010.jpg

and NOT these:

626-this files is tagged.jpg
1000x1000_webbanner2.jpg
2000 years ago_files.jpg
626gamingassets_styleguide.jpg
70707_Front010_0001_1.jpg

I have a regex that almost does what I want, except for the failing cases marked below:

\d{3,}(?=_)



70707_70708_70709_display1.jpg - success, 3 matches {70707,70708,70709}
70707_Front010.jpg             - success, 1 match {70707}
626-this files is tagged.jpg   - success, 0 matches
1000x1000_webbanner2.jpg       - fail, 1 match {1000}
2000 years ago_files.jpg       - success, 0 matches
626gamingassets_styleguide.jpg - success, 0 matches
70707_Front010_0001_1.jpg      - fail, 3 matches {70707,010,0001}

I have a regex test to illustrate this at regex101.

The regex should only look for sets of underscore separated numbers at the beginning.
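The behaviour described above can be reproduced with a short C# sketch. The helper name `unanchoredMatches` is illustrative and not part of the question. Because the pattern has no anchor, any run of 3+ digits that is directly followed by an underscore matches, wherever it sits in the name:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: every match of the question's unanchored pattern.
Func<string, string[]> unanchoredMatches = name =>
    Regex.Matches(name, @"\d{3,}(?=_)")
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

var samples = new[]
{
    "70707_70708_70709_display1.jpg", // wanted: all three leading numbers
    "1000x1000_webbanner2.jpg"        // unwanted: the second 1000 precedes '_'
};

foreach (var name in samples)
    Console.WriteLine("{0} -> [{1}]", name, string.Join(", ", unanchoredMatches(name)));
```

The second sample shows the problem: the match does not start at the beginning of the file name.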

Wiktor Stribiżew
Pankaj Kumar
  • Try `(?<=_|\b)\d{3,}(?=_)` or `^\d{3,}(?=_)` (if you mean the digits should only be matched at the start of the string). Or `(?<=_|^)\d{3,}(?=_)` – Wiktor Stribiżew Nov 22 '17 at 07:44
  • Please clarify the question since your comment below CodeFuller's answer does not agree with *The regex should only look for sets of numbers at the beginning* requirement. – Wiktor Stribiżew Nov 22 '17 at 07:51
  • @WiktorStribiżew: `(?<=_|\b)\d{3,}(?=_)` works, thanks. Could you please post it as an answer so that I can accept it? Could you also explain it? – Pankaj Kumar Nov 22 '17 at 07:53
  • Let me check if it is not a dupe of another question, and if not, I will reopen. I am really reluctant to post any more dupes. – Wiktor Stribiżew Nov 22 '17 at 07:55
  • Also, please modify the question to reflect the actual requirement. BTW, what about `70707_Front010_0001_1.jpg` - should it be matched? Do you want to extract `0001`? This is still unclear. – Wiktor Stribiżew Nov 22 '17 at 07:56
  • @WiktorStribiżew: in `70707_Front010_0001_1.jpg`, only the product number at the start should be extracted. – Pankaj Kumar Nov 22 '17 at 08:04

2 Answers

You may try a non-regex solution:

// s is the file name to inspect; requires "using System.Linq;"
var results = s.Split('_')
    .TakeWhile(x => x.Length >= 3 && x.All(char.IsDigit)) // leading all-digit items, 3+ chars
    .ToList();
if (results.Count > 0)
    Console.WriteLine("Results: {0}", string.Join(", ", results));
else
    Console.WriteLine("No match: '{0}'", s);

See the C# demo. Here, the string is split on _, and items are taken from the start for as long as each one consists only of digits and is at least 3 characters long. The first item that fails this check ends the result.

Alternatively, you may use the following regex-based solution:

^(?<v>\d{3,})(?:_(?<v>\d{3,}))*_

See the regex demo

Pattern details

  • ^ - start of a string
  • (?<v>\d{3,}) - Group v: 3 or more digits
  • (?:_(?<v>\d{3,}))* - 0+ occurrences of
    • _ - an underscore
    • (?<v>\d{3,}) - Group v: 3 or more digits
  • _ - a final underscore that must follow the last number.

C# demo:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var lst = new List<string> {
    "70707_70708_70709_display1.jpg",
    "70707_Front010.jpg",
    "626-this files is tagged.jpg",
    "1000x1000_webbanner2.jpg",
    "2000 years ago_files.jpg",
    "626gamingassets_styleguide.jpg" };
foreach (var s in lst)
{
    // Group "v" occurs twice in the pattern, so read all of its captures.
    var mcoll = Regex.Matches(s, @"^(?<v>\d{3,})(?:_(?<v>\d{3,}))*_")
        .Cast<Match>()
        .SelectMany(m => m.Groups["v"].Captures.Cast<Capture>().Select(c => c.Value))
        .ToList();
    if (mcoll.Count > 0)
        Console.WriteLine("Results: {0}", string.Join(", ", mcoll));
    else
        Console.WriteLine("No match: '{0}'", s);
}

Output:

Results: 70707, 70708, 70709
Results: 70707
No match: '626-this files is tagged.jpg'
No match: '1000x1000_webbanner2.jpg'
No match: '2000 years ago_files.jpg'
No match: '626gamingassets_styleguide.jpg'
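The demo list above leaves out the last file name from the question, `70707_Front010_0001_1.jpg`. The following is a minimal sketch for checking that case, assuming the same pattern. The helper name `leadingNumbers` is made up for illustration:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var pattern = @"^(?<v>\d{3,})(?:_(?<v>\d{3,}))*_";

// Hypothetical helper: all captures of group "v", or an empty array on no match.
Func<string, string[]> leadingNumbers = name =>
{
    var m = Regex.Match(name, pattern);
    return m.Success
        ? m.Groups["v"].Captures.Cast<Capture>().Select(c => c.Value).ToArray()
        : Array.Empty<string>();
};

// "Front010" is not all digits, so the repeated group stops after 70707
// and the later "0001" is never reached.
Console.WriteLine(string.Join(", ", leadingNumbers("70707_Front010_0001_1.jpg")));
```

This should agree with the asker's comment that only the product number at the start is to be extracted from that name.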
Wiktor Stribiżew

If the number is always at the beginning of the line, this will work:

^\d{3,}(?=_)
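One thing to keep in mind is that an anchored pattern can match at most once per string, so only the first number is returned when several are chained. A small sketch to illustrate, where the helper name `anchoredMatches` is an assumption and not part of this answer:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: every match of the anchored pattern in a file name.
Func<string, string[]> anchoredMatches = name =>
    Regex.Matches(name, @"^\d{3,}(?=_)")
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

var samples = new[]
{
    "70707_70708_70709_display1.jpg", // only 70707 is found
    "1000x1000_webbanner2.jpg",       // no match: the leading 1000 precedes 'x'
    "70707_Front010_0001_1.jpg"       // only 70707 is found
};

foreach (var name in samples)
    Console.WriteLine("{0} -> [{1}]", name, string.Join(", ", anchoredMatches(name)));
```

If all chained numbers are needed, a pattern that repeats the number group, or a split-based approach, is likely the better fit.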
CodeFuller