
I receive 17 types of files whose names follow these patterns:

85_*_p.dat
88_*_p.dat
32_*_p.dat
40_*_p.dat
41_*_p.dat
70_*_p.dat
22_*_p.dat
23_*_p.dat
46_*_p.dat
24_*_p.dat
25_*_p.dat
26_*_p.dat
52_*_p.dat
123_*_p.dat
28_*_p.dat
29_*_p.dat
35_*_p.dat

Here * stands for the variable middle part, which contains a GUID, for example "40_20200313_0cd6963f-bf5b-4eb0-b310-255a23ed778e_p.dat". The leading numbers (85, 88, etc.) are interface numbers. The underscores and the "_p.dat" suffix are common to all files.

How can I write a regular expression that matches all of the above files?

Tried, but it did not work:

/[22][23][24][25][26][28][29][32][35][40][41][46][52][70][85][88][123]_(?:.*)_p.dat/

Also tried:

\d[22|23|24|25|26|28|29|32|35|40|41|46|52|70|85|88|123]_(?:.*)_p.dat

This is erroneous: if I add 123, it also picks up 23.

Also tried:

(22|23|123)_(?:.*)_p.dat

It gives two results: the full match and group 1.

I am not sure how to handle this.

Note: Apache Camel can read SFTP files selected by a regular expression, and I want one regular expression that covers all of the above files. It needs to work in Java.

fatherazrael
@Matt: The one I tried is the following: /[22][23][24][25][26][28][29][32][35][40][41][46][52][70][85][88][123]_(?:.*)_p.dat/ . I have updated the description as well. – fatherazrael Mar 18 '20 at 02:12

2 Answers


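Your syntax is not correct: square brackets define a character class, so [22] matches the single character 2, not the number 22. This regular expression matches all of your filenames: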

[0-9]+_[0-9a-z_-]+_p\.dat

I'll pull it apart and explain.

[0-9]+ matches one or more digits, the number at the start of each filename (e.g. 22). You could make it more specific, as in your example, and match (22|23|24), which reads as "22 or 23 or 24".

_ matches the underscore

[0-9a-z_-]+ matches the "GUID" part, which can be one or more numbers, lower case letters, underscores and hyphens

_p\.dat matches an underscore, the letter p, a period (notice that this is escaped with a \ because . is a special regular expression character) and the dat suffix at the end
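Since the question asks for Java, here is a minimal sketch of how the pattern above could be applied there. The class and method names (LooseFileNameMatcher, isDataFile) are made up for illustration, and the backslash is doubled because the pattern sits inside a Java string literal.

```java
import java.util.regex.Pattern;

class LooseFileNameMatcher {
    // The loose pattern from this answer: any interface number is accepted.
    static final Pattern LOOSE = Pattern.compile("[0-9]+_[0-9a-z_-]+_p\\.dat");

    // Hypothetical helper name, used only for this illustration.
    static boolean isDataFile(String fileName) {
        // matches() tests the whole file name, so no ^ or $ anchors are needed.
        return LOOSE.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        // File name taken from the question: prints true.
        System.out.println(isDataFile("40_20200313_0cd6963f-bf5b-4eb0-b310-255a23ed778e_p.dat"));
        // Interface 99 is not in the question's list, but this pattern still accepts it: prints true.
        System.out.println(isDataFile("99_20200313_0cd6963f-bf5b-4eb0-b310-255a23ed778e_p.dat"));
        // Wrong suffix: prints false.
        System.out.println(isDataFile("40_20200313_0cd6963f-bf5b-4eb0-b310-255a23ed778e_p.txt"));
    }
}
```

The second call shows the trade-off of [0-9]+: it accepts every interface number, so replace it with an alternation if only the 17 listed numbers should pass.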

I use regex101 to play around with regular expressions. Give it a go; it has a nice help section too.

Matt

You could match 1 or more digits, followed by a repeated group that starts with either - or _, which prevents matching consecutive separators such as __ or --:

^\d+_[a-f0-9]+(?:[_-][a-f0-9]+)+_p\.dat$

Explanation

  • ^ Start of string
  • \d+_ Match 1+ digits and match _
  • [a-f0-9]+ Match 1+ times a-f or 0-9
  • (?: Non capture group
    • [_-][a-f0-9]+ Match either _ or - and 1+ times a-f or 0-9
  • )+ Close non capture group and repeat 1+ times
  • _p\.dat Match _p.dat
  • $ End of string
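The non-capture group in the breakdown above is also what avoids the extra "group 1" result mentioned in the question. A small sketch, assuming plain java.util.regex (the class and method names are invented for this example), compares the number of capturing groups in both spellings:

```java
import java.util.regex.Pattern;

class GroupCountDemo {
    // Hypothetical helper: returns how many capturing groups a pattern defines.
    static int capturingGroups(String regex) {
        // groupCount() depends only on the pattern, so an empty input is enough.
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // Attempt from the question: (22|23|123) is a capturing group.
        System.out.println(capturingGroups("(22|23|123)_(?:.*)_p\\.dat"));   // prints 1
        // Same alternation written as a non-capture group.
        System.out.println(capturingGroups("(?:22|23|123)_(?:.*)_p\\.dat")); // prints 0
    }
}
```

Both patterns accept the same file names; only the reported groups differ.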


Note that in a Java string literal you have to escape each backslash by doubling it:

String regex = "^\\d+_[a-f0-9]+(?:[_-][a-f0-9]+)+_p\\.dat$";

To match those numbers exactly, you could use an alternation combined with character classes, which shortens the pattern a bit:

^(?:2[2-689]|3[25]|4[016]|52|70|8[58]|123)_[a-f0-9]+(?:[_-][a-f0-9]+)+_p\.dat$
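As a rough check of the exact-number pattern above in Java, the sketch below runs it against a few sample names. The class name, the helper name and the sample file names other than the one built from the question's GUID are assumptions made for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;

class InterfaceFileMatcher {
    // Exact-number pattern from this answer, with backslashes doubled for Java.
    static final Pattern EXACT = Pattern.compile(
        "^(?:2[2-689]|3[25]|4[016]|52|70|8[58]|123)_[a-f0-9]+(?:[_-][a-f0-9]+)+_p\\.dat$");

    // Hypothetical helper name, used only for this illustration.
    static boolean isInterfaceFile(String fileName) {
        return EXACT.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String tail = "_20200313_0cd6963f-bf5b-4eb0-b310-255a23ed778e_p.dat";
        // 123 and 23 are listed interfaces; 27 and 1123 are not.
        for (String prefix : List.of("123", "23", "27", "1123")) {
            String name = prefix + tail;
            System.out.println(name + " -> " + isInterfaceFile(name));
        }
    }
}
```

Because the alternation is anchored with ^ and followed by _, 123 and 23 are each matched as whole numbers, so adding 123 no longer lets a stray prefix such as 1123 slip through.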


The fourth bird