I need to implement a function that, given a filename as input, returns a substring selected by a regular expression.

Filenames are composed this way; I need to get the middle part of each name (for example, fotocontargasx in the first one):

Doc20191001119049_fotocontargasx_3962122_943000.jpg

Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg

Doc201910011214020_fotoesterna_ant_396024_947112.jpg

Doc201710071149010_foto_TargaMid_4007396_95010.jpg

I have currently implemented this:

Pattern rexExp = Pattern.compile("_[a-zA-Z0-9]+_");

But it does not work properly.

Mattia

3 Answers

Solution 1: Matching/extracting

You may capture a \w+ pattern between underscores that is followed by [digits][_][digits][.][extension]:

Pattern rexExp = Pattern.compile("_(\\w+)_\\d+_\\d+\\.[^.]*$");

Details

  • _ - an underscore
  • (\w+) - 1+ letters/digits/_
  • _ - an underscore
  • \d+ - 1+ digits
  • _\d+ - _ and 1+ digits
  • \. - a dot
  • [^.]* - 0+ chars other than .
  • $ - end of string.

Java demo:

String s = "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg";
Pattern rexExp = Pattern.compile("_(\\w+)_\\d+_\\d+\\.[^.]*$");
Matcher matcher = rexExp.matcher(s);
if (matcher.find()){
    System.out.println(matcher.group(1)); 
} // => fotoAssicurazioneCartaceo
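
The same pattern can be wrapped in a small helper and run against all four filenames from the question. This is only a sketch: the class name `LabelExtractor` and the method `extractLabel` are hypothetical, and returning an empty `Optional` for names that do not match is an assumed policy, not something the question specifies.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LabelExtractor {
    // Compiled once and reused; group 1 is the label in front of the
    // two trailing numeric fields and the extension.
    private static final Pattern LABEL =
            Pattern.compile("_(\\w+)_\\d+_\\d+\\.[^.]*$");

    // Returns the captured label, or an empty Optional when the name
    // does not have the expected shape (assumed policy).
    static Optional<String> extractLabel(String filename) {
        Matcher m = LABEL.matcher(filename);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String[] names = {
            "Doc20191001119049_fotocontargasx_3962122_943000.jpg",
            "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg",
            "Doc201910011214020_fotoesterna_ant_396024_947112.jpg",
            "Doc201710071149010_foto_TargaMid_4007396_95010.jpg"
        };
        for (String name : names) {
            System.out.println(extractLabel(name).orElse("<no match>"));
        }
    }
}
```

Because `\w` also matches `_`, the greedy `\w+` first runs to the dot and then backtracks until the two numeric fields fit, which is why labels such as fotoesterna_ant come out whole.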

Solution 2: Trimming out unnecessary prefix/suffix

You may remove everything from the start up to and including the first _, and the [digits][_][digits][.][extension] part at the end:

String result = s.replaceAll("^[^_]*_|_\\d+_\\d+\\.[^.]*$", "");

Details

  • ^[^_]*_ - start of string, 0+ chars other than _ and then _
  • | - or
  • _\d+_\d+\.[^.]*$ - _, 1+ digits, _, 1+ digits, . and then 0+ chars other than . to the end of the string.
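
As a rough illustration of the trimming approach, the sketch below applies the replacement to the four filenames from the question. The class name `TrimDemo` and the method `trimToLabel` are hypothetical names used only for this example.

```java
final class TrimDemo {
    // The first alternative strips the prefix up to and including the first '_';
    // the second strips "_digits_digits.extension" at the end of the string.
    static String trimToLabel(String filename) {
        return filename.replaceAll("^[^_]*_|_\\d+_\\d+\\.[^.]*$", "");
    }

    public static void main(String[] args) {
        String[] names = {
            "Doc20191001119049_fotocontargasx_3962122_943000.jpg",
            "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg",
            "Doc201910011214020_fotoesterna_ant_396024_947112.jpg",
            "Doc201710071149010_foto_TargaMid_4007396_95010.jpg"
        };
        for (String name : names) {
            System.out.println(trimToLabel(name));
        }
    }
}
```

One thing to keep in mind with this approach: a name that does not fit the expected shape is returned unchanged or only partly trimmed rather than being reported as a non-match, so it probably suits inputs that are already known to be well formed.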
Wiktor Stribiżew
  • You need to use `replaceAll`, not `replaceFirst`; otherwise, you'll get `fotoAssicurazioneCartaceo_3962128_943000.jpg` for `Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg` instead of `fotoAssicurazioneCartaceo`. – Avi Oct 10 '19 at 14:49
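
The difference described in this comment can be reproduced with a short sketch. The class name `ReplaceFirstVsAll` and the constant `REGEX` are hypothetical; the regex itself is the one from Solution 2.

```java
final class ReplaceFirstVsAll {
    static final String REGEX = "^[^_]*_|_\\d+_\\d+\\.[^.]*$";

    public static void main(String[] args) {
        String s = "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg";
        // replaceFirst stops after the first match, which is the prefix alternative,
        // so the numeric suffix and extension are left in place.
        System.out.println(s.replaceFirst(REGEX, ""));
        // replaceAll keeps scanning and also removes the suffix alternative.
        System.out.println(s.replaceAll(REGEX, ""));
    }
}
```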

To complement Wiktor's precise answer, here's a "quick-and-dirty" way of doing it that makes the following hacky assumption about your input: "Required string is only non-numbers, surrounded by numbers, and the input is always a valid filepath".

public static void main(String[] args) {
  String[] strs = {"Doc20191001119049_fotocontargasx_3962122_943000.jpg", "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg", "Doc201910011214020_fotoesterna_ant_396024_947112.jpg", "Doc201710071149010_foto_TargaMid_4007396_95010.jpg"};
  var p = Pattern.compile("_(\\D+)_"); // \D already matches '_', so [\D_] adds nothing
  for(var str : strs) {
    var m = p.matcher(str);
    if(m.find()) {
      System.out.println("found: "+m.group(1));
    }
  }
}

Output:

found: fotocontargasx
found: fotoAssicurazioneCartaceo
found: fotoesterna_ant
found: foto_TargaMid
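
To show where the "non-numbers only" assumption stops holding, here is a small sketch comparing this pattern with the anchored one from Wiktor's answer. The filename with a digit inside the label is made up for illustration and does not come from the question; the class name `DigitAssumptionDemo` and the helper `firstGroup` are hypothetical as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DigitAssumptionDemo {
    static final Pattern QUICK = Pattern.compile("_(\\D+)_");
    static final Pattern ANCHORED = Pattern.compile("_(\\w+)_\\d+_\\d+\\.[^.]*$");

    // Returns group 1 of the first match, or null when nothing matches.
    static String firstGroup(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical filename: the label "foto2_ant" contains a digit.
        String name = "Doc201910011214020_foto2_ant_396024_947112.jpg";
        System.out.println(firstGroup(QUICK, name));    // ant
        System.out.println(firstGroup(ANCHORED, name)); // foto2_ant
    }
}
```

If labels can never contain digits, the shorter pattern is enough; otherwise the anchored version is likely the safer choice.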
Avi

Pattern: (?<=_).+(?=(_\d+){2}\.)

    final String s = "Doc20191001119049_fotocontargasx_3962122_943000.jpg\n"
        + "\n"
        + "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg\n"
        + "\n"
        + "Doc201910011214020_fotoesterna_ant_396024_947112.jpg\n"
        + "\n"
        + "Doc201710071149010_foto_TargaMid_4007396_95010.jpg";
    Pattern pattern = Pattern.compile("(?<=_).+(?=(_\\d+){2}\\.)");
    Matcher matcher = pattern.matcher(s);
    List<String> allMatches = new ArrayList<>();

    while (matcher.find()) {
        allMatches.add(matcher.group());
    }

Output: [fotocontargasx, fotoAssicurazioneCartaceo, fotoesterna_ant, foto_TargaMid]
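
For reference, the same collection step can be written with `Matcher.results()`, which is available from Java 9 onward. This is a sketch under that assumption; the class name `LookaroundDemo` and the method `extractAll` are hypothetical. Since `.` does not match line terminators by default, `.+` stays within a single line of the input.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class LookaroundDemo {
    // Lookbehind requires a preceding '_'; lookahead requires two "_digits"
    // groups followed by a dot. Neither is part of the returned match.
    private static final Pattern LABEL =
            Pattern.compile("(?<=_).+(?=(_\\d+){2}\\.)");

    // Collects every match in the text, one per filename line.
    static List<String> extractAll(String text) {
        return LABEL.matcher(text)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "Doc20191001119049_fotocontargasx_3962122_943000.jpg",
                "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg",
                "Doc201910011214020_fotoesterna_ant_396024_947112.jpg",
                "Doc201710071149010_foto_TargaMid_4007396_95010.jpg");
        System.out.println(extractAll(text));
    }
}
```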

Alexey