
Using regular expressions, is there a concise, elegant and short way to select the last word occurring just before the second semicolon on each line of this list, while also including non-English characters and hyphens? I've been putting it through regexr.com, but can't for the life of me come up with a working solution.

1;Bjönæå Frælåøn Boøf;Kjrvad 19;
2;Vrönæå Kræ-êlèn;Ojrvøad 3;

Desired selection:

Boøf
Kræ-êlèn
Asked by Lasse

1 Answer

This regex matches the last word before the second-to-last semicolon:

[\p{L}-]+(?=;[^;]*;$)

See live demo working with your sample input.

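The last term is a lookahead that asserts the match is followed by a semicolon, then any number of non-semicolon characters, then a terminating semicolon at the end of the line. Note that $ matches at the end of every line only when the multiline flag is enabled; otherwise it matches only at the end of the whole input.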
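A minimal sketch of the regex above applied to the sample input, assuming Java's java.util.regex engine (which supports both \p{L} and lookahead). The class name LastWordBeforeSecondSemicolon and the helper find are hypothetical. Pattern.MULTILINE is passed so that $ matches at the end of each line; without it, only the word on the last line (Kræ-êlèn) would be found.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastWordBeforeSecondSemicolon {
    // The answer's regex: a run of letters (any script) or hyphens, followed by
    // ";", any non-semicolons, and a final ";" at the end of the line.
    // MULTILINE lets $ match at the end of every line, not only at the end of input.
    private static final Pattern WORD =
            Pattern.compile("[\\p{L}-]+(?=;[^;]*;$)", Pattern.MULTILINE);

    // Hypothetical helper: collects every match in the input (one per matching line).
    static List<String> find(String input) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(input);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String sample = "1;Bjönæå Frælåøn Boøf;Kjrvad 19;\n"
                + "2;Vrönæå Kræ-êlèn;Ojrvøad 3;";
        System.out.println(find(sample));
    }
}
```

Running main prints [Boøf, Kræ-êlèn], the two words from the desired selection.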

The character class combines \p{L}, the Unicode property escape for any "letter" character (which includes letters from all languages), with the hyphen (which doesn't need escaping when it appears last in a character class).
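To show why the \p{L} class matters for the non-English names, this illustrative Java comparison (class and method names are hypothetical) checks sample words against [\p{L}-]+ and against \w+. In java.util.regex, \w means only [a-zA-Z_0-9] unless the UNICODE_CHARACTER_CLASS flag is set, so it rejects words containing ø or ê, and it never accepts the hyphen.

```java
import java.util.regex.Pattern;

public class LetterClassDemo {
    // Letters from any script plus a literal hyphen (last in the class, so unescaped).
    static final Pattern LETTER_OR_HYPHEN = Pattern.compile("[\\p{L}-]+");
    // Without the UNICODE_CHARACTER_CLASS flag, Java's \w is only [a-zA-Z_0-9].
    static final Pattern ASCII_WORD = Pattern.compile("\\w+");

    static boolean isLetterOrHyphenWord(String s) {
        return LETTER_OR_HYPHEN.matcher(s).matches();
    }

    static boolean isAsciiWord(String s) {
        return ASCII_WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"Boøf", "Kræ-êlèn", "Kjrvad"}) {
            System.out.println(w + "  [\\p{L}-]+: " + isLetterOrHyphenWord(w)
                    + "  \\w+: " + isAsciiWord(w));
        }
    }
}
```

Boøf and Kræ-êlèn match the \p{L} class but not \w+, while the ASCII-only Kjrvad matches both.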

Using a lookahead makes it possible to anchor on the second-to-last semicolon. Matching relative to the second semicolon counted from the start of the line is far harder: in many engines lookbehinds may not be variable length, so that approach requires capturing groups instead.
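For the harder case of counting from the start of the line, here is one possible group-based sketch in Java. It is an illustrative alternative, not the regex from this answer, and the names are hypothetical. The pattern anchors at the line start, consumes the first field, then lazily skips the beginning of the second field so that group 1 is the whole letter/hyphen run ending at the second semicolon. The second sample line in main has extra fields added purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SecondSemicolonGroup {
    // ^            start of a line (MULTILINE)
    // [^;\n]*;     the first field and the first semicolon
    // [^;\n]*?     lazily skip the beginning of the second field
    // ([\p{L}-]+)  capture the run of letters/hyphens...
    // ;            ...that ends exactly at the second semicolon
    private static final Pattern SECOND_FIELD_LAST_WORD =
            Pattern.compile("^[^;\\n]*;[^;\\n]*?([\\p{L}-]+);", Pattern.MULTILINE);

    // Hypothetical helper: returns group 1 (the word) from every matching line.
    static List<String> find(String input) {
        List<String> words = new ArrayList<>();
        Matcher m = SECOND_FIELD_LAST_WORD.matcher(input);
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        // Illustrative input: the second line has extra fields after the second semicolon.
        String sample = "1;Bjönæå Frælåøn Boøf;Kjrvad 19;\n"
                + "2;Vrönæå Kræ-êlèn;Ojrvøad 3;extra;fields;";
        System.out.println(find(sample));
    }
}
```

Because the match is anchored at the start of each line, additional fields after the second semicolon (the follow-up case raised in the comments) do not change the result, assuming the target word sits at the end of the second field as in the sample.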

Answered by Bohemian
  • Thank you for your quick reply. I ran the expression through Sublime Text 3, but it seems like any foreign characters make the expression not work quite right. Is that due to the regex engine in Sublime? How would the expression look if I were to add another three semicolon-separated values on the same lines? Thank you again. – Lasse Aug 05 '14 at 14:47
  • @lasseal try my recent edit, and check out the live demo link. Regarding the extra input, that would be a different question, solvable but not as easily. Better to ask another question, because this question specifies 3 semicolons only. – Bohemian Aug 05 '14 at 14:50
  • Great! Thank you. It's still not working in Sublime though (is it running a different regex engine?), so running it through rubular.com will have to work for now. Still, thank you, great educational answer and solution. – Lasse Aug 05 '14 at 14:59
  • I am not familiar with sublime, but you may find there's a way to enable POSIX compatibility. – Bohemian Aug 05 '14 at 15:01
  • After a bit of research, it seems like Sublime Text 3's snippets and regex search use Boost.Regex, whose default behavior is Perl regular expressions, in case anyone is puzzled why the POSIX answer above is not working for them. – Lasse Aug 05 '14 at 15:52