
I have a set of strings that I'd like to parse in MATLAB 2012 that all have the following format:

string-int-int-int-int-string

I'd like to pluck out the third integer (the rest are 'don't cares'), but I haven't used MATLAB in ages and need to refresh on regular expressions. I tried using the regular expression '(.*)-(.*)-(.*)-\d-(.*)' but no dice. I did check out the MATLAB regexp page, but wasn't able to figure out how to apply that information to this case.

Anyone know how I might get the desired result? If so, could you explain what the expression you're using is doing to get that result so that others might be able to apply the answer to their unique situation?

Thanks in advance!

user1205577

2 Answers

5
^.*?-.*?-.*?-(\d+)-.*?-.*?$

OR

^(?:[^-]*?-){3}(\d+)(?:.*?)$

Group 1 (the first capture token) now contains the required data.
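In MATLAB the captured group is only returned if you ask for it, so here is a minimal sketch of how the second expression might be used. It assumes the sample string from the comments below; the variable names (`myStr`, `expr`, `thirdInt`) are illustrative, and the lazy `*?` is dropped inside the character class because `[^-]*` can never run past a dash.

```matlab
% Sample string taken from the comments; variable names are illustrative.
myStr = 'XyzStr-1-2-1000-56789-IloveStackExchange.txt';

% Skip three "text-" fields, then capture the next run of digits.
expr = '^(?:[^-]*-){3}(\d+)';

% 'tokens' asks for the captured groups; 'once' returns only the first match,
% so tok is a cell array holding the captured text.
tok = regexp(myStr, expr, 'tokens', 'once');

% Convert the captured text to a number.
thirdInt = str2double(tok{1});
```

Without `'once'`, `regexp` returns one nested cell per match, and the value would be read as `tok{1}{1}` instead.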

Anirudha
  • I tried `regexp( mystring, '.*?-.*?-.*?-(\d+)-.*?-.*?' )`, but that seems to be plucking out the first int for some reason. What is the reasoning behind the expression you recommend here? i.e., what is each piece doing? – user1205577 Oct 26 '12 at 17:00
  • 1
    @user1205577 Whenever you use round brackets, whatever the enclosed pattern matches is captured in a group. In this case I used `(\d+)` to capture the digits you want. – Anirudha Oct 26 '12 at 17:01
  • 2
    @user1205577 - Your original expression was "greedy", i.e. `.*` would "eat" _everything_, including dashes. His expression is "non-greedy", i.e. `.*?`, "eating" _just enough_ to match the following dash. His regex should have worked, so you should post more of your code... something else is wrong here. – Andrew Cheong Oct 26 '12 at 17:04
  • That is the whole code. In the MATLAB command line I type the following: `myStr = 'XyzStr-1-2-1000-56789-IloveStackExchange.txt'` and then the regular expression above and it returns `1`. Any helpful tips? – user1205577 Oct 26 '12 at 17:09
  • @user1205577 try this regex: `^(?:[^-]*?-){3}(\d+)(?:.*?)$` – Anirudha Oct 26 '12 at 17:26
  • `regexp( mystring, '^(?:[^-]*?-){3}(\d+)(?:.*?)$' )` returns an empty array. What is the logic behind the approach you're using there? – user1205577 Oct 26 '12 at 17:31
  • @user1205577 The digits are captured in token 1. Are you accessing token 1? – Anirudha Oct 26 '12 at 17:37
  • 1
    @user1205577 - The problem here is the return value. The `1` you saw was _not_ the first integer; it was the index at which the match starts, which is what `regexp` returns by default. See here for what the return values of `regexp` really are: http://www.mathworks.com/help/matlab/ref/regexp.html. – Andrew Cheong Oct 26 '12 at 17:38
  • 1
    @user1205577 You also need to pass a third parameter, `'tokens'`, to **regexp**, as described [here](http://www.mathworks.in/help/matlab/matlab_prog/regular-expressions.html#f0-56416) – Anirudha Oct 26 '12 at 17:39
  • No need for the lazy modifier in the `[^-]*?-` expression. (`[^-]*-` matches the same and is faster.) – ridgerunner Oct 27 '12 at 11:25
  • @ridgerunner Hmm, I don't know how the regex engine actually works internally; I added it out of habit. :) – Anirudha Oct 27 '12 at 11:32
5
str = 'XyzStr-1-2-1000-56789-ILoveStackExchange.txt';

% 'tokens' returns the text captured by each parenthesized group
tok = regexp(str, '^.+?-.+?-.+?-(\d+?)-.+?-.+?', 'tokens');

tok{:}
ans = 
    '1000'

Update

Explanation, upon request.

  • ^ - "Anchor", or match beginning of string.
  • .+? - Wildcard match, one or more, non-greedy.
  • - - Literal dash/hyphen.
  • (\d+?) - Digits match, one or more, non-greedy, captured into a token.
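To tie the pieces above together, here is a short sketch contrasting the default return value with `'tokens'`, and the non-greedy pattern with a greedy one. The trimmed pattern (ending at the dash after the capture) and the variable names are illustrative assumptions, not part of the original answer.

```matlab
str = 'XyzStr-1-2-1000-56789-ILoveStackExchange.txt';

% With no options, regexp returns the index where the match starts,
% not the captured text.
startIdx = regexp(str, '^.+?-.+?-.+?-(\d+?)-');

% With 'tokens', it returns the captured text: one cell per match,
% each holding one string per capture group.
tok = regexp(str, '^.+?-.+?-.+?-(\d+?)-', 'tokens');
lazyResult = tok{1}{1};

% Greedy .+ grabs as much as it can, dashes included, and only backs off
% far enough for the rest of the pattern to match, so the capture lands
% on a later field than intended.
tokGreedy = regexp(str, '^.+-.+-.+-(\d+)-', 'tokens');
greedyResult = tokGreedy{1}{1};
```

Here `startIdx` is `1` because the match begins at the first character, which is why a bare `regexp` call can look as if it returned the first integer. The non-greedy pattern captures `'1000'`, while the greedy one captures `'56789'`.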
Andrew Cheong
  • i was going to have the same thing..:) – Anirudha Oct 26 '12 at 17:46
  • @Fake.It.Til.U.Make.It - You had it first. I just wanted to give him the actual code and it was hard to do in the comment box. +1 to your answer. – Andrew Cheong Oct 26 '12 at 17:47
  • Very helpful - thanks! For future readers it may be useful to add some commentary about what each little sub section of the regular expression is doing. – user1205577 Oct 26 '12 at 17:52