I need to do a complex if-then-else with five options in order of preference. Suppose I first want to match `abc`; if that isn't found, then `a.c`; if that isn't found, then `def`, then `%#@`, then `1z;`.

Can I nest the if-thens, or how else could this be accomplished? I've never used if-thens before.

For instance, in the string `1z;%#@defarcabcaqcdef%#@1z;` I would like the output `abc`.

In the string `1z;%#@defarcabaqcdef%#@1z;` I would like the output `arc`.

In the string `1z;%#@defacabacdef%#@1z;` I would like the output `def`.

In the string `1z;#@deacabacdf%#@1z;` I would like the output `%#@`.

In the string `foo;%@dfaabaef#@1z;barbbbaarr3` I would like the output `1z;`.

logan7
  • Can you not just do `abc|a.c|def|%#@|1z;`? – ctwheels Feb 22 '18 at 19:06
  • Doesn't alternation match the first one found? I need it to go through the entire thing looking for abc only first, then the next and so on. – logan7 Feb 22 '18 at 19:08
  • It's not clear what you're trying to accomplish. I edited my comment to change `\.` to `.` as per your reply. Maybe present us with sample strings and their expected outputs? – ctwheels Feb 22 '18 at 19:09
  • I'm trying to only match `abc`, but if it's not found at all, then try to only match `a.c`, and if it's not found at all, then try to only match `def`, and so on. – logan7 Feb 22 '18 at 19:11
  • Ya that's what the regex I presented does, not quite sure I'm understanding your needs – ctwheels Feb 22 '18 at 19:13
  • Not sure why the second last and last entries shouldn't return `abc` since that has priority? – ctwheels Feb 22 '18 at 19:18
  • I'm sorry, I'm very new to this. I thought alternation with `|` tries all the possibilities at the same time and returns the first one found. Is this not so? For instance, in the string `1z;%#@defarcabcaqcdef%#@1z;` with your alternation suggestion, would it return `abc` or `1z;`? – logan7 Feb 22 '18 at 19:19
  • It depends what you're looking for. Also, which language are you using? – ctwheels Feb 22 '18 at 19:20
  • On the second last and last example, why aren't they supposed to return `arc`? – ctwheels Feb 22 '18 at 19:21
  • I'm using ICU regular expressions in Keyboard Maestro. I edited. The second example should return `arc`. The last should now return `1z;`. – logan7 Feb 22 '18 at 19:21
  • The last one now matches `aqc` – ctwheels Feb 22 '18 at 19:24
  • Sorry, I was trying to leave as many close-but-not-exact examples in the strings to best illustrate when each should be found but obviously I missed a few exact matches in my haste. – logan7 Feb 22 '18 at 19:26
  • Now the last one matches `def` and the second last one `arc`. I assume I'm doing this correctly though if that's the case? – ctwheels Feb 22 '18 at 19:26
  • Try `(?m)^(?:.*?(abc).*|.*?(a.c).*|.*?(def).*|.*?(%#@).*|.*?(1z;).*)$`. Let me know if that works and if it does I'll post as an answer. Matches are in capture groups – ctwheels Feb 22 '18 at 19:28
  • This regex does what you need `(?m)^(?:(?=(.*?abc|.*?a.c|.*?def|.*?%\#@|.*?1z;)))\1` however, it doesn't match like you say it should. 1z;%#@de`arc`abaqcdf%#@1z; matches before `%#@`. See this https://regex101.com/r/niDuo8/1 –  Feb 22 '18 at 19:28
  • @ctwheels That works, thanks! Though it'd be nice if it only gave one single output, it makes five separate outputs in Keyboard Maestro leaving the other four blank, but that's good enough for my needs. Thanks again! – logan7 Feb 22 '18 at 19:38
  • @sln Thanks for the link. When I tried it the regex seems to capture everything if any of the desired possibilities are included anywhere, which I think is what you were saying. – logan7 Feb 22 '18 at 19:40
  • Oh, here you go https://regex101.com/r/ipMSk1/1 Btw, this isn't a put your regex order in here site. –  Feb 22 '18 at 19:46
  • @logan7 I edited my regex to only output to a single capture group. You'll still get the match, but now at least you'll only have to look for one result (the first capture group) instead of multiple. – ctwheels Feb 22 '18 at 19:46
  • @sln It doesn't look like ICU regex supports branch reset: http://userguide.icu-project.org/strings/regexp, funny cause I thought the same thing at first "yay I get to use branch reset", but nope – ctwheels Feb 22 '18 at 19:47
  • Oh, well try it without branch reset or capture groups `(?m)^(?:.*?\Kabc|.*?\Ka.c|.*?\Kdef|.*?\K%\#@|.*?\K1z;)` https://regex101.com/r/iArCV2/1 And, what is ICU ? –  Feb 22 '18 at 19:54
  • @sln That one doesn't seem to work at all in Keyboard Maestro/ICU. Keyboard Maestro says ICU is very similar to PCRE. I'm not sure what you mean by "this isn't a put your regex order in here site". – logan7 Feb 22 '18 at 20:00
  • Still don't know what ICU is. I suggest you make an effort to show some attempt at solving your problem. –  Feb 22 '18 at 20:03
  • @sln Oh, I have made attempts to solve my problem. I'd never even heard of regular expressions until last week and I've given myself a crash course in all this. I also googled and searched here for an answer to this but any way I worded it only brought up answers to problems not close enough to solve this one. – logan7 Feb 22 '18 at 20:05
  • @sln [ICU's Regular Expressions package provides applications with the ability to apply regular expression matching to Unicode string data. The regular expression patterns and behavior are based on Perl's regular expressions. The C++ programming API for using ICU regular expressions is loosely based on the JDK 1.4 package java.util.regex, with some extensions to adapt it for use in a C++ environment. A plain C API is also provided.](http://userguide.icu-project.org/strings/regexp). It doesn't support branch reset or `\K` – ctwheels Feb 22 '18 at 20:11
  • It's not based on _Perl_ as it has no conditionals, (?|), \K, *verbs. In fact it's almost not like it at all. It's plain vanilla assertions, etc.. –  Feb 22 '18 at 20:13
  • @logan7 - You're new here, so just an FYI: if you don't show an attempt at a regex you've tried, you'll be hard pressed to get any help. This is not a write-me-the-code help board. –  Feb 22 '18 at 20:15
  • @sln I agree, but that's what the package information says. There's a full list of supported tokens below. It does support lookbehinds, named capture groups, inline modifiers, possessive matches, atomic matches, comments, Unicode character classes, POSIX syntax, and `\G` though. – ctwheels Feb 22 '18 at 20:16
  • But, nowadays that's almost vanilla .. ridiculous. ICU has its package to compile into C++, but Boost has a much better one that uses the ICU Unicode library. See http://regexformat.com. In fact, I'm not sure ICU regex even honors its own properties via its interface. –  Feb 22 '18 at 20:19
  • @sln I tried to write it using if-then-elses which I thought were what would be needed and it was just a big mess of nested if-then-elses that didn't work. That's why I titled my post on if-then-elses and why I didn't include my own attempt because I thought it'd be less confusing to just put what I was trying to accomplish and not the mess I tried. – logan7 Feb 22 '18 at 20:22
  • @logan7 - If you tried `if (regex1) else if (regex2) else if (...`, you should show it; there's no harm in that. –  Feb 22 '18 at 20:24
  • @sln Yeah, that's basically what I tried a few different ways; I actually don't want to even mention how long I spent on it before I gave up and came here (including going back once again to regular-expression.info {my own go-to crash course textbook} and reading the if-then-else sections and some other possibilities a few times over). This is probably about the 30th hard regex roadblock I've come across in the last week (I'm deep in needing to make varied regexes now) and only the second time I needed to ask for help so far. I'll remember to show my attempts for any future questions here. – logan7 Feb 22 '18 at 20:32
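
To make the point of confusion in the comments concrete, here is a minimal sketch in Python. It is not the ICU engine used by Keyboard Maestro, so treat it only as an illustration. It shows that plain alternation reports the leftmost match in the string, and it spells out the `if (regex1) else if (regex2)` idea as an ordinary loop. The helper name `first_by_priority` is hypothetical.

```python
import re

# Patterns in priority order, taken from the question.
PATTERNS = [r"abc", r"a.c", r"def", r"%#@", r"1z;"]


def first_by_priority(text):
    """Return the match of the highest-priority pattern found anywhere in text."""
    for pattern in PATTERNS:
        m = re.search(pattern, text)
        if m:
            return m.group(0)
    return None  # none of the patterns occur


sample = "1z;%#@defarcabcaqcdef%#@1z;"

# Plain alternation scans left to right and stops at the first position
# where any alternative matches, so it reports the leftmost hit.
leftmost = re.search(r"abc|a.c|def|%#@|1z;", sample).group(0)

# Trying the patterns one after another honors the priority order instead.
prioritized = first_by_priority(sample)

print(leftmost)     # 1z;
print(prioritized)  # abc
```

The loop is the host-language equivalent of what the regexes in the answer do inside a single pattern.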

1 Answer

You need to force each option to be matched individually rather than grouping them together. A pattern such as `.*?(?:x|y|z)` stops at the first position where any of the options matches. Run against the string `abczx`, for example, it matches `abcz`, because `z` is the first of the options to occur in the string, even though `x` is listed first. To force prioritization, combine `.*?` with each option separately so that the regex resembles `.*?x|.*?y|.*?z`. The engine then tries the options one at a time, in order, until one matches: if `x` can't be found, it moves on to `y`, and so on.
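
The difference between the two forms can be checked with a short sketch. It uses Python's `re` module rather than ICU, on the assumption that lazy quantifiers and alternation behave the same way in both engines.

```python
import re

text = "abczx"

# Options grouped behind one lazy prefix: the prefix grows one character at a
# time and stops as soon as ANY option matches, which happens at "z".
combined = re.match(r".*?(?:x|y|z)", text)

# One lazy prefix per option: the first branch keeps extending until it finds
# "x", and the later branches are only tried if that branch fails entirely.
separated = re.match(r".*?x|.*?y|.*?z", text)

print(combined.group(0))   # abcz
print(separated.group(0))  # abczx
```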

See regex in use here

(?m)^(?:.*?(?=abc)|.*?(?=a.c)|.*?(?=def)|.*?(?=%#@)|.*?(?=1z;))(.{3})
  • `(?m)` Enables multiline mode so that `^` and `$` match the start/end of each line
  • `(?:.*?(?=abc)|.*?(?=a.c)|.*?(?=def)|.*?(?=%#@)|.*?(?=1z;))` Match one of the following options, tried in order
    • `.*?(?=abc)` Match any character any number of times, but as few as possible, ensuring what follows is `abc` literally
    • `.*?(?=a.c)` Match any character any number of times, but as few as possible, ensuring what follows is `a`, any character, then `c`
    • `.*?(?=def)` Match any character any number of times, but as few as possible, ensuring what follows is `def` literally
    • `.*?(?=%#@)` Match any character any number of times, but as few as possible, ensuring what follows is `%#@` literally
    • `.*?(?=1z;)` Match any character any number of times, but as few as possible, ensuring what follows is `1z;` literally
  • `(.{3})` Capture any character exactly 3 times into capture group 1
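
As a rough check, the pattern can be run against the five sample strings from the question. This sketch uses Python's `re` module, not ICU, so it only illustrates the logic; the names `PRIORITY_RE` and `SAMPLES` are made up for the example.

```python
import re

# The regex from the answer, unchanged. Capture group 1 holds the result.
PRIORITY_RE = re.compile(
    r"(?m)^(?:.*?(?=abc)|.*?(?=a.c)|.*?(?=def)|.*?(?=%#@)|.*?(?=1z;))(.{3})"
)

# The sample strings from the question, in the order they were given.
SAMPLES = [
    "1z;%#@defarcabcaqcdef%#@1z;",
    "1z;%#@defarcabaqcdef%#@1z;",
    "1z;%#@defacabacdef%#@1z;",
    "1z;#@deacabacdf%#@1z;",
    "foo;%@dfaabaef#@1z;barbbbaarr3",
]

results = [PRIORITY_RE.search(s).group(1) for s in SAMPLES]
print(results)  # ['abc', 'arc', 'def', '%#@', '1z;']
```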

If the options vary in length, you'll have to capture in different groups as seen here:

(?m)^(?:.*?(abc)|.*?(a.c)|.*?(def)|.*?(%#@)|.*?(1z;))
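
With this variant only one of the five groups takes part in any given match, so the caller has to pick out the group that is not empty. A minimal sketch of that step in Python (not ICU or Keyboard Maestro; `prioritized_match` is a hypothetical helper):

```python
import re

# The multi-group regex from the answer, unchanged.
GROUPED_RE = re.compile(r"(?m)^(?:.*?(abc)|.*?(a.c)|.*?(def)|.*?(%#@)|.*?(1z;))")


def prioritized_match(text):
    """Return the text of whichever capture group participated, or None."""
    m = GROUPED_RE.search(text)
    if m is None:
        return None
    # Only the branch that succeeded sets its group; the rest are None.
    return next(g for g in m.groups() if g is not None)


print(prioritized_match("1z;%#@defarcabaqcdef%#@1z;"))  # arc
print(prioritized_match("1z;#@deacabacdf%#@1z;"))       # %#@
```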
ctwheels
  • Thanks! I accepted your answer. I'm too new to upvote and let it show but I did upvote you too. – logan7 Feb 22 '18 at 20:02
  • I'm trying to work through _why_ this works exactly. I can't work out the `(?:` part. It seems that is a non-capturing grouping. What I don't understand is why this makes the regex search one at a time and yet a simple alternation regex with `|` would just match the first of any found. I'm trying to work out why that's so for the answer and for your earlier regex `(?m)^(?:.*?(abc).*|.*?(a.c).*|.*?(def).*|.*?(%#@).*|.*?(1z;).*)$`. – logan7 Feb 22 '18 at 21:52
  • Also I'm trying to figure out a way to make the answer work with optional outputs of different lengths such as if one option were `a+*?c`. I've tried changing the last grouping of `(.{3})` to `(.*)` or `(.*?)` and similar but none of those work. I also fooled around with making the expression only look for matches such as `(?:(?=abc)|(?=a.c)|(?=def)|(?=%#@)|(?=1z;))(.+)` or `(?:(abc)|(a.c)|(def)|(%#@)|(1z;))(.+)` and some other things too but none are working. Your answer works for what I need; I'm just trying to work through the why and some other variations I may need very soon. – logan7 Feb 22 '18 at 21:56
  • @logan7 to answer your question as to *why* it's because the pattern `.*?(?:x|y)` uses `?` after a quantifier causing it to be lazy. That means that it'll match as few times as possible (and thus it will return the first option it finds). Separating it into `(?:.*?x|.*?y)` forces it to try each one individually, which basically is doing: *Match until the first `x`*. Oh there is none? Ok let's try the same for `y`... etc. – ctwheels Feb 22 '18 at 22:14
  • Thanks. That confirmed my suspicion on the length thing and saved me a lot of useless fiddling with that. And thanks for the explanation on the forcing individual searching; it's very helpful to understand that! – logan7 Feb 22 '18 at 22:18
  • @logan7 Added info to the top of my answer to make that^ more clear :) – ctwheels Feb 22 '18 at 22:25
  • Thanks! I think I understand it all pretty good now. I even worked out a sort of janky way to only have one output with varying length options... if it's possible to work out what might be after them: `(?:.*?(?=abc)|.*?(?=def)|.*?(?=ghi))(.*?)(?:\))` I just made that one fiddling around with how to just capture once if the options always had a `)` after them (I was using the regex string itself to search through hence the `)`). – logan7 Feb 22 '18 at 22:49
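
One possible way to get a single output with options of different lengths, without relying on a known trailing character, is to let the lookahead branches choose the position and then capture with a plain alternation at that position. This is not from the thread. It is a sketch tested only against Python's `re`, and the option `ab+c` is a made-up variable-length example. It assumes the options contain no capture groups of their own.

```python
import re

# Options in priority order; "ab+c" is a hypothetical variable-length option.
OPTIONS = [r"ab+c", r"a.c", r"def", r"%#@", r"1z;"]

# One lazy prefix plus lookahead per option picks WHERE to match, by priority.
lookaheads = "|".join(rf".*?(?=(?:{opt}))" for opt in OPTIONS)

# A single capturing alternation then consumes the option at that position.
# Higher-priority options cannot match there: had any of them occurred on the
# line, its own lookahead branch would have succeeded first.
capture = "|".join(rf"(?:{opt})" for opt in OPTIONS)

SINGLE_GROUP_RE = re.compile(rf"(?m)^(?:{lookaheads})({capture})")


def single_output(text):
    """Return the prioritized match from capture group 1, or None."""
    m = SINGLE_GROUP_RE.search(text)
    return m.group(1) if m else None


print(single_output("xx1z;abbbcdef"))               # abbbc
print(single_output("1z;%#@defarcabaqcdef%#@1z;"))  # arc
```

The pattern uses only lazy quantifiers, lookaheads, and non-capturing groups, which the comments above describe as supported by ICU, but whether it behaves the same in Keyboard Maestro is untested here.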