perl6 Negating multiple words and permutations of their chars inside a regex

Question

What is the best way to perform, inside a regex, negation of multiple words and permutations of chars that make up those words?

For instance: I do not want

"zero dollar"
"roze dollar"
"eroz dollar"
"one dollar"
"noe dollar"
"oen dollar"

but I do want

"thousand dollar"
"million dollar"
"trillion dollar"

If I write

not m/ [one | zero] \s dollar /

it does not cover permutations of the chars, and because the "not" is applied outside the regex, it also accepts everything else, like "big bang", that has no "dollar" in it at all.
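A minimal sketch of the two problems just described, assuming a recent Rakudo; the variable names `$naive` and `$accepted` are only illustrative:

```raku
# The regex from the first attempt, stored so it can be reused.
my $naive = / [one | zero] \s dollar /;

for 'one dollar', 'noe dollar', 'big bang' -> $s {
    # Negating from outside: True whenever the regex does NOT match.
    my $accepted = not $s ~~ $naive;
    say "$s: $accepted";
}
# one dollar: False   (rejected, as wanted)
# noe dollar: True    (permutation slips through)
# big bang: True      (accepted although it has no "dollar")
```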

m/ <- [one] | [zero] > \s dollar/ # this is a syntax error.

FWIW, the way to avoid that syntax error is by writing that as `/ <!after one | zero> \s dollar/` — , Mar 01 '17 at 18:32
There should be a new _tag_ category `perl6 regexen` so people can distinguish from PCRE 5 types. — , Mar 01 '17 at 21:41
@sln The `regex` tag's description says "all questions with this tag should also include a tag specifying the applicable programming language or tool." Aiui this solves the problem you raise (not just for PCRE regex vs non-PCRE regex but also one PCRE flavor vs another), provided questioners follow the admonition to add a lang/tool tag (or editors do it for them). I believe this is true for all Perl 6 regex questions. Perhaps [tag intersection searching](http://meta.stackexchange.com/questions/231693/better-support-for-search-by-both-intersection-and-union-of-multiple-tags) needs improvement? — raiph, Mar 02 '17 at 16:49
@raiph - I think there is some compatibility mode for Perl6 regex that enables Perl5. But Perl5-style regex constructs permeate 90% of other engines' syntax. That's why the regex tag is mostly a default for Perl5 style. It's too big of a leap to have regex qualified with perl6, since it's mostly standalone in regex land. — , Mar 02 '17 at 17:24
@sln Fwiw I think the current approach and advice in the tag works reasonably well and I'd be surprised if you get consensus on supporting introduction of a Perl 6 specific regex tag. But maybe I'm missing something. Presumably meta is the right forum if you wish to push this issue further. Please point folk to my comment above as a counterpoint to your own view if you decide to push for this new tag elsewhere and then reply here again if there's agreement we should change to a separate tag. TIA. — raiph, Mar 03 '17 at 06:47

smls · Accepted Answer · 2017-03-02T09:51:49.860

Using a code assertion:

You could match any word, and then use a <!{ }> assertion to reject words that are permutations of "one" or "zero":

say "two dollar" ~~ / :s ^ (\w+) <!{ $0.comb.sort.join eq "eno" | "eorz" }> dollar $ /;
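The same idea can be generalized so the sorted keys are derived from a word list instead of being typed by hand. This is only a sketch: `letter-key`, `@banned`, `$banned-keys` and `accepts` are hypothetical names, not part of the answer above.

```raku
# Hypothetical helper: a word's canonical key is its letters, sorted and rejoined.
sub letter-key(Str $word) { $word.comb.sort.join }

my @banned      = <one zero>;                        # words whose anagrams are rejected
my $banned-keys = any(@banned.map(&letter-key));     # junction of "eno" and "eorz"

# Accept "<word> dollar" only if <word> is not an anagram of a banned word.
sub accepts(Str $line) {
    so $line ~~ / :s ^ (\w+) <!{ letter-key(~$0) eq $banned-keys }> dollar $ /
}

say accepts($_) for 'zero dollar', 'noe dollar', 'thousand dollar';
```

Comparing sorted keys keeps the work proportional to the word length, whatever the number of possible orderings.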

Using `before`/`after`:

Alternatively, you could pre-generate all permutations of the disallowed words, and then reject them using a <!before > or <!after > assertion in the regex:

my @disallowed = <one zero>.map(|*.comb.permutations)».join.unique;

say "two dollar" ~~ / :s ^ <!before @disallowed>\w+ dollar $ /;
say "two dollar" ~~ / :s ^ \w+<!after @disallowed> dollar $ /;
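To see what the pre-generated list contains and how quickly it grows, here is a small sketch using the same construction; the word "yellow" is just an extra illustration:

```raku
# Same construction as above: split each word into characters,
# take every ordering, rejoin, and drop duplicates.
my @disallowed = <one zero>.map(|*.comb.permutations)».join.unique;

say @disallowed.elems;                 # 30, i.e. 3! + 4!
say so 'roze' eq any(@disallowed);     # True: a permutation of "zero"
say so 'two'  eq any(@disallowed);     # False

# Repeated letters produce duplicate orderings, which .unique removes.
my $yellow-count = 'yellow'.comb.permutations».join.unique.elems;
say $yellow-count;                     # 360, i.e. 6! / 2!
```

Since an n-letter word can yield up to n! strings, this approach is probably best kept for short words.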

score 6 · Answer 2 · answered Mar 01 '17 at 18:17

Here's a solution that works well. It uses a helper sub, `is-bad-word`, that compares the sorted characters of the `$needle` (i.e. what was found in the target string) against the sorted characters of each of the `@badwords`; if any of them match, the result is true.

Inside the regex itself, I've used a negative code-assertion that passes the (\w+) that was matched into the helper sub.

One important thing to point out: if you don't properly anchor the (\w+) to the beginning of a word (I chose the beginning of the string this time), the regex will just skip ahead one character when it finds a bad word and accept anyway (unless the bad word was only one character to begin with, like in a dollar). After all, zero is in your @badwords, but ero isn't.

Hope that helps!

my @badwords = <one zero yellow>;

my @parsefails = q:to/EOF/.lines;
    zero dollar
    roze dollar
    erzo dollar
    one dollar
    noe dollar
    oen dollar
    yellow dollar
    wolley dollar
    EOF

my @parsepasses = q:to/EOF/.lines;
    thousand dollar
    million dollar
    dog dollar
    top dollar
    meme dollar
    EOF

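# True-ish if $needle uses exactly the same characters as any bad word:
# both sides are compared as sorted character lists (stringified by eq).
sub is-bad-word($needle) {
    return $needle.comb.sort eq any(@badwords).comb.sort
}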

use Test;
plan @parsefails + @parsepasses;

for flat (@parsefails X False), (@parsepasses X True) -> $line, $should-pass {
    my $succ = so $line ~~ / ^ (\w+) \s <!{ is-bad-word($0.Str) }> 'dollar' /;
    ok $succ eqv $should-pass, "$line -> $should-pass";
}

done-testing;
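The anchoring caveat mentioned before the code can be reproduced with a short sketch. It redefines the helper so the block is self-contained, and it uses the left word boundary `<<` as one possible alternative to `^`; the variable names are illustrative only.

```raku
my @badwords = <one zero>;

# Same comparison as is-bad-word above, collapsed to a Bool with `so`.
sub is-bad-word($needle) {
    return so $needle.comb.sort eq any(@badwords).comb.sort
}

# Unanchored: after rejecting "zero", the engine retries one character later
# and accepts "ero", which is not a bad word.
my $unanchored = 'zero dollar' ~~ / (\w+) \s <!{ is-bad-word($0.Str) }> 'dollar' /;
say $unanchored ?? "matched: $unanchored" !! 'no match';

# Anchored at a left word boundary: the capture must start where a word starts,
# so the bad word cannot be dodged by skipping its first letter.
my $anchored = 'pay zero dollar' ~~ / << (\w+) \s <!{ is-bad-word($0.Str) }> 'dollar' /;
say $anchored ?? "matched: $anchored" !! 'no match';
```

A word boundary, unlike `^`, lets the checked word appear in the middle of a longer string.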

Of course, you may want to fold-case (`.fc`) both sides of the `eq` in `is-bad-word` if you're interested in also disallowing "One dollar". — timotimo, Mar 01 '17 at 18:23
Thank you timotimo !! You have packed many concepts that I need to do more learning. And learn Perl6 I will; be with you the force may !! — lisprogtor, Mar 03 '17 at 06:03
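The fold-case suggestion in the comment above could look roughly like this; `is-bad-word-fc` is a hypothetical name for the adjusted helper:

```raku
my @badwords = <one zero>;

# Like is-bad-word, but both sides are case-folded before comparing.
sub is-bad-word-fc($needle) {
    return so $needle.fc.comb.sort eq any(@badwords).fc.comb.sort
}

say is-bad-word-fc('One');      # True
say is-bad-word-fc('ROZE');     # True
say is-bad-word-fc('Million');  # False
```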
