Regex alternative to negative lookahead

Question

I want to match all paths that contain the keyword `build` unless they also contain `.html`.

Here is a working regex that uses negative lookahead: https://regexr.com/4msck

I am using the regex for path matching in Unison, which does not support negative lookahead. How can I replicate the functionality of the above regex without negative lookahead?

Will you please at least give some examples? Input strings, expected results, what you tried... For example, the regexes with lookaheads. Also, which tool are you using, which flavor of regex? — virolino, Oct 15 '19 at 13:06
Then you should not be using a single regex in the first place. Most probably you do not need a regex at all. — Wiktor Stribiżew, Oct 15 '19 at 13:07
Also, looking at the way you built the regex, you can accomplish this simply by using text search for each line. A lot simpler and a lot faster. — virolino, Oct 15 '19 at 13:09
@virolino Updated the question to show the tool I'm using, although the regexr link I posted has examples and the regex with lookahead — Adam Griffiths, Oct 15 '19 at 13:09
@virolino Unison doesn't support text search in its profile configurations, only regex and a couple of helper functions (Path and Name), which also use regex under the hood — Adam Griffiths, Oct 15 '19 at 13:13
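For reference, the condition itself is trivial without a regex; it is the plain text search suggested in the comments. It cannot go into a Unison profile, but a tiny Python sketch like this is handy as an oracle when testing a lookahead-free regex against sample paths. The function name `should_match` and the sample paths are illustrative only.

```python
def should_match(path: str) -> bool:
    """True for paths that contain "build" but not ".html"."""
    return "build" in path and ".html" not in path


for p in ["proj/build/app.js", "proj/build/index.html", "proj/src/main.c"]:
    print(p, should_match(p))
```

Only the first sample path satisfies the condition.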

score 1 · Accepted Answer · answered Oct 15 '19 at 16:20

It is possible, but the resulting regex is pretty poor in terms of readability and maintainability.

http://regexr.com/4mst1

^(?:[^\.\n]|\.(?:$|[^h\n]|h(?:$|[^t\n]|t(?:$|[^m\n]|m(?:$|[^l\n])))))*build(?:[^\.\n]|\.(?:$|[^h\n]|h(?:$|[^t\n]|t(?:$|[^m\n]|m(?:$|[^l\n])))))*$

Explanation:

^ - start of string/line
(?:[^\.\n]|\.(?:$|[^h\n]|h(?:$|[^t\n]|t(?:$|[^m\n]|m(?:$|[^l\n])))))* - matches anything that does not contain .html
build - literally that string
(?:[^\.\n]|\.(?:$|[^h\n]|h(?:$|[^t\n]|t(?:$|[^m\n]|m(?:$|[^l\n])))))* - same as before
$ - end of string/line
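A hand-written complement like this is easy to get subtly wrong, so it is worth checking against the lookahead version on sample paths. The Python sketch below does that. Python's `re` supports lookahead, so it can act as the reference. The lookahead pattern behind the regexr link is assumed to be equivalent to `(?!.*\.html).*build.*`. The sketch also builds a second lookahead-free pattern, here called `DERIVED` (an illustrative name), from the small automaton that tracks how much of `.html` has just been read.

The two lookahead-free patterns differ in how a dot is handled. In the pattern above, `\.[^h\n]` consumes the character after a dot even when that character is another dot or the `b` of `build`. The derived pattern instead sends a second dot back to the "just saw a dot" state and allows a trailing partial match such as `.` or `.htm` right before `build`.

```python
import re

# Reference: uses the lookahead that Unison lacks (assumed equivalent to the
# regexr pattern in the question): contains "build" and never ".html".
LOOKAHEAD = re.compile(r"(?!.*\.html).*build.*")

# The hand-written pattern from the answer above, copied verbatim.
ANSWER = re.compile(
    r"^(?:[^\.\n]|\.(?:$|[^h\n]|h(?:$|[^t\n]|t(?:$|[^m\n]|m(?:$|[^l\n])))))*"
    r"build"
    r"(?:[^\.\n]|\.(?:$|[^h\n]|h(?:$|[^t\n]|t(?:$|[^m\n]|m(?:$|[^l\n])))))*$"
)

# Lookahead-free "does not contain .html", derived from the automaton whose
# states record how much of ".html" has just been seen.
PARTIAL = r"(?:h(?:tm?)?)"                  # "h", "ht" or "htm" after a dot
BACK_TO_DOT = rf"(?:{PARTIAL}?\.)"          # a new "." restarts the partial match
EXIT = r"(?:[^.h]|h[^.t]|ht[^.m]|htm[^.l])" # leave the partial match safely
DOT_RUN = rf"\.{BACK_TO_DOT}*"
NO_HTML = rf"(?:[^.]|{DOT_RUN}{EXIT})*(?:{DOT_RUN}{PARTIAL}?)?"
DERIVED = re.compile(rf"{NO_HTML}build{NO_HTML}")


def results(path):
    """Return (reference, answer pattern, derived pattern) for one path."""
    return (
        bool(LOOKAHEAD.fullmatch(path)),
        bool(ANSWER.search(path)),
        bool(DERIVED.fullmatch(path)),
    )


for p in ["app/build/main.js", "app/build/index.html", "src/main.c",
          "build..html", "x.build/out.js"]:
    print(p, results(p))
```

Running it prints one `(reference, answer, derived)` tuple per path. On `build..html` and `x.build/out.js` the hand-written pattern disagrees with the reference, while the derived pattern agrees with it on all five paths. The derived pattern is no more readable, but generating it from named pieces keeps it maintainable. `DERIVED.pattern` holds the expanded string. Whether Unison's regex engine accepts every construct in it (non-capturing groups in particular) is an assumption worth testing before relying on it.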

virolino · Answer 2 · 2019-10-15T13:49:35.080

0

According to the manual, this should work. It is based on the comment: "I want to ignore all files in a build directory except for html files"

ignore = Regex .*build.*
ignorenot = Name {*.html}

I am not familiar with Unison, so I must assume that you can specify the paths with more than one rule.

I have this expectation because of this statement in the manual:

There is also an ignorenot preference, which specifies a set of patterns for paths that should not be ignored, even if they match an ignore pattern.

edited Oct 15 '19 at 13:49

answered Oct 15 '19 at 13:24

virolino

2,073
5
21

This does not work since "If the root is a directory, Unison continues looking for updates in all the immediate children of the root. Again, if the name of some child matches an ignore pattern and does not match an ignorenot pattern, then this whole path including everything below it will be ignored." Basically, as soon as the build regex pattern is matched, it doesn't bother to recurse further down that directory to discover its HTML descendants – Adam Griffiths Oct 15 '19 at 13:36
In your edit, the rules seem to be exactly the opposite of what you described in the question. Is that right? – virolino Oct 15 '19 at 13:42
No. When I asked the question I abstracted away the Unison part. If the regex matches, it causes the directory to be ignored. I want to ignore all files in a build directory except for HTML files. Therefore I need a regex that matches any directory containing `build` unless it also contains `.html` – Adam Griffiths Oct 15 '19 at 13:44
Then please update the question. Also, the question is not really about regex, it seems, but about the configuration of `unison`. – virolino Oct 15 '19 at 13:46
If you abstract away the fact that the regex is for Unison, then it is. An answer that focuses purely on regex and provides an alternative to a negative lookahead would solve my problem – Adam Griffiths Oct 15 '19 at 13:51
Is the order of the statements important, i.e., should `ignorenot` come before `ignore`? – virolino Oct 15 '19 at 13:52
No, it isn't. All the statements get parsed before the directory tree is traversed. The issue is that it is a greedy algorithm: if a directory matches `build` and does not match any ignorenot rules, it is immediately discarded and any files or directories below it are not traversed at all – Adam Griffiths Oct 15 '19 at 13:56
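The pruning behaviour quoted from the manual in these comments can be illustrated with a toy model. This is a hypothetical Python sketch, not Unison's implementation. It assumes a `Regex` pattern must match the whole path and a `Name` pattern is matched against the last path component. It uses `fnmatch` in place of Unison's own glob syntax, so `{*.html}` is written `*.html`. The names `is_ignored` and `visible_paths` and the sample tree are illustrative.

```python
import fnmatch
import re


def is_ignored(path, ignore_regexes, ignorenot_names):
    """Toy rule: ignored if some ignore regex matches the whole path and
    no ignorenot Name pattern matches the last path component."""
    name = path.rsplit("/", 1)[-1]
    hit = any(re.fullmatch(rx, path) for rx in ignore_regexes)
    saved = any(fnmatch.fnmatchcase(name, pat) for pat in ignorenot_names)
    return hit and not saved


def visible_paths(tree, ignore_regexes, ignorenot_names, prefix=""):
    """Walk a nested dict (directory -> dict of children, file -> None)
    top-down, deciding on each path before looking at its children."""
    seen = []
    for name, children in sorted(tree.items()):
        path = f"{prefix}/{name}" if prefix else name
        if is_ignored(path, ignore_regexes, ignorenot_names):
            continue  # whole subtree skipped: children are never examined
        seen.append(path)
        if children is not None:
            seen.extend(visible_paths(children, ignore_regexes,
                                      ignorenot_names, path))
    return seen


tree = {"proj": {"build": {"index.html": None, "app.js": None},
                 "src": {"main.c": None}}}
print(visible_paths(tree, [r".*build.*"], ["*.html"]))
```

In this model, `proj/build/index.html` would be kept if it were ever examined, because `is_ignored` returns False for it. However, `proj/build` itself matches the ignore regex and not the `*.html` name pattern, so the walk never reaches the file. That is the behaviour described for Answer 2's configuration.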

