As the question implies, I have a code snippet using QRegularExpression that works. It does what it is supposed to do, causes no errors, and everything is fine.

Why am I posting the question? Everything I have found so far implies that my expression should not work, but it does.

The main point of my question is the \- escape sequence.

I know that it is not defined. During compilation I get warning: unknown escape sequence: '\-', and this warning is expected.

Now consider the following code snippet. Don't pay too much attention to the expression itself; it targets Russian text, but unfortunately this is the expression where I noticed this strange behavior.

I am not posting anything else because, strange as it sounds, it works as desired.

What I actually want to understand is why it works, considering I get the warning.

The expression is below.

// Capture Russian endings
QRegularExpression RU_ENDINGS("([а-я\-]+[бвгджзклмнпрстфхчцшщ])([еиоы][й]|[аия][я]|[иую][ю]|[еиоы][е]|[аоеиы][м][иу]|[ое][г][о]|(?<!ост)и?[аеиоыя]м|ост[а-яё]{1,3}|(?<!остиям)(?>и|ь.?)|[ао]в|н[аеио]|с[ая]|[ео][вк]|[иы]х|[ие]ну|[иуя]т|(?<![аеёиоуыэюя]{2})[аеёоуыэюя]+|и{2})$", QRegularExpression::UseUnicodePropertiesOption | QRegularExpression::MultilineOption);

As I said, I get the desired behavior. In Russian words containing the symbol '-', the symbol is gobbled up by the [а-я\-]+ part. If it is not there, the - is not gobbled up.

Everything I found suggests it should not work, but it does.

UPDATE

In the suggested duplicate, the regex did not work.

My question clearly states that my regex works. I just could not figure out why it worked as desired, considering the warning I got during compilation. All the provided code was used as is and worked.

More to the point, the question has nothing to do with std::regex. Also, a correct answer with the correct explanation has already been given below.

The question might be a duplicate, but it certainly is not a duplicate of the suggested question.

  • You probably meant to write `\\-` instead of `\-`. The first escape gets evaluated by the C++ compiler (thus the warning) and the second one by `QRegularExpression` – perivesta Jan 30 '20 at 11:08
  • Nope. I meant what I meant, and it actually works, which is a bit strange. I know that by all accounts it should be `\\-`. The question is: why does it work? =) – Eugene Anisiutkin Jan 30 '20 at 11:10
  • In that case, it's implementation defined. My guess is the compiler you're using treats it as if it were `\\-` – perivesta Jan 30 '20 at 11:22
  • I have reason to believe otherwise, mainly because I also have a `\\1y`. But maybe I misunderstood what I read. As far as I understand, `\\1y` evaluates to `\1y`. This was taken from the replacement part – Eugene Anisiutkin Jan 30 '20 at 11:28
  • For such regular expressions you should consider a [raw string literal](https://en.cppreference.com/w/cpp/language/string_literal) (C++11). – Brandin Jan 30 '20 at 14:05
  • @EugeneAnisiutkin I mean if you have a non-trivial regular expression. In that case a raw string literal will probably make this a lot easier. – Brandin Jan 30 '20 at 15:43

1 Answer

The compiler doesn't know the escape sequence \-, so it just puts a plain - in the string and issues a warning.

Your regex engine thus sees [а-я-]. And the way regex character classes work, a - at the very end of the class is not special, i.e. there is no difference between [а-я\-] and [а-я-].

Thus, the expression works as you want it to.

You can try this out for yourself by making a small program that compares the results for these expressions. For example:

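QRegularExpression escaped("[a-z\\-]");     // the regex engine receives [a-z\-]
QRegularExpression bad_escaped("[a-z\-]");  // unknown escape: compiler warns, engine receives [a-z-]
QRegularExpression unescaped("[a-z-]");     // the regex engine receives [a-z-]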

Match these three against a few test strings, in particular the string "-", and you'll find that they all behave the same, apart from the compiler warning for the second one, of course.
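The difference between the literals can also be checked at the C++ level, before any regex engine is involved. The following is only a minimal sketch in plain standard C++ (no Qt); the function names are hypothetical and exist just for illustration. It shows which characters end up in the string for the correctly escaped literal, for a raw string literal (as suggested in the comments), and for the literal without any backslash. The `"[a-z\-]"` literal is deliberately left out so the sketch compiles without warnings; as described above, the compiler is expected to store it like the unescaped variant.

```cpp
#include <cassert>
#include <string>

// Correctly escaped literal: the compiler turns "\\" into one backslash,
// so the pattern text handed to a regex engine is [a-z\-].
std::string escapedPattern() { return "[a-z\\-]"; }

// The same pattern as a raw string literal (C++11): the compiler does no
// escape processing here, so a single backslash is enough.
std::string rawPattern() { return R"([a-z\-])"; }

// No backslash at all: the '-' is the last character of the class.
std::string unescapedPattern() { return "[a-z-]"; }

// Sanity checks on what the compiler actually stored in each string.
void checkLiterals() {
    assert(escapedPattern() == rawPattern());                      // same characters
    assert(escapedPattern().size() == unescapedPattern().size() + 1); // one extra backslash
    assert(unescapedPattern().find('\\') == std::string::npos);    // no backslash present
}
```

Calling checkLiterals() from main() and printing the three strings should make it visible that only the first two contain a backslash. Either of those two spellings avoids the compiler warning while still passing an escaped - to the regex engine.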

Sebastian Redl
  • Oh damn, I had not thought of that. Considering this, I really should write `\\-`. I think it will be better for future work with regexes. Better not to forget to escape everything correctly – Eugene Anisiutkin Jan 30 '20 at 11:40