I came across the following regex at work. What does it do?
,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))
To understand it, I split it into the following parts:
, = Match every ,
(?= = Followed by
(?:[^\"]*\"[^\"]*\")* = Anything which does not match ", followed by ", followed by anything which does not match ", followed by ". For example, 1111"aaaaa"
(?![^\"]*\") = BUT not followed by anything that does not match " and then a "
In other words, match any , that is followed by either something like 11111"111" OR by "".
In our code the expression is used simply for tokenizing a string whose fields are separated by , but I am assuming the author built it for something more generic.
Can anyone provide a simpler explanation than the one above?
The expression above is the pattern given to boost::regex().
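For reference, here is a minimal sketch of how such a tokenizer might look. It is not the code from work: it assumes std::regex (ECMAScript grammar) as a stand-in for boost::regex, and split_outside_quotes is a hypothetical helper name. The backslashes in the pattern are only C++ string escaping, so the regex engine sees plain " characters.

```cpp
#include <regex>
#include <string>
#include <vector>

// Assumption: std::regex stands in for boost::regex in this sketch.
// split_outside_quotes is a hypothetical name, not part of the original code.
std::vector<std::string> split_outside_quotes(const std::string& input) {
    // Same pattern as in the question; \" is just an escaped " in the literal.
    static const std::regex separator(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");

    std::vector<std::string> tokens;
    // Submatch index -1 selects the text between matches, i.e. the tokens.
    std::sregex_token_iterator it(input.begin(), input.end(), separator, -1);
    std::sregex_token_iterator end;
    for (; it != end; ++it) {
        tokens.push_back(it->str());
    }
    return tokens;
}
```

With an input such as a, "h,w", 23 this should yield three tokens, with the quoted field kept in one piece (leading spaces are not trimmed).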
UPDATE: Actually, it searches for commas (",") with the following constraints:
It is okay for an even number of " to follow the comma
BUT it is NOT okay for an unpaired " (that is, an odd number of ") to follow the comma
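The constraint above can also be written without any regex, which may make it easier to see what is being tested. This is only a sketch of the same idea, and commas_with_even_quotes_after is a hypothetical helper name: it keeps a comma when the rest of the string after it contains an even number of ".

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the 0-based positions of commas that are
// followed by an even number of double quotes (zero counts as even).
std::vector<std::size_t> commas_with_even_quotes_after(const std::string& s) {
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] != ',') continue;
        // Count the quotes in the remainder of the string after this comma.
        auto rest = s.begin() + static_cast<std::ptrdiff_t>(i) + 1;
        auto quotes_after = std::count(rest, s.end(), '"');
        if (quotes_after % 2 == 0) positions.push_back(i);
    }
    return positions;
}
```

Assuming the quotes in the input are balanced, an even count after a comma means the comma sits outside any quoted section.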
For example, consider the string: a, "h,w", 23
The first "," is matched because it is followed by an even number of " (the pair around "h,w").
The second ",", inside "h,w", is NOT matched because of the negative lookahead (?![^\"]*\"), which rejects a "," that is followed by a single unpaired ".
Finally, the last "," matches, since no " follows it and zero is even.
The final result is two matched commas: the first and the last.
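To check this walkthrough, here is a small sketch that lists where the pattern matches. It assumes std::regex (ECMAScript grammar) instead of boost::regex, and matched_comma_positions is a hypothetical helper name.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Assumption: std::regex stands in for boost::regex in this sketch.
// Returns the 0-based position of every comma the pattern matches.
std::vector<std::size_t> matched_comma_positions(const std::string& input) {
    // Same pattern as in the question; \" is just an escaped " in the literal.
    static const std::regex separator(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");

    std::vector<std::size_t> positions;
    std::sregex_iterator it(input.begin(), input.end(), separator);
    std::sregex_iterator end;
    for (; it != end; ++it) {
        positions.push_back(static_cast<std::size_t>(it->position()));
    }
    return positions;
}
```

For the example string this should report positions 1 and 8, which are the first and the last comma; the comma at position 5, inside the quotes, is skipped.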