Readers are perhaps looking for an elegant, at least not hackish, solution to this problem. That was my objective as well, but, alas, this is the best I've been able to come up with.
Code
def convert(str)
subs = []
str.gsub(/"[^"]*"| *\| */) do |s|
if s.match?(/ *\| */)
'|'
else
subs << s
'*'
end
end.gsub(/ +/, ' AND ').
gsub(/[*|]/) { |s| s == '|' ? ' OR ' : subs.shift }
end
Examples
puts convert(%Q{(A | B) "C D" (E | "F G" | "H I J") ("K L" ("M N" | "O P")) Q R})
#-> (A OR B) AND "C D" AND (E OR "F G" OR "H I J") AND ("K L" AND ("M N" OR "O P")) AND Q AND R
puts convert(%Q{(A|B) C D (E| "F G" |"H I J") ("K L" ("M N" | "O P")) Q R})
#-> (A OR B) AND C AND D AND (E OR "F G" OR "H I J") AND ("K L" AND ("M N" OR "O P")) AND Q AND R
Notice that in this example there is no space before and/or after some pipes and in some places outside double-quoted strings there are multiple spaces.
puts convert(%Q{(Ant | Bat) Cat Dog (Emu | "Frog Gorilla" | "Hen Ibex Jackel") ("Khawla Lynx" ("Magpie Newt" | "Ocelot Penguin")) Quail Rabbit})
#-> (Ant OR Bat) AND Cat AND Dog AND (Emu OR "Frog Gorilla" OR "Hen Ibex Jackel") AND ("Khawla Lynx" AND ("Magpie Newt" OR "Ocelot Penguin")) AND Quail AND Rabbit
Here I've replaced the capital letters with words.
Explanation
To see how this works let
str = %Q{(A | B) "C D" (E | "F G" | "H I J") ("K L" ("M N" | "O P")) Q R}
#=> "(A | B) \"C D\" (E | \"F G\" | \"H I J\") (\"K L\" (\"M N\" | \"O P\")) Q R"
then
subs = []
str.gsub(/"[^"]*"| *\| */) do |s|
if s.match?(/ *\| */)
'|'
else
subs << s
'*'
end
end
#=> "(A|B) * (E|*|*) (* (*|*)) Q R"
subs
#=> ["\"C D\"", "\"F G\"", "\"H I J\"", "\"K L\"", "\"M N\"", "\"O P\""]
As you see, I have removed the spaces around pipes and replaced all quoted strings with asterisks, saving those strings in the array subs
, so that I can later replace the asterisks with their original values. The choice of an asterisk is of course arbitrary.
The regular expression reads, "match a double-quoted string of zero or more characters or a pipe ('|'
) optionally preceded and/or followed by spaces".
As a result of these substitutions, all remaining strings of spaces are to be replaced by ' AND '
:
s2 = s1.gsub(' +', ' AND ')
#=> "(A|B) AND * AND (E|*|*) AND (* AND (*|*)) AND Q AND R"
It remains to replace '|'
with ' OR '
and each asterisk by its original value:
s2.gsub(/[*|]/) { |s| s == '|' ? ' OR ' : subs.shift }
#=> "(A OR B) AND \"C D\" AND (E OR \"F G\" OR \"H I J\") AND (\"K L\" AND (\"M N\" OR \"O P\")) AND Q AND R"