I am implementing a simple search engine over a source corpus of about 12k news articles on different topics. We assume the search engine only needs to support three kinds of queries:
- Phrase Queries, which are enclosed in double quotation marks
- Not Queries, which follow an exclamation mark
- And Queries, which have no special marker
For instance, this query:
"global warming" worldwide !USA
asks for documents that:
- contain the Phrase Query: "global warming"
- contain the And Query: worldwide
- do not contain the Not Query: USA
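One way to produce the breakdown above in a single pass is a regex with one alternative per query type. When the pattern has several groups, re.findall returns one tuple per match, and only the group for the type that matched is non-empty. This is a minimal sketch: parse_query and TOKEN_RE are hypothetical names, and it assumes phrases contain no nested quotes and that a Not Query is always a single word.

```python
import re

# One alternative per query type; for each match exactly one group is non-empty.
#   group 1: text inside double quotes -> Phrase Query
#   group 2: word right after "!"      -> Not Query
#   group 3: any other bare word       -> And Query
TOKEN_RE = re.compile(r'"([^"]*)"|!(\w+)|(\w+)')


def parse_query(query):
    """Split a query string into (phrases, nots, ands) lists (hypothetical helper)."""
    phrases, nots, ands = [], [], []
    for phrase, negated, word in TOKEN_RE.findall(query):
        if phrase:
            phrases.append(phrase)
        elif negated:
            nots.append(negated)
        elif word:
            ands.append(word)
    return phrases, nots, ands


print(parse_query('"global warming" worldwide !USA'))
# (['global warming'], ['USA'], ['worldwide'])
```

Because the quoted alternative is tried first, the words inside a phrase are consumed together with the quotes and never show up as And terms.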
The key point is that a Phrase Query must match its words consecutively, as one piece, with no other words between them! My problem is splitting the query into these three types using Python string operations or the re library.
I have written this code to extract the Phrase Queries and Not Queries, but I have not managed to extract the And Queries yet:
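The "consecutive words" requirement, together with the And/Not semantics of the example, can be checked against a document roughly as follows. This is a minimal sketch, not a full engine: tokenize, contains_phrase and matches are hypothetical helpers, the tokenizer simply lowercases and keeps \w+ runs, and a real engine over 12k articles would normally use an inverted (positional) index instead of scanning each document.

```python
import re


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and keep runs of word characters.
    return re.findall(r'\w+', text.lower())


def contains_phrase(doc_tokens, phrase):
    """True if the words of `phrase` appear consecutively in doc_tokens."""
    words = tokenize(phrase)
    n = len(words)
    if n == 0:
        return False
    # Slide a window of n tokens over the document and compare it to the phrase.
    return any(doc_tokens[i:i + n] == words for i in range(len(doc_tokens) - n + 1))


def matches(text, phrases, nots, ands):
    """Every phrase present, every And word present, no Not word present."""
    tokens = tokenize(text)
    vocab = set(tokens)
    return (all(contains_phrase(tokens, p) for p in phrases)
            and all(w.lower() in vocab for w in ands)
            and not any(w.lower() in vocab for w in nots))


doc = tokenize("Global warming is a worldwide problem.")
print(contains_phrase(doc, "global warming"))     # True
print(contains_phrase(doc, "warming worldwide"))  # False: "is a" sits in between
```

The sliding-window comparison is what enforces that no other word may sit between the words of a phrase; the And and Not terms only need set membership.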
import re

query = input()
phrase_query = re.findall(r'"([^"]*)"', query)
not_query = re.findall(r'!(\w+)', query)
print(phrase_query)
print(not_query)
For the input:
"global warming" worldwide !USA
the code above prints:
['global warming']
['USA']
That works. However, I cannot extract the And Query. How can I extract the And Query (worldwide) into a separate list?
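One possible approach that builds directly on the two existing patterns: remove every quoted phrase and every !word from the query with re.sub, then collect whatever words remain. This is a sketch under the assumption that And terms are plain \w+ words; extract_and_terms is a hypothetical helper name. Each removed part is replaced with a space so that neighbouring words are not glued together.

```python
import re


def extract_and_terms(query):
    """Return the bare words left after removing phrase and not parts."""
    # Same patterns as the phrase/not extraction, without capture groups.
    rest = re.sub(r'"[^"]*"|!\w+', ' ', query)
    return re.findall(r'\w+', rest)


query = '"global warming" worldwide !USA'
phrase_query = re.findall(r'"([^"]*)"', query)
not_query = re.findall(r'!(\w+)', query)
and_query = extract_and_terms(query)
print(phrase_query)  # ['global warming']
print(not_query)     # ['USA']
print(and_query)     # ['worldwide']
```

Reusing the same patterns for removal keeps the three lists consistent: anything counted as a phrase or a Not term is guaranteed not to reappear as an And term.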