I have a Python application that accepts a list of strings that we want to use as search terms in Azure Cognitive Search. The search parameter needs to be a single string, so if I have a list of single words I can do something like:

words_to_search_list = ["toy", "durable"]
words_to_search_str = ' '.join(words_to_search_list)

and then pass words_to_search_str as the "search" parameter in Azure Search, and it can search for text that has "durable" or "toy".

"toy durable"

However, I am not sure how to handle situations where there are bigrams or trigrams in the words_to_search_list like here:

words_to_search_list = ["more toys", "free treats"]

In order to get back text from Azure that contains either "more toys" or "free treats" we'd need to pass the parameter like this:

"\"more toys\" \"free treats\""

Meaning the bigrams need to be in double quotes, but escaped. I started this:

words_to_search_str=""
for words in words_to_search_list:
    words_list=words.split()
    if len(words_list)>1:
        words_escaped='\\"'+ words + '\\"' 
        words_to_search_str+=words_escaped
    else:
        words_to_search_str+=words

But this makes words_to_search_str into the following:

'\\"more toys\\"\\"free treats\\"'

which is not what I want (the double backslashes won't work).

Is there any way to take that list of strings and end up with one string, but where the bigrams are each in (escaped) double quotes?

Edit: I'd like to add that in the solution I have here, if you print it, you get what looks to be the right object (single backslashes, not double), but the actual object still seems to have the double backslashes and they don't give the same result when you pass into the search parameter...

Imu
4 Answers

This should do it if you are running Python 3.6+:

words_to_search_list = [
    "toy", "durable", "more toys", "free treats", "big durable toys"
]

words_to_search_str = '\"search\": \"' + ' '.join([
    f'\\"{word}\\"' if ' ' in word else word for word in words_to_search_list
]) + '\"'

print(words_to_search_str)

If not, try:

words_to_search_list = [
    "toy", "durable", "more toys", "free treats", "big durable toys"
]

words_to_search_str = '\"search\": \"' + ' '.join([
    '\\"{}\\"'.format(word) if ' ' in word else word for word in words_to_search_list
]) + '\"'

print(words_to_search_str)
Clade
  • Odd, I get the same output in Python 3.6, 3.7, and 3.8: "search": "toy durable \"more toys\" \"free treats\" \"big durable toys\"". Can you copy and paste the exact error message? – Clade Jan 29 '20 at 13:50
  • Ah, if you are running <3.6, f-strings might be the cause of that error. Please see the updated answer. – Clade Jan 29 '20 at 13:53
  • The second solution no longer gives an error, but there seems to be a difference between what words_to_search_str looks like when you print it and what it actually is. When you print it, it looks correct, but if you inspect the object itself it still seems to have the double backslashes... I updated the question to make that clearer. Also, "search" shouldn't be part of the string (I updated the question to make that clearer too). – Imu Jan 29 '20 at 15:49

When Python displays a string (for example, when you evaluate it in the interpreter instead of printing it), it shows the string the way you would type it as Python source. The double backslash isn't really two backslashes. Just as you typed \\ in your code to mean one actual backslash, Python displays that single backslash in escaped form. That's also why the double quotes are not escaped in the display: the string is shown in single quotes. I hope that helps.
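
To make the difference concrete, here is a minimal sketch that builds one phrase the same way the question does and then inspects it three ways. The variable names are only for illustration.

```python
# Build the phrase as in the question: '\\"' is one backslash followed by one quote.
words = "more toys"
escaped = '\\"' + words + '\\"'

print(escaped)        # str() form: the characters the string actually contains
print(repr(escaped))  # repr() form: Python source syntax, so each backslash is shown doubled
print(len(escaped))   # 13: 9 characters of text + 2 backslashes + 2 quotes
```

The length shows that the object holds exactly one backslash on each side. Whether the search service wants those backslashes at all is a separate question from how Python displays them.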

Philogy

The following should give you that format:

words_to_search_list = ["more toys", "free treats"]
updated_words = ['\\"{}\\"'.format(words) for words in words_to_search_list]
words_to_search_str = '"{}"'.format(' '.join(updated_words))
print(words_to_search_str)
Thierry Lam
  • There seems to be a difference between what words_to_search_str looks like when you print it and what it actually is. When you print it, it looks correct, but if you inspect the object itself it still seems to have the double backslashes... I updated the question to make that clearer. – Imu Jan 29 '20 at 15:49

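The problem arises because you are escaping the backslash itself, which puts a literal backslash into the string: words_escaped='\\"'+ words + '\\"'

You should escape only the double quote, like this: words_escaped='\"'+ words + '\"'

Inside a single-quoted Python string, '\"' is the same as '"', so the result contains the double quotes and no backslashes. That should produce the expected result.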
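
As a rough sketch of the whole flow, the snippet below joins the terms with spaces and wraps only the multi-word ones in plain double quotes. The helper name build_search_string is hypothetical, and the last step assumes the request body is sent as JSON; if a client library takes the search string directly, that step is not needed.

```python
import json

def build_search_string(terms):
    """Join search terms with spaces, wrapping multi-word terms in double quotes."""
    parts = []
    for term in terms:
        # A term containing whitespace is a phrase, so quote it to keep it together.
        if len(term.split()) > 1:
            parts.append('"' + term + '"')
        else:
            parts.append(term)
    return " ".join(parts)

search = build_search_string(["toy", "more toys", "free treats"])
print(search)  # toy "more toys" "free treats"

# Assumption: the request body is JSON. json.dumps adds the backslash escapes
# at serialization time, so they never need to be stored in the Python string.
body = json.dumps({"search": search})
print(body)  # {"search": "toy \"more toys\" \"free treats\""}
```

Seen this way, the backslashes in "\"more toys\" \"free treats\"" belong to the serialized form of the string, not to the string itself.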

Prateek Dewan