To use its Text Analytics API, Azure requires a JSON document that looks like this:
document = {
    "documents": [
        {"id": "1", "language": "en", "text": "I had a wonderful experience! The rooms were wonderful and the staff was helpful."},
        {"id": "2", "language": "en", "text": "I had a terrible time at the hotel. The staff was rude and the food was awful."},
        {"id": "3", "language": "es", "text": "Los caminos que llevan hasta Monte Rainier son espectaculares y hermosos."},
        {"id": "4", "language": "es", "text": "La carretera estaba atascada. Había mucho tráfico el día de ayer."}
    ]
}
The problem I am having is that the last record (id: 4) causes this error:
b'{"code":"BadRequest","message":"Invalid request","innerError":{"code":"InvalidRequestBodyFormat","message":"Request body format is wrong.
Make sure the json request is serialized correctly and there are no null members."}}'
The JSON formatting is correct: it's straight from their site, and it runs perfectly fine without the last record. After more testing, I found that the í and á are the characters triggering the error. To make sure, I also tested English words like resumé and fiancé, and got the same error. That doesn't make sense, since Spanish is one of the supported languages for text analysis and the text's language is even defined as Spanish before it's processed.
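To make the observation about í and á concrete, here is a minimal sketch of how those characters look as raw bytes under two encodings. It rests on an assumption that is not confirmed anywhere above: that the service expects the request body to be UTF-8, and that something on the client side might be encoding it differently (Latin-1 is used here purely as an illustration). The helper `is_valid_utf8` is a hypothetical name introduced for this example.

```python
# Compare the byte form of the accented text under two encodings.
text = "Había mucho tráfico"

utf8_bytes = text.encode("utf-8")      # í becomes b'\xc3\xad', á becomes b'\xc3\xa1'
latin1_bytes = text.encode("latin-1")  # í becomes b'\xed',     á becomes b'\xe1'


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes can be decoded as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(utf8_bytes)
print(latin1_bytes)
print(is_valid_utf8(utf8_bytes))
print(is_valid_utf8(latin1_bytes))
```

If the body were sent as Latin-1 bytes, a server that parses it as UTF-8 would hit an invalid byte sequence exactly at the accented characters, which would be consistent with pure-ASCII records working. Whether that is what actually happens here is still only a guess.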
So my question is: am I missing a step before passing my data to Azure? Am I supposed to convert the text, change the encoding, or remove those characters, or is this something that Azure's API should be able to handle?
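For reference, this is a minimal sketch of what "changing the encoding" could look like on the Python side, using only the standard `json` module. It assumes, without confirmation from the source, that the request body should be JSON encoded as UTF-8 bytes; how the body is actually sent (client library, headers, endpoint) is left out because it is not shown above.

```python
import json

document = {
    "documents": [
        {"id": "4", "language": "es",
         "text": "La carretera estaba atascada. Había mucho tráfico el día de ayer."}
    ]
}

# Option 1: keep the accented characters and encode the body explicitly as UTF-8 bytes.
utf8_body = json.dumps(document, ensure_ascii=False).encode("utf-8")

# Option 2: let json.dumps escape non-ASCII characters (its default), so the
# body contains only ASCII, e.g. "Hab\u00eda" instead of "Había".
ascii_body = json.dumps(document).encode("ascii")

# Both forms decode back to the same Python object.
assert json.loads(utf8_body.decode("utf-8")) == document
assert json.loads(ascii_body.decode("ascii")) == document
```

Either `utf8_body` or `ascii_body` could then be passed as the request body in place of a plain string. Note that `str(document)` is not the same as `json.dumps(document)`: it produces a Python repr with single quotes, which is not valid JSON.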
EDIT: A little more background: I followed the instructions provided on their site to set it up to work with Python. It works perfectly except for what I mentioned.