I am uploading JSON data to an Elasticsearch index using Python's http.client. The data is inserted successfully, but I have a character encoding issue: once inserted, special characters such as é are output like é.
Here is the code:
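import http.client

# elastic_address, endpoint and json_data (a str) are defined earlier in the script
connection = http.client.HTTPConnection(elastic_address)
headers = {"Content-type": "application/json", "Accept": "text/plain"}
# The body is passed as bytes, already encoded as UTF-8
connection.request('PUT', url=endpoint, headers=headers, body=json_data.encode('utf-8'))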
I have noticed that if I replace the special characters in the source JSON before sending it, for example é replaced by \u00E9, it works fine. This may be because Elasticsearch uses another character encoding, but according to this link, ES uses UTF-8 as its character encoding.
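As an illustration of the \u00E9 workaround, here is a minimal sketch assuming the JSON can be built from a Python object with the standard json module (the sample document `doc` is hypothetical, not my real data). With its default `ensure_ascii=True`, `json.dumps` escapes every non-ASCII character, so the body contains only ASCII bytes and reads the same under UTF-8 and Latin-1:

```python
import json

# Hypothetical sample document; the real data comes from the source JSON.
doc = {"name": "café"}

# Default ensure_ascii=True: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(doc)

# ensure_ascii=False keeps the raw character, so the encoding matters.
raw = json.dumps(doc, ensure_ascii=False)

escaped_bytes = escaped.encode("utf-8")  # pure ASCII bytes
raw_bytes = raw.encode("utf-8")          # contains the two-byte sequence for é

print(escaped)
print(raw_bytes)
```

Both forms decode back to the same document with `json.loads`, so the escaped variant only changes how the text travels, not what is stored.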
I've also looked through client.py in the http.client package, and it seems that the data is encoded as Latin-1, see below:
def _encode(data, name='data'):
    """Call data.encode("latin-1") but show a better error message."""
    try:
        return data.encode("latin-1")
    except UnicodeEncodeError as err:
        raise UnicodeEncodeError(
            err.encoding,
            err.object,
            err.start,
            err.end,
            "%s (%.20r) is not valid Latin-1. Use %s.encode('utf-8') "
            "if you want to send it encoded in UTF-8." %
            (name.title(), data[err.start:err.end], name)) from None
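To narrow things down, here is a small sketch of how the two encodings differ for é, and what happens if UTF-8 bytes are read as Latin-1 somewhere along the way. It is only a diagnostic aid under the assumption that a UTF-8/Latin-1 mix-up is involved, which is not confirmed. Note that, per the http.client documentation, a str body is encoded as ISO-8859-1 while a bytes body is sent as is, so the Latin-1 path would not apply to a body that is already encoded to bytes.

```python
text = "é"

# The same character under the two encodings in question.
utf8_bytes = text.encode("utf-8")      # two bytes
latin1_bytes = text.encode("latin-1")  # one byte

# Reading UTF-8 bytes as Latin-1 produces two unrelated characters.
mojibake = utf8_bytes.decode("latin-1")

# Encoding that misread text as UTF-8 again yields a double-encoded sequence.
double_encoded = mojibake.encode("utf-8")

print(utf8_bytes, latin1_bytes, repr(mojibake), double_encoded)
```

Comparing `repr(json_data)` just before the request against these values may show whether the string is already wrong before it reaches http.client.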
I'm not sure where the issue is: in the script, in the http.client package, or in the Elasticsearch index settings?
Any ideas?