I am coding in Python and using the Azure Cognitive Services text-to-speech API. I have Arabic text and want to generate the corresponding MP3 speech audio:

input_text="هذا المحتوى مجاني، لذلك لدعم القناة لمزيد من المحتوى المجاني، يرجى الاشتراك، مثل، مشاركة، تعليق"

  def generate_speech(self,language_id, input_text, outfile, token):
    url = "https://{}.tts.speech.microsoft.com/cognitiveservices/v1".format(self.azure_location)
    print("input_text:"+input_text)
    header = {
      'Authorization': 'Bearer '+str(token),
      'Content-Type': 'application/ssml+xml',
      'X-Microsoft-OutputFormat': 'audio-24khz-160kbitrate-mono-mp3'
    }
    data = "<speak version='1.0' xml:lang='ar-SY'>\
          <voice xml:lang='ar-SY' xml:gender='Male' name='ar-SY-LaithNeural'>\
            {}\
          </voice>\
          </speak>".format(input_text)
    try:
      response = requests.post(url, headers=header, data=data)
      response.raise_for_status()
      with open(outfile, "wb") as file:
        file.write(response.content)
      print(response)
      response.close()
    except Exception as e:  
      print("ERROR: ", e)        

I get the following error:

ERROR: 'latin-1' codec can't encode characters in position 127-129: Body ('هذا') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.

  • Can you make sure that the indentation of your code is correct? – BrokenBenchmark Jun 26 '22 at 01:13
  • Does this answer your question? ['latin-1' codec can't encode characters](https://stackoverflow.com/questions/65582001/latin-1-codec-cant-encode-characters) – Ecstasy Jun 27 '22 at 03:49
  • [str encoding from latin-1 to utf-8 arbitrarily](https://stackoverflow.com/questions/41030128/str-encoding-from-latin-1-to-utf-8-arbitrarily), ['latin-1' codec can't encode character](https://stackoverflow.com/questions/64769797/latin-1-codec-cant-encode-character-u2019), [How to fix "latin-1 codec can't encode characters in position" in requests](https://stackoverflow.com/questions/57298260/how-to-fix-latin-1-codec-cant-encode-characters-in-position-in-requests), and [UnicodeEncodeError: 'latin-1' codec can't encode characters](https://github.com/psf/requests/issues/1822) – Ecstasy Jun 27 '22 at 03:51
  • Hi @Babel8Business, did the suggested solution work for you? Do let me know if it solved your problem else share more details so I can troubleshoot or else do accept it for helping other community members. – Kartik Bhiwapurkar Jul 25 '22 at 04:47

1 Answer

ERROR: 'latin-1' codec can't encode characters in position 127-129: Body ('هذا') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.

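Kindly note that this error is not raised by print; it is raised when requests.post sends the request. Because data is passed as a str, the underlying HTTP client tries to encode the body as Latin-1, which cannot represent Arabic characters. As the message suggests, encode the body as UTF-8 before sending it, i.e. pass data=data.encode('utf-8'). Python 3 strings are already Unicode, so a u"ذهب الطالب الى المدرسة" prefix changes nothing. If the Arabic text looks disconnected or reversed when printed in a terminal, that is a separate display issue, for which the arabic_reshaper module can help.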
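The error message itself points at the fix: hand requests a UTF-8 encoded bytes body instead of a str. Below is a minimal sketch of that idea, not a definitive implementation. The function names build_ssml_body and the standalone generate_speech signature are hypothetical, the url and token are assumed to be obtained as in the question, and escaping the text with xml.sax.saxutils.escape is an extra precaution that the question's code does not have.

```python
from xml.sax.saxutils import escape


def build_ssml_body(input_text: str) -> bytes:
    """Wrap the text in SSML and return it as UTF-8 encoded bytes."""
    ssml = (
        "<speak version='1.0' xml:lang='ar-SY'>"
        "<voice xml:lang='ar-SY' xml:gender='Male' name='ar-SY-LaithNeural'>"
        f"{escape(input_text)}"  # assumption: escape &, < and > so the SSML stays valid
        "</voice>"
        "</speak>"
    )
    # Bytes are sent unchanged; a str body would be encoded as Latin-1 and fail.
    return ssml.encode("utf-8")


def generate_speech(url: str, token: str, input_text: str, outfile: str) -> None:
    """Post the SSML to the text-to-speech endpoint and save the MP3 reply."""
    import requests  # third-party; imported here so build_ssml_body has no dependencies

    headers = {
        "Authorization": "Bearer " + str(token),
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": "audio-24khz-160kbitrate-mono-mp3",
    }
    response = requests.post(url, headers=headers, data=build_ssml_body(input_text))
    response.raise_for_status()
    with open(outfile, "wb") as file:
        file.write(response.content)
```

Keeping the body construction in its own function makes it possible to check the encoding without calling the service.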

For printing Arabic text, you can refer to the answer in this SO thread:

how to print Arabic text correctly in PYTHON

You can also follow this article by @Divakar V on text-to-speech generation:

https://www.divakar-verma.com/post/azure-cognitive-services-using-python-rest-api-text-to-speech

Kartik Bhiwapurkar