
I used this command:

gcloud ml language analyze-syntax --language=pt-br --content="Capítulo"

and got this error:

ERROR: (gcloud.ml.language.analyze-syntax) Failed to read command line argument [--content=Cap\xedtulo] because it does not appear to be valid 7-bit ASCII.

gcloud ml language analyze-syntax --content=Cap\xedtulo
                                               ^ invalid character

But, if I use the demo tool (https://cloud.google.com/natural-language/?hl=pt-br), I get the correct response.

Why? How can I use Google Natural Language with languages that have accented characters, such as Portuguese or Spanish?

Tudormi

1 Answer


Disclaimer: I am with Google Cloud Platform Support.

This looks like an internal issue, so I will raise it with the appropriate investigative team. I will post a comment on this answer with a link to the corresponding Google Public Issue Tracker page.

Why?

You can get more information about this error if you run the following command:

gcloud ml language analyze-syntax --content-file=analyze_test.txt --verbosity=debug 

where analyze_test.txt contains

Capítulo

The thrown error:

File "/google/google-cloud-sdk/lib/third_party/apitools/base/protorpclite/messages.py", line 1541, in validate_element
    raise validation_error
ValidationError: Field content encountered non-ASCII string 'Cap\xc3\xadtulo\n': 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

This suggests that somewhere between gcloud ml and this library, the encoding of the content is never specified. Without an encoding, the library falls back to the ASCII codec and rejects any byte outside the 7-bit ASCII range.
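The decoding failure itself can be reproduced outside gcloud. The following is a minimal Python 3 sketch, not the SDK's actual code path: it takes the exact bytes shown in the traceback and tries to decode them first as ASCII, then as UTF-8. The helper name try_decode is made up for this illustration.

```python
# The bytes reported in the traceback: "Capítulo\n" encoded as UTF-8.
raw = b"Cap\xc3\xadtulo\n"


def try_decode(data, encoding):
    """Hypothetical helper: return (text, None) on success, (None, message) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, str(exc)


# ASCII cannot represent the two bytes (0xc3 0xad) that encode "í".
ascii_text, ascii_error = try_decode(raw, "ascii")

# With the correct encoding named explicitly, the same bytes decode cleanly.
utf8_text, utf8_error = try_decode(raw, "utf-8")

print(ascii_error)
print(repr(utf8_text))
```

The ASCII attempt fails on byte 0xc3 at position 3, the same byte and position as in the gcloud traceback, while the UTF-8 attempt succeeds.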


As the documentation suggests, this command is meant for experiments or for extremely short text. In production, or in an application, one should use the API directly instead, e.g. through a client library.
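As one way of calling the API directly, here is a hedged sketch that builds a request for the REST method documents:analyzeSyntax using only the Python standard library. It assumes authentication with an API key passed as the key query parameter; YOUR_API_KEY is a placeholder and build_request is a name invented for this example. The request is only constructed here, not sent.

```python
import json
import urllib.request

# REST endpoint for syntax analysis in v1 of the Natural Language API.
ENDPOINT = "https://language.googleapis.com/v1/documents:analyzeSyntax"


def build_request(text, language, api_key):
    """Hypothetical helper: build (but do not send) an analyzeSyntax request."""
    body = {
        "document": {"type": "PLAIN_TEXT", "language": language, "content": text},
        # Tells the API how to compute character offsets in its response.
        "encodingType": "UTF8",
    }
    # Serialize to JSON and encode explicitly as UTF-8, so accented
    # characters such as "í" are sent as well-defined bytes.
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=data,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


# Assumption: an API key is used for authentication; replace the placeholder.
request = build_request("Capítulo", "pt-BR", "YOUR_API_KEY")

# To actually send it (requires a valid key and network access):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Because the body is encoded explicitly as UTF-8 before it leaves the program, there is no implicit ASCII step for the accented character to fail on.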

Tudormi