How to specify document language while importing a dataset in Google Cloud AutoML?

Asked Dec 17 '21 at 18:58

Active Apr 07 '23 at 21:53

Viewed 114 times

I am trying to train a text classification model in Vertex AI AutoML (Google Cloud) using documents in Spanish. I imported the documents as JSON Lines and tried to specify the language of each document as follows:

{"textContent":"Esto está escrito en español","languageCode":"es-ES","classificationAnnotations":[{"displayName":"Class A"},{"displayName":"Class B"}]}

According to the schema file in the Vertex AI documentation on preparing training data, the JSON line shown earlier should work. However, I could not find a way to check whether the language was imported correctly, and when I export the dataset again, the languageCode field contains an empty string.
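The check on the exported data can be sketched roughly as follows. This assumes the export is also JSON Lines with a per-record languageCode field, which matches what I observed; the function name `find_missing_language` and the sample records are hypothetical.

```python
import json

def find_missing_language(lines):
    """Return 1-based line numbers of records whose languageCode is absent or empty."""
    missing = []
    for number, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        record = json.loads(raw)
        if not record.get("languageCode"):
            missing.append(number)
    return missing

# Hypothetical sample standing in for the lines of an exported file.
exported = [
    '{"textContent": "uno", "languageCode": ""}',
    '{"textContent": "dos", "languageCode": "es-ES"}',
]
print(find_missing_language(exported))
```

For a real export, the lines of the downloaded file can be passed in place of the sample list, since an open file object iterates line by line.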

What is the correct way to specify the language of a document when importing it into a dataset? Is there any way to check that the language was imported correctly?

edited Dec 20 '21 at 03:35

Ricco D


asked Dec 17 '21 at 18:58

jroled


This appears to be unexpected behavior. I'll check further and provide an update afterwards. – Ricco D Dec 20 '21 at 09:21

Hi OP, the resolution might take longer than usual. There's no ETA for the fix but engineers are now aware of your concern. You can track the issue publicly on https://issuetracker.google.com/213808331 – Donnald Cucharo Jan 10 '22 at 03:40

According to https://issuetracker.google.com/213808331, the issue should now be resolved. – Ricco D Aug 16 '22 at 21:10


0 Answers