
I am trying to train a text classification model in Vertex AI AutoML (Google Cloud) on documents written in Spanish. I imported the documents as JSON Lines and tried to specify the language of each document as follows:

{"textContent":"Esto está escrito en español","languageCode":"es-ES","classificationAnnotations":[{"displayName":"Class A"},{"displayName":"Class B"}]} 

According to the schema file in the Vertex AI documentation on how to prepare the training data, the line above should work. However, I could not find a way to check whether the language was imported correctly, and when I export the dataset again, the languageCode field contains an empty string.
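For reference, here is a minimal sketch of how import lines like the one above can be generated, and how an exported file can be scanned for missing or empty languageCode values. It uses only the field names shown in the example line. The helper names (make_record, to_jsonl, missing_language) are hypothetical, and the check assumes the exported file uses the same JSON Lines layout as the import file. It does not call any Vertex AI API.

```python
import json


def make_record(text, language_code, labels):
    """Build one import record using the field names from the example line."""
    return {
        "textContent": text,
        "languageCode": language_code,
        "classificationAnnotations": [{"displayName": label} for label in labels],
    }


def to_jsonl(records):
    """Serialize records as JSON Lines, keeping non-ASCII characters readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


def missing_language(jsonl_text):
    """Return 1-based line numbers whose languageCode is absent or empty.

    Assumption: the exported file has one JSON object per line with the
    same top-level languageCode field as the import file.
    """
    missing = []
    for number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        if not json.loads(line).get("languageCode"):
            missing.append(number)
    return missing


if __name__ == "__main__":
    records = [
        make_record("Esto está escrito en español", "es-ES", ["Class A", "Class B"]),
    ]
    import_text = to_jsonl(records)
    print(import_text)
    print(missing_language(import_text))
```

Running missing_language on the contents of an exported file would list the lines where the language was lost, which is the symptom described above.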

What is the correct way to specify the language of a document when importing it into a dataset? Is there any way to check that the language was imported correctly?

Ricco D
jroled
This appears to be an unexpected behavior. I'll check further and provide an update afterwards. – Ricco D Dec 20 '21 at 09:21

Hi OP, the resolution might take longer than usual. There's no ETA for the fix but engineers are now aware of your concern. You can track the issue publicly on https://issuetracker.google.com/213808331 – Donnald Cucharo Jan 10 '22 at 03:40

As per checking https://issuetracker.google.com/213808331, the issue should now be resolved. – Ricco D Aug 16 '22 at 21:10

0 Answers