
I have a bunch of articles that are translated, which I want to use as training data for IBM Watson language translation. What is the correct way to use these articles for training? Do I use the whole article and its translation as an entry in the parallel corpus, or do I have to split the article into sentences and have its translation pair as an entry?

ralphearle
user2968505

1 Answer


You have two choices.

Either split the text into aligned phrase pairs, each with a "from" (source) segment and a "to" (target) segment, and use those pairs to create either a forced_glossary or a parallel_corpus.

Or send all of the translated (target-language) text as a single file to create a monolingual_corpus.
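For the first option, here is a minimal sketch of turning already-aligned pairs into a TMX file, which is the format I would expect a forced_glossary or parallel_corpus upload to use (check the customization documentation linked below before relying on that). The function name build_tmx, the language codes, and the sample sentences are made up for illustration, and splitting and aligning your articles into pairs is assumed to have been done beforehand.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this qualified name out as the standard xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="en", tgt_lang="es"):
    """Return a TMX document (as a string) with one <tu> per (source, target) pair.

    `pairs` is an iterable of 2-tuples of already-aligned segments.
    The header values below are illustrative placeholders, not service requirements.
    """
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",
        "creationtoolversion": "1.0",
        "segtype": "sentence",
        "o-tmf": "plain",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        # One translation unit holds the "from" and "to" variants of a segment.
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    sample_pairs = [("Hello.", "Hola."), ("Thank you.", "Gracias.")]
    print(build_tmx(sample_pairs))
```

Each aligned pair becomes its own entry, so a whole article and its translation would not go in as one unit. You would write the returned string to a UTF-8 file and upload that file.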

Detailed documentation is available at https://www.ibm.com/watson/developercloud/doc/language-translator/customizing.html#training and the API documentation is available at https://www.ibm.com/watson/developercloud/language-translator/api/v2/?curl#create-model

chughts