Collecting data for machine translation

Question

I am interested in building a machine translation system for language accents, and I am curious about the methods available for collecting data, or how to build my own corpus with unlimited resources. Are there any good references or ideas I could refer to?

I haven't really started on this project yet. I have only tried compiling things by following examples online, and most of them used a corpus that was already available online. — luzzi, Sep 24 '20 at 10:18

score 1 · Answer 1 · answered Jan 15 '21 at 00:15

There are lots of open corpora you may wish to look at to seed some data for your exercise, many of which are collated on the Open Parallel Corpus (OPUS).
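As a rough sketch of how such seed data might be loaded: parallel corpora are commonly distributed as two plain-text files with one sentence per line, aligned by line number. The file paths, the function names `read_parallel` and `filter_pairs`, and the length-ratio threshold below are assumptions for illustration, not anything prescribed by OPUS.

```python
def read_lines(path):
    """Read a UTF-8 text file into a list of stripped lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle]


def read_parallel(src_path, tgt_path):
    """Return (source, target) pairs from two line-aligned files."""
    src_lines = read_lines(src_path)
    tgt_lines = read_lines(tgt_path)
    if len(src_lines) != len(tgt_lines):
        raise ValueError("files are not line-aligned")
    return list(zip(src_lines, tgt_lines))


def filter_pairs(pairs, max_ratio=3.0):
    """Drop empty, duplicate, and badly length-mismatched pairs.

    max_ratio is an assumed threshold on the word-count ratio between
    the two sides; tune it for your own language pair.
    """
    seen = set()
    kept = []
    for src, tgt in pairs:
        if not src or not tgt:
            continue  # one side is empty
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # lengths differ too much to be a likely translation
        if (src, tgt) in seen:
            continue  # exact duplicate
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept
```

Calling `filter_pairs(read_parallel("corpus.src", "corpus.tgt"))` (hypothetical file names) would give a cleaned list of sentence pairs to start experimenting with.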

In terms of building and collecting your own data, you could consider crowdsourcing it through Amazon Mechanical Turk, or generating labeled data programmatically with something like Snorkel.
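If you do crowdsource, one common quality-control idea is to collect several submissions per source sentence and keep only those that enough workers agree on. The sketch below is an assumption-laden illustration of that idea in plain Python: it does not use the Mechanical Turk or Snorkel APIs, and the names `normalize`, `aggregate`, and `min_votes` are made up for this example.

```python
from collections import Counter, defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants match."""
    return " ".join(text.lower().split())


def aggregate(submissions, min_votes=2):
    """Build a corpus from (source, worker_translation) submissions.

    A source sentence is kept only if its most frequent normalized
    translation was submitted at least min_votes times (an assumed
    agreement threshold).
    """
    by_source = defaultdict(list)
    for source, translation in submissions:
        by_source[source].append(normalize(translation))

    corpus = {}
    for source, translations in by_source.items():
        best, votes = Counter(translations).most_common(1)[0]
        if votes >= min_votes:
            corpus[source] = best
    return corpus
```

Exact-match voting like this is strict; for free-form translations you may prefer a fuzzier similarity measure, but that is a design choice beyond this sketch.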

score -2 · Answer 2 · answered Sep 23 '20 at 08:01

What kind of implementation do you need? If it is just a shell program, that is easy. Or do you want a GUI (Tkinter) or web (Django) app?

Сергей Кащишин