I am interested in building a machine translation system for language accents, and I am curious about the methods available for collecting data, or how to build my own corpus with unlimited resources. Are there any good references or ideas I could refer to?
-1
Please show us what you have tried. – Grayrigel Sep 23 '20 at 08:04
I haven't really started on this project yet. I have only tried running examples I found online, and most of them used corpora that were already available online. – luzzi Sep 24 '20 at 10:18
2 Answers
1
There are lots of open corpora you may wish to look at, many of which are collated on the Open Parallel Corpus (OPUS), to seed some data for your exercise.
In terms of building and collecting your own, you could consider Amazon Mechanical Turk, or doing generation with something like Snorkel.
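Whichever source you use, the collected sentences usually end up as two line-aligned lists (one sentence per line in each variety) that need pairing and basic cleaning. Here is a minimal sketch of that step in Python, assuming the data is already line-aligned. The function name `build_parallel_corpus`, the `max_length_ratio` threshold, and the sample sentences are all made up for illustration; they do not come from OPUS, Mechanical Turk, or Snorkel.

```python
from typing import Iterable, List, Tuple


def build_parallel_corpus(
    source_lines: Iterable[str],
    target_lines: Iterable[str],
    max_length_ratio: float = 3.0,  # assumed threshold, tune for your data
) -> List[Tuple[str, str]]:
    """Pair line-aligned sentences and drop empty, lopsided, or duplicate pairs."""
    seen = set()
    pairs = []
    for src, tgt in zip(source_lines, target_lines):
        src, tgt = src.strip(), tgt.strip()
        # Skip pairs where either side is empty.
        if not src or not tgt:
            continue
        # Skip pairs whose word counts differ too much (likely misaligned).
        src_len, tgt_len = len(src.split()), len(tgt.split())
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_length_ratio:
            continue
        # Skip exact duplicate pairs.
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        pairs.append((src, tgt))
    return pairs


if __name__ == "__main__":
    # Placeholder sentences, invented for this example only.
    standard = ["I am going home.", "", "What are you doing?", "I am going home."]
    accented = ["I'm goin' home.", "Yes.", "What're ya doin'?", "I'm goin' home."]
    for src, tgt in build_parallel_corpus(standard, accented):
        print(f"{src}\t{tgt}")
```

The same function works on open file handles, since it only iterates over lines, so two aligned text files can be passed in directly.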

Dave Meikle
-2
What kind of implementation do you need? If it is just a shell program, that is easy. Or do you want a GUI (Tkinter) or web (Django) app?