My ultimate goal is to write a script that returns a few pairs of sentences from a TMX (Translation Memory Exchange) file. The file is from http://opus.nlpl.eu/OpenSubtitles2018.php and is about 2.1 GB.
I have tried reading it using the tmxfile module:
from translate.storage.tmx import tmxfile
with open("da-en.tmx", 'rb') as fin:
    tmx_file = tmxfile(fin, 'da', 'en')
but it never seems to finish loading; it just hangs indefinitely. I also tried a program called Stingray, but it crashes as soon as I import the TMX file.
What is the best strategy to achieve this goal? I don't mind using AWK, grep, or other dedicated text-parsing tools.
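To make the goal concrete, here is a minimal sketch of the kind of thing I am after: reading the file as a stream with the standard library's xml.etree.ElementTree.iterparse instead of loading it whole. It rests on assumptions I have not verified against the real file: that each translation unit is a <tu> element with no XML namespace, and that each <tuv> carries its language in an xml:lang attribute. The names iter_pairs and first_pairs are my own, hypothetical helpers.

```python
import itertools
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def iter_pairs(source, src_lang="da", tgt_lang="en"):
    """Yield (source, target) sentence pairs, one <tu> at a time.

    Assumes <tu>/<tuv>/<seg> elements without a namespace (unverified).
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tu":
            continue
        segs = {}
        for tuv in elem.iter("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also picks up text nested in inline markup.
                segs[tuv.get(XML_LANG)] = "".join(seg.itertext())
        if src_lang in segs and tgt_lang in segs:
            yield segs[src_lang], segs[tgt_lang]
        # Release the contents of the finished <tu> to limit memory use.
        elem.clear()


def first_pairs(source, n=5, **kwargs):
    """Return the first n pairs without reading the rest of the file."""
    return list(itertools.islice(iter_pairs(source, **kwargs), n))
```

Calling first_pairs("da-en.tmx", n=5) should stop parsing once five pairs have been found, so the whole 2.1 GB file would not need to be read. I am not sure whether this is the best approach, or whether a text-processing tool would be simpler.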