I'm working on text tokenization and lemmatization using UDPipe models. I can run the task itself with an !echo
command, or redirect the output to a file, but I would like to get the output into a Python data structure for further processing.
What works
Here is my working command:
!echo "the text I'm processing" | ./udpipe --tokenize --tag './path/to/my/model'
Out:
Loading UDPipe model: done.
newdoc
newpar
sent_id = 1
text = прывітанне, сусвет
1 прывітанне прывітанне NOUN NN Animacy=Inan|Case=Nom|Gender=Neut|Number=Sing _ _ _ SpaceAfter=No
2 , , PUNCT PUNCT _ _ _ _ _
3 сусвет сусвет NOUN NN Animacy=Inan|Case=Nom|Gender=Masc|Number=Sing _ _ _ SpacesAfter=\n
Redirecting the output to a file also works:
!echo "the text I'm processing" | ./udpipe --tokenize --tag './path/to/my/model' >> filename.txt
./udpipe is the cloned repository of the package.
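The output shown above looks like CoNLL-U, so once it is available as a string it could be turned into a list of token dictionaries. The following is only a minimal sketch: it assumes the real output is tab-separated with ten columns per token line, that comment lines start with #, and that sentences are separated by blank lines. The names FIELDS and parse_conllu are made up for illustration and are not part of UDPipe.

```python
# The ten CoNLL-U column names, in order (assumed layout of each token line).
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(output):
    """Return a list of sentences; each sentence is a list of token dicts."""
    sentences, current = [], []
    for line in output.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Comment/metadata line such as "# sent_id = 1".
            continue
        columns = line.split("\t")
        if len(columns) != len(FIELDS):
            # Not a token line (for example a status message); skip it.
            continue
        current.append(dict(zip(FIELDS, columns)))
    if current:
        sentences.append(current)
    return sentences


# Hand-written sample in the assumed format, based on the output above.
sample = (
    "# sent_id = 1\n"
    "1\tпрывітанне\tпрывітанне\tNOUN\tNN\t"
    "Animacy=Inan|Case=Nom|Gender=Neut|Number=Sing\t_\t_\t_\tSpaceAfter=No\n"
    "2\t,\t,\tPUNCT\tPUNCT\t_\t_\t_\t_\t_\n"
    "\n"
)
sentences = parse_conllu(sample)
lemmas = [token["lemma"] for token in sentences[0]]
```

Each token then behaves like an ordinary dict, so lemmas or tags can be collected with a list comprehension as in the last line.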
What I tried (without success)
os.system()
import os
text = "the text I'm processing"
cmd = 'echo "{}" | ./udpipe --tokenize --tag "./path/to/my/model"'.format(text)
os.system(cmd)
Out: 0
subprocess.getoutput()
import subprocess
cmd = "echo \"the text I'm processing\" | ./udpipe --tokenize --tag './path/to/my/model'"
output = subprocess.getoutput(cmd, stdout=subprocess.PIPE, shell=True)
print(output)
TypeError: getoutput() got an unexpected keyword argument 'stdout'
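For context on the two results above: os.system() returns the command's exit status rather than its output, and subprocess.getoutput() does not accept stdout or shell arguments. One possible direction, sketched here under assumptions, is subprocess.run() with the text passed on stdin, which also avoids shell quoting. The helper names run_filter and run_udpipe are hypothetical, the flags and paths are copied from the command above, and UTF-8 input and output is assumed.

```python
import subprocess


def run_filter(command, text):
    """Send `text` to the stdin of `command` and return its stdout as a str."""
    result = subprocess.run(
        command,              # argument list, so no shell quoting is involved
        input=text,           # plays the role of `echo ... |`
        capture_output=True,  # collect stdout and stderr instead of printing
        encoding="utf-8",     # assumption: the tool reads and writes UTF-8
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


def run_udpipe(text, model_path, udpipe_bin="./udpipe"):
    """Hypothetical wrapper around the command line used in this post."""
    return run_filter([udpipe_bin, "--tokenize", "--tag", model_path], text)


# Intended usage (requires the binary and a model, so it is not run here):
# output = run_udpipe("the text I'm processing", "./path/to/my/model")
```

The returned string can then be split into lines or parsed further in Python instead of being written to a file first.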