
I'm trying to POS-tag Italian sentences with Apertium's tagger. According to the Apertium GitHub page, the output should include the surface form alongside the morphological analysis, but I only get the analysis. I need the surface form too, and I cannot infer it: the tagger does not necessarily tag one token at a time, so I cannot simply tokenize the original sentence and loop over it or zip it with the tagger's output.

According to the GitHub page:

In [1]: import apertium
In [2]: tagger = apertium.Tagger('ita')
In [3]: tagger.tag('gatti')
Out[3]: [gatti/gatto<n><m><pl>]

What I got:

In [1]: import apertium
In [2]: tagger = apertium.Tagger('ita')
In [3]: tagger.tag('gatti') # 'gatti' is the surface form
Out[3]: [gatto<n><m><pl>]

How can I get the surface form? If I provided one token at a time this would not be a problem since I would know what the token is. But in a sentence I cannot know how the tagger creates chunks.

Elanor

1 Answer


By default, creating a tagger for language ita looks for /usr/share/apertium/modes/ita-tagger.mode. This is a shell script that calls various Apertium commands. The command in the Italian tagger script happens to be configured not to include surface forms (it's missing the -p option).

A quick and dirty solution is to just sudo vim /usr/share/apertium/modes/ita-tagger.mode (or sudo nano or whatever your editor is) and add -p to the end of the last command, so the file looks like

lt-proc -w '/usr/share/apertium/apertium-ita/ita.automorf.bin' | cg-proc '/usr/share/apertium/apertium-ita/ita.rlx.bin' | apertium-tagger -g $2 '/usr/share/apertium/apertium-ita/ita.prob' -p

and do tagger = apertium.Tagger('ita') again.
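
With -p in place, each unit should print as surface/lemma<tags>, as in the README output quoted in the question. Here is a minimal sketch for pulling the pieces apart, assuming you work with the string form of each returned unit (e.g. str(unit)) and that it matches what the REPL prints. The helper name split_unit is made up for illustration, and the sketch ignores escaped slashes and only reads the first lemma of a joined analysis.

```python
import re

def split_unit(unit_text):
    """Split 'surface/lemma<tag>...' into (surface, lemma, tags)."""
    # Drop the optional ^...$ delimiters of the Apertium stream format.
    text = unit_text.strip().lstrip("^").rstrip("$")
    surface, _, analysis = text.partition("/")
    if not analysis:
        # No slash: only the analysis was printed (tagger run without -p).
        surface, analysis = None, text
    lemma = analysis.split("<", 1)[0]
    tags = re.findall(r"<([^<>]+)>", analysis)
    return surface, lemma, tags

print(split_unit("gatti/gatto<n><m><pl>"))
```

For a whole sentence you could apply the helper to every element of the list returned by tagger.tag(...), which gives you the surface chunks exactly as the tagger grouped them.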


A sudo-less solution would be to copy the mode file, edit the copy, and add its location to the search path; see https://github.com/apertium/apertium-python#installing-more-modes-from-other-language-data
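
The copy-and-edit step can be sketched in Python as well. This is only a rough illustration: the function names add_surface_flag and copy_mode_with_surface are hypothetical, and writing the copy into a modes subdirectory is an assumption about the layout the search expects, so check the linked README before relying on it.

```python
from pathlib import Path

def add_surface_flag(mode_text):
    """Append -p to the last command of a mode script unless it is already there."""
    lines = mode_text.rstrip("\n").split("\n")
    last_line = lines[-1].rstrip()
    # Only inspect the final pipeline stage (the apertium-tagger call).
    last_command = last_line.rsplit("|", 1)[-1]
    if "-p" not in last_command.split():
        lines[-1] = last_line + " -p"
    return "\n".join(lines) + "\n"

def copy_mode_with_surface(src_mode, dest_dir):
    """Write an edited copy of src_mode to dest_dir/modes/ (assumed layout)."""
    src = Path(src_mode)
    dest = Path(dest_dir) / "modes" / src.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(add_surface_flag(src.read_text()))
    return dest

# Example (paths are illustrative):
# copy_mode_with_surface("/usr/share/apertium/modes/ita-tagger.mode",
#                        Path.home() / "apertium-local")
```

After that, the README linked above describes registering the directory with apertium.append_pair_path(...) before creating the tagger again. Which copy wins when both the system mode and the local one are found is not covered here and is worth checking.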

unhammer
  • I know I should not use comments to thank people but I'm exploding with happiness and this was my first question ever on stackoverflow. I was intimidated and actually not very hopeful to get an answer. I was so wrong. Thanks a bunch :) – Elanor Nov 19 '20 at 11:19
  • No problem =D Note that there's also the #apertium IRC channel on freenode if you have stuff that doesn't feel like it fits on stackoverflow :-) – unhammer Nov 19 '20 at 13:15