
I've started playing with Snowball. Below is the basic code I'm using; I'm stuck on the non-ASCII letter č. Judging by the generated files, it gets no special handling.

The .sbl in a nutshell is:

externals ( stem )

stringdef cv '{U+010D}' // č

define stem as (
  backwards (
    [substring] among (
      'le{cv}en' (<-'leko')
    )
  )
)

The .py is created with this command:

snowball -py sl.sbl -o out/sl

The test file:

#!/usr/bin/env python

from out.sl import Sl

stemmer=Sl()
print(stemmer.stemWords(['mlečen', 'mle{cv}en']))

When I run it, {cv} is not substituted: the word with a real Unicode č comes back unchanged, while the word containing the literal text {cv} is the one that gets stemmed:

(venv) pooh@dell ~/metagrocery/stemmer $ ./test.py
['mlečen', 'mleko']
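One thing that can be ruled out on the Python side is the encoding of the test word itself: č can be stored either as the single code point U+010D or as a plain c followed by the combining caron U+030C, and only the first form could match a rule written for U+010D. A minimal sketch of that check, using only the standard unicodedata module (the helper name describe is made up for illustration and is not part of Snowball or the generated stemmer):

```python
import unicodedata


def describe(word):
    """Return (code point, Unicode name) pairs for each character in word."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in word]


# č written as the single precomposed code point U+010D
precomposed = "mle\u010den"

# The same word in decomposed form (NFD): c followed by U+030C
decomposed = unicodedata.normalize("NFD", precomposed)

print(describe(precomposed)[3])            # the fourth character of the word
print(len(precomposed), len(decomposed))   # the decomposed form is one longer

# Normalizing to NFC before stemming turns the decomposed form back
# into the precomposed one.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

If the word in test.py turns out to be in the precomposed form, the input is not the problem and the issue is in how the .sbl is compiled.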

Maybe I'm missing something obvious?
