I need to understand which languages NLTK's tokenization supports. I think I need to set the language like this:
import nltk.data

# Placeholder: replace with a supported language name, e.g. "english"
lang = "WHATEVER_LANGUAGE"

# Load the pre-trained Punkt sentence tokenizer for that language
tokenizer = nltk.data.load('nltk:tokenizers/punkt/' + lang + '.pickle')

text = "something in some specified whatever language"
tokenizer.tokenize(text)
I need to understand which languages I can use this for, but I couldn't find any information in the NLTK documentation.