I just started my first NLTK project and am confused about the proper setup. I need several resources, like the Punkt tokenizer and the maxent POS tagger. I downloaded them myself using the GUI via nltk.download(). For my collaborators, of course, I want these things to be downloaded automatically. I haven't found any idiomatic code for that in the documentation.
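For reference, the same packages can apparently be fetched from a script instead of the GUI by passing package identifiers to nltk.download. Below is a minimal sketch. The identifiers "punkt" and "maxent_treebank_pos_tagger" are assumptions and should be checked against the NLTK data index for the installed NLTK version. The helper name download_all is hypothetical.

```python
import nltk

# Assumed package ids; verify them against the NLTK data index for your NLTK version.
REQUIRED_PACKAGES = ["punkt", "maxent_treebank_pos_tagger"]


def download_all(packages=REQUIRED_PACKAGES):
    """Download each package without opening the GUI; return the ids that failed."""
    failed = []
    for package_id in packages:
        # quiet=True suppresses the per-package progress messages;
        # nltk.download returns a truthy value when the download succeeds.
        if not nltk.download(package_id, quiet=True):
            failed.append(package_id)
    return failed


if __name__ == "__main__":
    print("failed:", download_all())
```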
Am I supposed to just put nltk.data.load('tokenizers/punkt/english.pickle') and the like into the code? Is this going to download the resources every time the script is run? Should I give the user (i.e. my co-developers) feedback about what is being downloaded and why it is taking so long? There MUST be tooling out there that does the job, right? :)
// Edit: To clarify my question:
How do I test whether an nltk resource (like the Punkt Tokenizer) is already installed on the machine running my code, and install it if it is not?
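One possible approach is a check-then-install helper. nltk.data.find raises LookupError when a resource cannot be found in the local nltk_data search paths, so that exception can serve as the "not installed" test. Only the missing packages are then downloaded, with a short message to the user. This is a sketch under assumptions. The mapping from resource paths to package ids is hypothetical. Newer NLTK releases may expect different ids (for example punkt_tab or averaged_perceptron_tagger_eng). The names RESOURCES, is_installed and ensure_resources are illustrative only.

```python
import nltk

# Hypothetical mapping: resource path looked up by nltk.data.find -> package id
# passed to nltk.download. Adjust both sides for your NLTK version.
RESOURCES = {
    "tokenizers/punkt": "punkt",
    "taggers/maxent_treebank_pos_tagger": "maxent_treebank_pos_tagger",
}


def is_installed(resource_path):
    """Return True if nltk.data.find can locate the resource locally."""
    try:
        nltk.data.find(resource_path)
        return True
    except LookupError:
        # Raised when the resource is not in any nltk_data search path.
        return False


def ensure_resources(resources=RESOURCES):
    """Download only the missing resources and tell the user what is happening."""
    for resource_path, package_id in resources.items():
        if is_installed(resource_path):
            continue  # Already present: no network access for this resource.
        print(f"NLTK resource {resource_path!r} not found; "
              f"downloading package {package_id!r} (one-time setup)...")
        if not nltk.download(package_id, quiet=True):
            raise RuntimeError(f"Could not download NLTK package {package_id!r}")


if __name__ == "__main__":
    ensure_resources()
```

Calling ensure_resources() once at startup means later runs only perform local lookups. A failed download surfaces as an explicit error rather than a LookupError deep inside the tokenizer or tagger.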