tl;dr
If the environment variable NLTK_DATA is set and the directory it names exists and is writable, nltk uses it as the default download directory.
Explanation
In the data module, the environment variable NLTK_DATA supplies the first entry of the data search path.
If download_dir is not passed as a parameter to nltk.download(), the method default_download_dir() determines the download directory: it returns the first directory on the data search path that exists and is writable.
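The lookup described above can be approximated with a few lines of standard-library Python. This is a simplified sketch, not NLTK's actual code: the function names build_search_path and pick_download_dir are hypothetical, the writability check uses os.access as a stand-in, and the additional system-wide directories that NLTK appends to its search path are left out.

```python
import os


def build_search_path(environ):
    """Return candidate data directories, NLTK_DATA entries first (sketch)."""
    # NLTK_DATA may hold several directories separated by os.pathsep.
    from_env = [d for d in environ.get("NLTK_DATA", "").split(os.pathsep) if d]
    # The per-user directory comes after the entries taken from the environment.
    return from_env + [os.path.expanduser("~/nltk_data")]


def pick_download_dir(search_path):
    """Return the first existing, writable directory, else ~/nltk_data (sketch)."""
    for candidate in search_path:
        if os.path.isdir(candidate) and os.access(candidate, os.W_OK):
            return candidate
    # Nothing usable on the path: fall back to the per-user directory.
    return os.path.expanduser("~/nltk_data")


# Show which directory this sketch would choose for the current environment.
print(pick_download_dir(build_search_path(os.environ)))
```

Under these assumptions, a missing or read-only NLTK_DATA directory is skipped and the next candidate is used, which is why the examples below create the directory before setting the variable.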
Example: Create new default data directory
Suppose you want to use /usr/local/share/nltk_data as the default data directory.
Create the data directory and add NLTK_DATA to your shell profile.
$ mkdir /usr/local/share/nltk_data
$ echo "export NLTK_DATA=/usr/local/share/nltk_data" >> ~/.bashrc
$ source ~/.bashrc
$ echo $NLTK_DATA
/usr/local/share/nltk_data
Now nltk uses /usr/local/share/nltk_data, as defined in NLTK_DATA.
$ python
Python 3.10.6 (main, Sep 5 2022, 11:08:58) [Clang 11.0.0 (clang-1100.0.33.17)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download('gutenberg')
[nltk_data] Downloading package gutenberg to
[nltk_data] /usr/local/share/nltk_data...
[nltk_data] Unzipping corpora/gutenberg.zip.
True
Example: Switch default data directory
The current data directory is ~/nltk_data, and you want to use /usr/local/share/nltk_data instead.
Move the data directory and add NLTK_DATA to your shell profile.
$ mv ~/nltk_data /usr/local/share/nltk_data
$ echo "export NLTK_DATA=/usr/local/share/nltk_data" >> ~/.bashrc
$ source ~/.bashrc
$ echo $NLTK_DATA
/usr/local/share/nltk_data
Now nltk uses /usr/local/share/nltk_data, as defined in NLTK_DATA.
$ python
Python 3.10.6 (main, Sep 5 2022, 11:08:58) [Clang 11.0.0 (clang-1100.0.33.17)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download('gutenberg')
[nltk_data] Downloading package gutenberg to
[nltk_data] /usr/local/share/nltk_data...
[nltk_data] Package gutenberg is already up-to-date!
True