
I have written some code that needs NLTK's punkt tokenizer data. I have included nltk in both requirements.txt and setup.py. However, when I build my project with GitHub Actions, it fails with this error:

E       LookupError:   
E       **********************************************************************  
E         Resource punkt not found.  
E         Please use the NLTK Downloader to obtain the resource:  
E       
E         >>> import nltk  
E         >>> nltk.download('punkt') 

What is the standard way to tell GitHub Actions that it needs 'punkt' without hard-coding nltk.download('punkt') somewhere in the code? Should I add a line to the ci.yml file, and if so, what is the best way to do it?

andrea
  • I have found a way of fixing this by adding `echo -e "import nltk\nnltk.download('punkt')" | python3` to the `ci.yml` before running the tests with pytest. However, any more elegant solution is very welcome. – andrea Jun 06 '20 at 21:02
  • Try creating an nltk.txt file and listing `punkt` in it? – Darkknight Jun 07 '20 at 11:49
  • Also just used the `ci.yml` change that andrea proposed. Seems to be working :D – angerhang Jul 07 '20 at 23:29

2 Answers

In the ci.yml file, running the nltk.downloader command-line module right after installing the dependencies defined in requirements.txt worked for me:

if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
python -m nltk.downloader punkt stopwords
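
If the workflow caches the NLTK data directory, it may be worth downloading only what is missing. The following is a minimal sketch of that idea, not part of the original answer: the helper name `ensure_resources` and the `RESOURCES` mapping are hypothetical, and the lookup paths assume the classic `punkt` and `stopwords` packages (newer NLTK releases may ask for a differently named resource such as `punkt_tab`, in which case the mapping would need adjusting).

```python
# Hypothetical helper: fetch NLTK resources only when they are not already present.

# Maps a downloader package id to the path that nltk.data.find() looks up.
# These two entries are assumptions based on the packages named in the answer.
RESOURCES = {
    "punkt": "tokenizers/punkt",
    "stopwords": "corpora/stopwords",
}


def ensure_resources(resources, find=None, download=None):
    """Download every missing resource and return the ids that were fetched.

    `find` and `download` default to nltk.data.find and nltk.download;
    they are parameters only so the logic can be exercised without NLTK.
    """
    if find is None or download is None:
        import nltk  # imported lazily so this module loads without NLTK installed

        find = find or nltk.data.find
        download = download or nltk.download

    fetched = []
    for package, path in resources.items():
        try:
            find(path)  # raises LookupError when the resource is absent
        except LookupError:
            download(package, quiet=True)
            fetched.append(package)
    return fetched
```

A CI step could then call `ensure_resources(RESOURCES)` from a small script before pytest runs, which keeps the download out of the library code itself.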
Thomas Wright

For me, this command did not work:

import nltk
nltk.download()

However, I found a workaround that worked for me.

You have to download the punkt file manually. At the time of writing, the download site was not working, so you can get an archived copy from here:

https://web.archive.org/web/20230206063107/https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/tokenizers/punkt.zip

After downloading the file, go to

C:\Users\name\AppData\Roaming\

If the nltk_data folder does not exist there, create it. Inside nltk_data, create another folder named tokenizers, and extract the punkt.zip file into the tokenizers folder.
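
The manual steps above could also be scripted. This is a rough sketch using only the standard library; the function name `install_punkt` is hypothetical, and it assumes the archive contains a top-level `punkt` folder, so that extracting into `tokenizers` yields `tokenizers/punkt`.

```python
import zipfile
from pathlib import Path


def install_punkt(zip_path, nltk_data_dir):
    """Extract punkt.zip into <nltk_data_dir>/tokenizers.

    Returns the expected location of the extracted punkt folder
    (assumes the zip has a top-level "punkt" directory).
    """
    tokenizers_dir = Path(nltk_data_dir) / "tokenizers"
    # Create nltk_data and nltk_data/tokenizers if they do not exist yet.
    tokenizers_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(tokenizers_dir)
    return tokenizers_dir / "punkt"
```

On Windows, a call might look like `install_punkt("punkt.zip", Path(os.environ["APPDATA"]) / "nltk_data")`, which corresponds to the Roaming folder mentioned above.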

I hope this helps

Brian Tompsett - 汤莱恩