4

I am using nltk_tokenize in a Django app. To do so, I need to download the NLTK data so that I can use it for stemming. I am deploying the Django app to the cloud through Elastic Beanstalk.

Right now I have included

nltk.download('punkt') 

in my views so that the required data gets downloaded. But I am getting the following error:

[Errno 2] No such file or directory: '/home/wsgi/nltk_data'

What is the correct way to do this?

pankaj jha

3 Answers

6

I am not sure what nltk_tokenize really is, but your app runs on Elastic Beanstalk as the wsgi user. This user doesn't have a home directory, so the default download location /home/wsgi/nltk_data does not exist. You need to specify a download path that does exist, such as /opt/python/current/app (your app's directory on Elastic Beanstalk), /tmp/, or whatever makes sense for your setup.
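As a rough sketch of that idea (not a definitive implementation): pick the first directory the wsgi user can actually write to, download 'punkt' there, and register the directory with NLTK so the data is found later. The helper names first_writable_dir and ensure_punkt and the DEFAULT_CANDIDATES list are made up for illustration; nltk.download, nltk.data.find and nltk.data.path are the real NLTK calls.

```python
import os
import tempfile

# Hypothetical candidate locations: the Elastic Beanstalk app directory
# mentioned above, then the system temp directory as a fallback.
DEFAULT_CANDIDATES = [
    "/opt/python/current/app/nltk_data",
    os.path.join(tempfile.gettempdir(), "nltk_data"),
]


def first_writable_dir(candidates):
    """Return the first candidate that exists (or can be created) and is writable."""
    for path in candidates:
        try:
            os.makedirs(path, exist_ok=True)
        except OSError:
            # Cannot create this one (missing parent, no permission, ...).
            continue
        if os.access(path, os.W_OK):
            return path
    raise RuntimeError("no writable directory available for NLTK data")


def ensure_punkt(candidates=DEFAULT_CANDIDATES):
    """Download 'punkt' into a writable directory and make NLTK search it."""
    import nltk  # imported lazily so first_writable_dir works without NLTK

    download_dir = first_writable_dir(candidates)
    if download_dir not in nltk.data.path:
        nltk.data.path.append(download_dir)
    try:
        # Skip the download if the data is already on the search path.
        nltk.data.find("tokenizers/punkt")
    except LookupError:
        nltk.download("punkt", download_dir=download_dir)
    return download_dir
```

Calling ensure_punkt() once at startup, rather than inside a view, avoids repeating the check on every request. Anything stored under the app directory is replaced on the next deployment, so the download would simply run again after each deploy.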

Edit: Corrected directory after comment.

Gustaf
  • 2
    Worth mentioning, Elastic Beanstalk's app directory is /opt/python/current/app, not the other way around. Also, if you download to there, it'll be wiped on your next deployment. – Taz Sep 05 '16 at 12:57
  • 1
    Also, it is important to add the new path to NLTK's search path: nltk.data.path.append(download_dir) – Rafael Larios Nov 15 '19 at 20:59
2

I achieved this by uploading the nltk_data files to my S3 bucket and then copying them from the bucket to the server, using an .ebextensions config file with the following command:

commands:
  01_copy_nltk_data:
    command: aws s3 cp s3://my_s3_bucket/nltk_data /usr/local/share/nltk_data --recursive 

After that, I set the NLTK_DATA environment variable in my Python script and pointed it to the location of nltk_data on the server.

os.environ['NLTK_DATA'] = "/usr/local/share/nltk_data"
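One detail worth sketching here: NLTK reads NLTK_DATA when nltk.data is first imported, so setting the variable afterwards has no effect on the search path. The helper below is only an illustration under that assumption (configure_nltk_data is a made-up name); it sets the variable and, if nltk has already been imported, also appends the directory to the live nltk.data.path.

```python
import os
import sys

NLTK_DATA_DIR = "/usr/local/share/nltk_data"  # path used in the answer above


def configure_nltk_data(data_dir=NLTK_DATA_DIR):
    """Make NLTK look in data_dir, whether or not nltk is already imported."""
    # NLTK_DATA may hold several paths separated by os.pathsep.
    existing = [p for p in os.environ.get("NLTK_DATA", "").split(os.pathsep) if p]
    if data_dir not in existing:
        os.environ["NLTK_DATA"] = os.pathsep.join([data_dir] + existing)

    # If nltk was imported earlier, the variable is no longer consulted,
    # so update the live search path as well.
    nltk_data = sys.modules.get("nltk.data")
    if nltk_data is not None and data_dir not in nltk_data.path:
        nltk_data.path.append(data_dir)
    return os.environ["NLTK_DATA"]
```

Calling configure_nltk_data() at the top of the settings or WSGI module, before anything imports nltk, keeps the ordering question out of the views.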

prafi
  • After doing this, my deployments failed. On the health page I got an error stating that "Application update failed at 2019-05-27T15:56:17Z with exit status 1 and error: command 01_copy_nltk_data in .ebextensions/s3.config failed. fatal error: An error occurred (AccessDenied) when calling the ListObjects operation: Access Denied." – Lecromine May 27 '19 at 15:59
0

You can use this block of code:

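import nltk

APP_DIR = '/opt/python/current/app'
try:
    # Download into the app directory, since the wsgi user has no home directory.
    nltk.download('punkt', download_dir=APP_DIR)
    # Make NLTK search this directory when it loads the data.
    nltk.data.path.append(APP_DIR)
except Exception:
    # Fall back to NLTK's default download location.
    nltk.download('punkt')
from nltk.sentiment.vader import SentimentIntensityAnalyzer as SIA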

What we are doing here is importing nltk and then trying to download 'punkt' into the app directory of the Elastic Beanstalk app. Since this is a WSGI app, it runs as the wsgi user, which has no home directory to download into.

Khushhal