Python Flask application on IBM Cloud/Bluemix with TextBlob library throwing exception - textblob.exceptions.MissingCorpusError

Question

I am trying to run a Python Flask application with a text analytics feature (using TextBlob) on IBM Cloud/Bluemix. I get the following error after deploying the application with the cf push command (see below). According to the documentation on the TextBlob site, this exception is thrown when a user tries to use a feature that requires a dataset or model that the user does not have on their system.

error:
Error while running the app:
textblob.exceptions.MissingCorpusError
MissingCorpusError: 
Looks like you are missing some required data for this feature.

To download the necessary data, simply run

python -m textblob.download_corpora
or use the NLTK downloader to download the missing data: 
http://nltk.org/data.html
If this doesn't fix the problem, file an issue at 
https://github.com/sloria/TextBlob/issues.

My question: I have added Flask, TextBlob and NLTK to my requirement.txt as shown below. Please suggest how I can run the python -m textblob.download_corpora command to make the missing dataset/model available in the Bluemix environment. If that command cannot be run, is there any other way to make this work? Note: this app works perfectly on my local system.

requirement.txt content:
Flask==0.12.2
cloudant==2.4.0
textblob==0.15.1
nltk==3.3

This is the error/warning I get while the application is being deployed with the push command:

        -----> Downloading NLTK corpora...
!     nltk.txt not found, not downloading any corpora

Edit requested by Henrik: When I run the command python -m textblob.download_corpora, the corpora below are downloaded on my system. I list the same ones in the nltk.txt file.

 [nltk_data] Downloading package brown to
 [nltk_data]     C:\Users\MohanaKrishnaV\AppData\Roaming\nltk_data...
 [nltk_data]   Package brown is already up-to-date!
 [nltk_data] Downloading package punkt to
 [nltk_data]     C:\Users\MohanaKrishnaV\AppData\Roaming\nltk_data...
 [nltk_data]   Package punkt is already up-to-date!
 [nltk_data] Downloading package wordnet to
 [nltk_data]     C:\Users\MohanaKrishnaV\AppData\Roaming\nltk_data...
 [nltk_data]   Package wordnet is already up-to-date!
 [nltk_data] Downloading package averaged_perceptron_tagger to
 [nltk_data]     C:\Users\MohanaKrishnaV\AppData\Roaming\nltk_data...
 [nltk_data]   Package averaged_perceptron_tagger is already up-to-
 [nltk_data]       date!
 [nltk_data] Downloading package conll2000 to
 [nltk_data]     C:\Users\MohanaKrishnaV\AppData\Roaming\nltk_data...
 [nltk_data]   Package conll2000 is already up-to-date!
 [nltk_data] Downloading package movie_reviews to
 [nltk_data]     C:\Users\MohanaKrishnaV\AppData\Roaming\nltk_data...
 [nltk_data]   Package movie_reviews is already up-to-date!
 Finished.

And this is what my nltk.txt looks like:

brown wordnet
averaged_perceptron_tagger
brown
sentence_polarity
sentiwordnet
subjectivity
words
punkt
maxent_treebank_pos_tagger
movie_reviews
conll2000

I have added some additional corpora to my nltk.txt, listed below; I hope that's not a problem:

 sentence_polarity
 sentiwordnet
 subjectivity
 words

This is what the error log looks like:

   -------> Buildpack version 1.5.22
   -----> Installing pip-pop (0.1.1)
   Downloaded [https://buildpacks.cloudfoundry.org/dependencies/manual-binaries/pip-pop/pip-pop-0.1.1-d410583a.tar.gz]
   -----> Installing pipenv (4.0.1)
   Downloaded [https://buildpacks.cloudfoundry.org/dependencies/manual-binaries/pipenv/pipenv-4.0.1-148f753f.tar.gz]
    $ pip install -r requirements.txt
   You are using pip version 9.0.1, however version 10.0.1 is available.
   You should consider upgrading via the 'pip install --upgrade pip' command.
   You are using pip version 9.0.1, however version 10.0.1 is available.
   You should consider upgrading via the 'pip install --upgrade pip' command.
   -----> Downloading NLTK corpora...
   -----> Downloading NLTK packages: brown wordnet
   averaged_perceptron_tagger
   brown
   sentence_polarity
   sentiwordnet
   subjectivity
   words
   punkt
   maxent_treebank_pos_tagger
   movie_reviews
      [nltk_data] Downloading package brown to
      [nltk_data]     /tmp/contents525031002/deps/0/python/nltk_data...
      [nltk_data]   Package brown is already up-to-date!
      [nltk_data] Error loading wordnet : Package 'wordnet\r' not found in
      [nltk_data]     index
      Error installing package. Retry? [n/y/e]
    Traceback (most recent call last):
      File "/tmp/contents525031002/deps/0/python/lib/python2.7/runpy.py", line 174, in _run_module_as_main
        "__main__", fname, loader, pkg_name)
      File "/tmp/contents525031002/deps/0/python/lib/python2.7/runpy.py", line 72, in _run_code
        exec code in run_globals
      File "/tmp/contents525031002/deps/0/python/lib/python2.7/site-packages/nltk/downloader.py", line 2272, in <module>
        halt_on_error=options.halt_on_error)
      File "/tmp/contents525031002/deps/0/python/lib/python2.7/site-packages/nltk/downloader.py", line 681, in download
        choice = input().strip()
    EOFError: EOF when reading a line
    Exit status 0
    Staging complete
    Uploading droplet, build artifacts cache...
    Uploading build artifacts cache...
    Uploading droplet...
    Uploaded build artifacts cache (64.3M)
    Uploaded droplet (105.6M)
    Uploading complete
    Stopping instance 6cbf3cbc-aef1-4a73-a7ab-d562a606fe5b
    Destroying container
    Successfully destroyed container

This is how I push my app: cf login >> [I supply my login details] >> cf push

data_henrik · Accepted Answer · 2018-07-06T15:46:52.693


It seems that you do not have an nltk.txt file in the root directory of your deployed app. The Cloud Foundry Python buildpacks have built-in support for NLTK. The text file lists the corpora that need to be installed during deployment.

Sample content of a nltk.txt:

wordnet averaged_perceptron_tagger brown sentence_polarity

Make sure that it is a single line with no duplicates and no stray control characters (such as the carriage return, \r, that shows up in your error log).
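One way to get a clean file from a multi-line nltk.txt saved on Windows is to normalize it with a small script before running cf push. This is only a sketch: the helper names normalize_nltk_txt and rewrite_nltk_txt are made up for illustration and are not part of NLTK, TextBlob or the buildpack, and it assumes the file sits in the current directory and that a single trailing newline is acceptable.

```python
from pathlib import Path


def normalize_nltk_txt(text):
    """Return the package names on a single line, without duplicates."""
    names = []
    # str.split() with no arguments splits on any whitespace,
    # so "\r", "\n" and repeated spaces are all discarded here.
    for name in text.split():
        if name not in names:  # keep only the first occurrence
            names.append(name)
    return " ".join(names)


def rewrite_nltk_txt(path="nltk.txt"):
    """Rewrite the file in place as one line and return the cleaned content."""
    p = Path(path)
    cleaned = normalize_nltk_txt(p.read_text(encoding="utf-8"))
    # newline="\n" stops Python on Windows from writing "\r\n" again.
    with p.open("w", encoding="utf-8", newline="\n") as fh:
        fh.write(cleaned + "\n")
    return cleaned


if __name__ == "__main__":
    # Run from the app's root directory only if nltk.txt is present.
    if Path("nltk.txt").exists():
        print(rewrite_nltk_txt("nltk.txt"))
```

Run it from the app's root directory and check the printed line before pushing. Saving the file with Unix (LF) line endings in your editor should achieve the same thing.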

edited Jul 06 '18 at 15:46

answered Jul 05 '18 at 06:52


Hi Henrik, I followed your recommendation and added nltk.txt to the root folder. I also tried adding one of the datasets mentioned in the CF documentation, "Brown wordnet". This package got installed when I pushed the app. But the problem persisted, so I started adding more datasets, especially the ones mentioned in download_corpora (brown, punkt, wordnet, conll2000, maxent_treebank_pos_tagger, movie_reviews), as I did not know which dataset would solve my problem. But the same error pops up. Please help! – Mohan Krishna V Jul 06 '18 at 04:31
Which corpora is your app using? Can you see from the logs that some corpora are downloaded? Does the error message change? Please add the command showing how you push the app. – data_henrik Jul 06 '18 at 05:29
Hi Henrik, I have edited my post with all the information you asked for. Please let me know if you need further details. – Mohan Krishna V Jul 06 '18 at 15:08
I added some more details. Your error indicates a control character (\r) in your file. Try to put everything on one line, with no duplicates. – data_henrik Jul 06 '18 at 15:47
Henrik, the above error vanished after putting everything on the same line. Thank you so much for your help. I got this error once: "502 Bad Gateway: Registered endpoint failed to handle the request". I suspected it could be related to memory, so I doubled the memory and it works now. Thank you so much again for your help :) – Mohan Krishna V Jul 06 '18 at 18:18
