
I am trying to download all the data packages for NLTK, but it always fails while downloading framenet_v15. It simply hangs there.

I tried multiple times from the same machine. Each time I left it for almost 30 minutes, and once for more than an hour. I also tried replacing the source server with Google SVN, but the downloader gave an error.

Unfortunately, I don't have any more information. Is there a way to figure out what the problem is? Or is there an alternate source from which I can download the NLTK data?

Thanks.

Edit:

Finally downloaded it with wget -c; it took a lot of retries before the download completed.
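For anyone who wants the same resume-and-retry behaviour as wget -c from Python, here is a minimal sketch using only the standard library. It assumes the server supports HTTP Range requests (answering 206 Partial Content); if the server replies 200 instead, the sketch starts the file over. The function names and retry parameters are illustrative choices, not part of NLTK.

```python
import http.client
import os
import time
import urllib.error
import urllib.request


def resume_offset(path):
    """Bytes already on disk from earlier attempts (0 if nothing yet)."""
    return os.path.getsize(path) if os.path.exists(path) else 0


def expected_size(status, headers):
    """Total file size the server announced, or None if it did not say."""
    if status == 206:
        # Partial response: "Content-Range: bytes <start>-<end>/<total>"
        total = headers.get("Content-Range", "").rpartition("/")[2]
    else:
        total = headers.get("Content-Length", "")
    return int(total) if total.isdigit() else None


def download_with_resume(url, path, max_retries=50, timeout=30, chunk_size=64 * 1024):
    """Fetch url into path, continuing from the partial file after each stall (like wget -c)."""
    for attempt in range(1, max_retries + 1):
        offset = resume_offset(path)
        headers = {"Range": f"bytes={offset}-"} if offset else {}
        try:
            request = urllib.request.Request(url, headers=headers)
            # The timeout applies to each blocking socket operation, so a frozen
            # connection raises an error instead of hanging forever.
            with urllib.request.urlopen(request, timeout=timeout) as response:
                # 206 means the server honoured Range; 200 means it resent everything.
                mode = "ab" if response.status == 206 else "wb"
                total = expected_size(response.status, response.headers)
                with open(path, mode) as out:
                    for chunk in iter(lambda: response.read(chunk_size), b""):
                        out.write(chunk)
            # A dropped connection can end the body early without an exception,
            # so compare against the announced size before declaring success.
            if total is None or resume_offset(path) >= total:
                return path
            error = f"connection closed at {resume_offset(path)} of {total} bytes"
        except urllib.error.HTTPError as err:
            if err.code == 416:  # Range not satisfiable: nothing left to fetch
                return path
            error = err
        except (OSError, http.client.HTTPException) as err:  # timeouts, resets, bad reads
            error = err
        wait = min(2 ** attempt, 60)  # exponential backoff, capped at one minute
        print(f"attempt {attempt} failed ({error}); retrying in {wait}s")
        time.sleep(wait)
    raise RuntimeError(f"gave up on {url} after {max_retries} attempts")
```

The per-read timeout is what turns a silent freeze into an error that can be retried; the backoff gives a server that restarts after a couple of minutes time to come back.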

Some observations:

  1. After some amount of data has been downloaded, the connection freezes. The server is then not reachable by ping.
  2. The downloadable data is hosted on the same server that hosts nltk.org.
  3. Whenever the download freezes, the site is also unavailable (not nltk.org itself, but other sites on the server for which caching is not enabled). Evidently the server is unable to serve requests.
  4. Maybe there is a resource leak that manifests during this download.
  5. There might be a process restart, which makes the server available again after some time (~2 minutes).
  6. Why don't large downloads use torrents? It would just be another download option.
sophros
Biswanath
  • How long have you left it? Have you tried multiple times, or from another machine? – Spaceghost Jan 14 '14 at 01:01
  • Added the information you asked to the question. – Biswanath Jan 14 '14 at 12:34
  • Can you add the commands you used? – Spaceghost Jan 14 '14 at 15:29
  • Give the link below a try. It worked for me. – e h Jan 16 '14 at 11:33
  • The download still fails; it randomly hangs at a certain size. Just curious, did you have a very high-speed connection while downloading? – Biswanath Jan 16 '14 at 12:47
  • Hello, no. But running nltk.download() took a few tries. However, the direct link below has worked every time I have tried it. You can also try requesting the data directly from the FrameNet project (I'll add the link to my answer below). – e h Jan 16 '14 at 14:45
  • @emh, well, that was what happened when I was working with the downloads. Maybe something else is at play here. BTW, not only the FrameNet data but some of the other large files have the same issue. – Biswanath Jan 17 '14 at 03:45

4 Answers


EDIT: Here is a direct link that will allow you to request the data from the FrameNet project: https://framenet.icsi.berkeley.edu/fndrupal/framenet_request_data

When I downloaded the NLTK data I had to run the downloader several times since it kept hanging.

Alternatively here is a list of the individual files: http://nltk.org/nltk_data/

I just downloaded framenet_v15 from this link: http://nltk.github.com/nltk_data/packages/corpora/framenet_v15.zip

Also, see this question for more discussion: Installing natural language toolkit data
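If you fetch framenet_v15.zip by hand (from the direct link above or with wget -c), it still has to end up where NLTK looks for corpora. Here is a minimal sketch, assuming the usual nltk_data/corpora layout: it checks the archive (optionally against an MD5 checksum you supply yourself) and unpacks it into the corpora subdirectory. The helper names md5_of and install_corpus_zip are hypothetical.

```python
import hashlib
import os
import zipfile


def md5_of(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in chunks so large zips need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def install_corpus_zip(zip_path, nltk_data_dir, expected_md5=None):
    """Check a manually downloaded corpus zip and unpack it under <nltk_data_dir>/corpora."""
    if expected_md5 is not None and md5_of(zip_path) != expected_md5.lower():
        raise ValueError(f"{zip_path}: checksum mismatch, the download is probably incomplete")
    corpora_dir = os.path.join(nltk_data_dir, "corpora")
    os.makedirs(corpora_dir, exist_ok=True)
    # Opening raises zipfile.BadZipFile if the end-of-archive record is missing,
    # which is what a truncated download usually looks like.
    with zipfile.ZipFile(zip_path) as zf:
        bad_member = zf.testzip()  # CRC-checks every member; first bad name or None
        if bad_member is not None:
            raise ValueError(f"{zip_path}: corrupt member {bad_member}")
        zf.extractall(corpora_dir)
    return corpora_dir
```

If you unpack into a directory that is not on NLTK's default search path (such as ~/nltk_data), append it to nltk.data.path before loading the corpus; nltk.data.find('corpora/framenet_v15') should then locate it.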

Community
e h
  • The link you provided redirects to the nltk.org page, so in a way it is no longer an alternate source. Do you have any other source from which I can download the data? – Biswanath Jan 16 '14 at 12:47

I tried downloading with:

import nltk

nltk.download('all')

And it worked for me.

Ayush Bairagi

FWIW, I had this same problem with framenet_v15. Restarting nltk.download() and downloading just the framenet package on its own from the Corpora tab seemed to work for me. Once I had that, I was able to finish downloading everything else from the Collections tab.
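Restarting the downloader by hand works, but a retry loop inside the same Python process would not help, because a hung download never returns. A sketch that automates the restart instead: each attempt runs in a child process that is killed after a timeout. It assumes nltk.download() returns a true value on success; run_with_retries, nltk_download_cmd and the timeout values are hypothetical choices.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, attempts=5, timeout=600, pause=5):
    """Run cmd until it exits with status 0, killing any attempt that runs past timeout seconds.

    Returns the number of the successful attempt, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            # subprocess.run kills the child and raises TimeoutExpired if it overruns.
            result = subprocess.run(cmd, timeout=timeout)
            if result.returncode == 0:
                return attempt
            print(f"attempt {attempt}: exited with status {result.returncode}")
        except subprocess.TimeoutExpired:
            print(f"attempt {attempt}: still running after {timeout}s, killed it")
        if attempt < attempts:
            time.sleep(pause)
    return None


# Assumption: nltk.download() returns a true value when the package installed cleanly.
NLTK_CMD_TEMPLATE = "import sys, nltk; sys.exit(0 if nltk.download({package!r}) else 1)"


def nltk_download_cmd(package):
    """Command line that downloads one NLTK package in a fresh interpreter."""
    return [sys.executable, "-c", NLTK_CMD_TEMPLATE.format(package=package)]


# Example (not run here): run_with_retries(nltk_download_cmd("framenet_v15"), timeout=1800)
```

Running one package per child process, as done manually above with framenet_v15, also means a hang on one large package does not block the rest.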

aferriss

Assuming you are on Unix (given your use of wget), I recommend creating a package for the NLTK data pack you want (here, framenet).

I recently created nltk-data-punkt.spec for similar reasons, and it can be used as an example for other data packs.

John Vandenberg