I get this error when I try to decompress a Wikipedia dump to use its .xml file. How can I solve it?

import bz2

filepath = '/Data/nlp/ESA/Wiki-ESA-master'
file_name = 'enwiki-latest-pages-articles.xml.bz2'
zipfile = bz2.BZ2File(file_name)  # open the compressed file
DEFAULT_FILENAME = zipfile.read()  # read all of the decompressed data into memory

error:

EOFError: compressed file ended before the logical end-of-stream was detected
sophros
parvaneh

1 Answer

As the error says, the downloading process most likely ended prematurely and you have a truncated file. Try downloading again.

Another possible cause is corrupted data on your disk. Downloading the file again may fix that too.
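
To check whether a downloaded archive is complete before working with it, something like the following may help. It is only a minimal sketch: the helper name `is_complete_bz2` and the 1 MiB chunk size are my own choices, not part of the `bz2` module. It reads the archive in chunks, so the full dump never has to fit in memory, and treats `EOFError` as the sign of a truncated file.

```python
import bz2


def is_complete_bz2(path, chunk_size=1024 * 1024):
    """Return True if the archive decompresses to the end, False if truncated.

    Hypothetical helper for illustration; the name and chunk size are assumptions.
    """
    try:
        with bz2.open(path, "rb") as f:
            # Read in chunks and discard the data; we only care whether
            # the stream can be read all the way to its end marker.
            while f.read(chunk_size):
                pass
    except EOFError:
        # Raised when the compressed data stops before the end of the stream.
        return False
    return True
```

For example, `is_complete_bz2('enwiki-latest-pages-articles.xml.bz2')` should return `False` for the file that raised the error above. Note that only truncation is handled here; other kinds of damage usually surface as a different exception (typically `OSError`), which this sketch lets propagate.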

sophros
  • I downloaded the new version of wiki dump with this name "enwiki-20180901-pages-articles-multistream.xml.bz2" and it works well. So, you might be right that the problem was from the file. – parvaneh Dec 05 '18 at 21:25