
I am trying to decompress a WARC ZST file that I downloaded from here: https://archive.org/details/archiveteam_yahooanswers_20210422220546_c4fac540

I tried the command

zstd -d yahooanswers_20210422220546_c4fac540.1619026173.megawarc.warc.zst

but I got this error:

73.megawarc.warc.zst : 0 MB...
73.megawarc.warc.zst : Decoding error (36) : Dictionary mismatch

How can I find the required dictionary, or are there any alternatives to this?

Arundhati

1 Answer


The dictionary is stored inside the first skippable frame of the .warc.zst file.

OrIdow6 wrote this script to extract the dictionary: https://transfer.notkiska.pw/inline/TXlRo/xtract.py

You'll need python3, the zstd command-line tool, and the zstandard Python package. Run the script and save its output as the dictionary:

python ./xtract.py /path/to/megawarc.warc.zst > dict

Then you can decompress the file by passing that dictionary with -D:

zstd -d /path/to/megawarc.warc.zst -D dict

You should then be able to view the megawarc with your standard WARC viewing tools.
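In case the linked script is unavailable, here is a rough sketch of the idea behind it, not a copy of xtract.py. It relies only on the zstd frame format: a skippable frame starts with a 4-byte little-endian magic number in the range 0x184D2A50 to 0x184D2A5F, followed by a 4-byte little-endian payload size and then the payload. The function names are made up for illustration. It is also an assumption on my part that the payload may itself be zstd-compressed; the sketch only detects that case and does not decompress it.

```python
import struct

# Skippable frames use any magic number in this range (zstd frame format).
SKIPPABLE_MAGIC_MIN = 0x184D2A50
SKIPPABLE_MAGIC_MAX = 0x184D2A5F
# Magic bytes at the start of a regular zstd frame.
ZSTD_FRAME_MAGIC = b"\x28\xb5\x2f\xfd"


def read_first_skippable_frame(stream):
    """Return the payload of the skippable frame at the start of `stream`.

    `stream` is a binary file object positioned at offset 0.
    """
    header = stream.read(8)
    if len(header) != 8:
        raise ValueError("file is too short to contain a frame header")
    # Both fields are little-endian unsigned 32-bit integers.
    magic, size = struct.unpack("<II", header)
    if not (SKIPPABLE_MAGIC_MIN <= magic <= SKIPPABLE_MAGIC_MAX):
        raise ValueError(f"first frame is not skippable (magic 0x{magic:08X})")
    payload = stream.read(size)
    if len(payload) != size:
        raise ValueError("skippable frame is truncated")
    return payload


def looks_compressed(payload):
    """True if the payload starts like a regular zstd frame.

    Assumption: in that case the dictionary was stored compressed and
    still has to be decompressed before it can be passed to `zstd -D`.
    """
    return payload[:4] == ZSTD_FRAME_MAGIC


def extract_dictionary(src_path, dst_path):
    """Write the raw payload of the first skippable frame to dst_path."""
    with open(src_path, "rb") as src:
        payload = read_first_skippable_frame(src)
    with open(dst_path, "wb") as dst:
        dst.write(payload)
    return looks_compressed(payload)
```

Calling extract_dictionary("/path/to/megawarc.warc.zst", "dict") writes the payload to dict and returns True if it looks compressed. In that case you would presumably need to decompress that file first and use the result as the dictionary.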

Jimmy
  • I am almost certain that this only works with Linux and not Windows. File "xtract.py" seems especially Linux-focused. – Rublacava Aug 04 '21 at 04:28
  • Which versions of zstd and zstandard does this script use? I got this error when running it: Traceback (most recent call last): File "xtract.py", line 8, in <module> from _zstd_cffi import ffi, lib ModuleNotFoundError: No module named '_zstd_cffi' – NSK Dec 17 '21 at 06:57
  • On Windows, try using something like MSYS2 (the MSYS or UCRT64 environment), and make sure the corresponding versions of these two packages are installed first: **ucrt64/mingw-w64-ucrt-x86_64-python-zstandard** *Python bindings to the Zstandard (zstd) compression library (mingw-w64)* and **ucrt64/mingw-w64-ucrt-x86_64-zstd** (replace ucrt64 with mingw64, mingw32, etc., depending on the environment you chose). – AJM May 19 '23 at 11:54