Streaming decompression of data in a vector

Question

I need to carry out the following procedure:

1. Compress an input text into an array.
2. Split the compressed output into multiple pieces of approximately the same length and store them in a vector.
3. Decompress using streaming decompression.

Is it possible to do that? Consider that in my case, the size of each block is fixed and independent of the compression scheme.

In this example here, the decompression function returns the size of the next block. I wonder whether that is tied to the compression scheme, i.e. whether you cannot take an arbitrary sub-array of the full compressed array and decompress it.

I need to use zstd, no other compression algorithms.

Here is what I tried so far.

// std::vector<std::string_view> _content_compressed is passed as a parameter
ZSTD_DStream* const dstream = ZSTD_createDStream();
ZSTD_initDStream(dstream);
std::vector<char*> vec;
for (auto el : _content_compressed)
{
    auto ee = el.data();
    char* decompressed = new char[1000];
    ZSTD_inBuffer input = { el.data(), el.size(), 0 };
    ZSTD_outBuffer output = { decompressed, _decompressed_size, 0 };
    std::size_t toRead = ZSTD_decompressStream(dstream, &output, &input);
    vec.push_back(decompressed);
}

The problem is that `decompressed` does not contain the decompressed data at the end.

That seems like something that any compression algorithm can do. — user253751, May 18 '22 at 14:44
Yes, it is possible. What particular problem do you have with your approach? Or are you just looking for an overview of compression algorithms [like here](https://stackoverflow.com/questions/16469410/data-compression-algorithms)? — pptaszni, May 18 '22 at 14:57
One error in the current code is that it never increments the offset inside `decompressed`. As a consequence, each subsequent block that you decompress is written to the beginning of `decompressed`, overwriting the previously decompressed data, rather than being appended. — Konrad Rudolph, May 18 '22 at 15:15
@KonradRudolph in the linked example it seems that it doesn't increment it either — Alberto Tiraboschi, May 18 '22 at 15:20
@AlbertoTiraboschi The linked example immediately writes each decompressed chunk to an output stream. Its `buffOut` refers to a temporary buffer, unlike your `decompressed` array. — Konrad Rudolph, May 18 '22 at 15:29
@KonradRudolph, nope, I changed the code but `decompressed` still seems untouched — Alberto Tiraboschi, May 18 '22 at 15:51
@AlbertoTiraboschi Hard to say how you have determined that, since you don’t show the relevant code. But the change you’ve made to your code is almost certainly not what you *actually* want. Instead, you want to keep `decompressed` as it was before, but change the position that you pass to the initialiser of `output`, and increment it by `toRead` in every loop. Anyway, there are probably other errors (for example, you never call `ZSTD_initDStream`) but I’ve never used the zstd API so I don’t know what else might be missing. But the error I previously pointed out is definitely relevant. — Konrad Rudolph, May 18 '22 at 16:23
@KonradRudolph, yes I agree that output should be incremented each loop, but I just wanted to do a quick test. Added ZSTD_initDStream() but still no changes. — Alberto Tiraboschi, May 18 '22 at 16:43

0 Answers