
I'm looking for a compression algorithm that:

  • must be lossless
  • must have very high compression ratio
  • must be supported in browser via JavaScript libs or natively
  • doesn't need to be fast (speed is not a concern).

Goals:

  1. compress a dense array of 8 million double-precision floats; there are only 256 unique values, and they are normally distributed (primary use case)
  2. the same as above, but for sparse arrays (containing many 0 values)

It's OK for me to use two different algorithms for these two use cases.

I've found Google's Brotli algorithm, but I'm not sure whether it's the best fit.

2 Answers


Coding is pretty much a solved problem: your main task will be modelling (starting from "float number" and "lossless").
A [primarily dense] array of 256 unique float numbers doesn't sound promising: depending on the range, the exponent representation may be the only source of exploitable redundancy.
A sparse array does sound promising, a 16×16 sparse matrix even more so. The more you know about your data, the more you can help the compressor - "mainly diagonal matrix", anyone?

"General purpose data compressors" exploit self-similarity:
to get an idea where your data has such, run "the usual suspects" on whatever machine representation you chose and on a generic Unicode representation.
The latter allows you to use no more resolution than required.
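
For instance, here is one way such a comparison could be run (a sketch, assuming Node.js and its built-in zlib as one of the "usual suspects"; the dictionary and the normal-ish sample data are made up for illustration):

```js
const zlib = require('zlib');

// Crude normal sample clamped to [0, 255] — a stand-in for the real data,
// since the question says the values are normally distributed.
function normalByte() {
  const u = 1 - Math.random(), v = Math.random();   // Box–Muller transform
  const z = Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
  return Math.min(255, Math.max(0, Math.round(128 + 32 * z)));
}

const table = Float64Array.from({ length: 256 }, (_, i) => i / 255); // made-up dictionary
const indices = Uint8Array.from({ length: 1_000_000 }, normalByte);  // 1M for a quick test

// Machine representation: one byte per value.
const binary = Buffer.from(indices);
// Generic text representation: decimal numbers, space-separated.
const text = Buffer.from(Array.from(indices, i => table[i]).join(' '));

for (const [name, buf] of [['binary', binary], ['text', text]]) {
  console.log(`${name}: ${buf.length} -> ${zlib.gzipSync(buf).length} bytes`);
}
```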

– greybeard
  • Thanks. I've just realized that each float number can simply be mapped to a byte; after that, some general-purpose compressor can be run. – Ruslan Gunawardana Nov 11 '17 at 06:20

I have a lot of float numbers, but because there are only 256 unique values I can encode each number as a single byte. That alone gives an 8:1 ratio (8-byte doubles down to 1-byte indices). After that I can run a general-purpose algorithm for further compression. I've checked several popular algorithms: gzip, Brotli, bzip2, LZMA, and Zstandard.
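
A minimal sketch of that mapping (function names are mine, not from any library; it assumes the array really holds at most 256 distinct values):

```js
function encodeBytes(values) {                        // values: Float64Array
  const dict = Array.from(new Set(values));           // distinct doubles, first-seen order
  if (dict.length > 256) throw new Error('more than 256 distinct values');
  const index = new Map(dict.map((v, i) => [v, i]));  // double -> byte
  const bytes = new Uint8Array(values.length);
  for (let i = 0; i < values.length; i++) bytes[i] = index.get(values[i]);
  return { dict: Float64Array.from(dict), bytes };    // 8 bytes -> 1 byte per element
}

function decodeBytes({ dict, bytes }) {
  const out = new Float64Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) out[i] = dict[bytes[i]];
  return out;
}
```

The round trip stays lossless because the dictionary stores the exact doubles, so decoding restores the original bit patterns.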

I've found that 2 options suit my needs:

  • bzip2
  • Brotli

bzip2:

  • compresses well even if I don't convert the floats to unsigned bytes.
  • but requires a JS library in the browser

Brotli:

  • compresses well only if I manually map the floats to unsigned bytes first
  • is supported natively by nearly all modern browsers (see the sketch below)
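
A sketch of the native path (the file name and the way the 256-entry dictionary reaches the client are my assumptions, not part of any standard): if the server stores the byte array Brotli-compressed and serves it with `Content-Encoding: br`, the browser's `fetch` decompresses it transparently.

```js
// Sketch: rely on the browser's built-in Brotli decoding.
// Assumes the server serves the encoded bytes (e.g. payload.bin.br) with
//   Content-Encoding: br
//   Content-Type: application/octet-stream
async function loadArray(url, dict /* Float64Array with the 256 values */) {
  const resp = await fetch(url);        // browser undoes 'br' transparently
  const bytes = new Uint8Array(await resp.arrayBuffer());
  const out = new Float64Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) out[i] = dict[bytes[i]];
  return out;
}
```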
  • This seems to call for a rewording of the question: *compress big (huge? large? if possible, give a distribution/range of sizes) arrays of float numbers from a domain of 256 distinct values known to the decompressor; 1) primary use case: dense arrays 2) minor use case: sparse arrays*. – greybeard Nov 11 '17 at 07:38
  • @greybeard I've found that values are normally distributed. – Ruslan Gunawardana Nov 13 '17 at 15:08
  • *values are normally distributed* - great, but lacking *mean* and *variance*. Besides, I asked about the *array size*, not the values. – greybeard Nov 13 '17 at 17:58