
I'm working on a program that takes a string, turns each character of the string into a color, then draws the colors left-to-right, top-down across an image. The image can then be decoded using the same program to get the original message back. As an example, here's clojure.core, encoded as an image:

[Image: clojure.core encoded as an image]

I wrote this just as a toy, but I noticed an interesting property of the images it produces: they're smaller than the original messages were as text. For clojure.core, the text is 259 KB, but the image above is only 88.9 KB (both values are "size on disk"). To make sure no data was lost, I decoded the image and got the original message back.

How is this possible? I'd think the image (png format) would have headers and other extra information that would inflate the size.

The entire clojure.core contains 265486 characters (according to Notepad++), which means that each character takes up roughly one byte on disk.

From working with the BufferedImage class (Java), it appears as though colors are stored as 4-byte integers, so shouldn't each pixel require ~4x the memory?

Here's how it's encoded:

  1. The first character of the string is popped off

  2. It's translated into a color by getting its ASCII value and multiplying it by a large number (so it covers the range of possible colors better); the result is then converted into a 3-digit, base-256 number (e.g. [123 100 200]).

  3. The three digits are treated as the red, green and blue channels, which are passed to BufferedImage's setRGB method.

  4. The position indicator is advanced, the next character is popped, and the process repeats until the entire message is encoded.
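
The steps above could be sketched roughly as follows in Java. This is not the original Clojure program, only an illustration under stated assumptions: the class name `TextImageCodec`, the `MULTIPLIER` value (the post only says "a large number"), the fixed image width, and passing the message length to `decode` are all hypothetical choices, and input is assumed to be plain ASCII.

```java
import java.awt.image.BufferedImage;

class TextImageCodec {
    // Hypothetical multiplier: the largest value that keeps 127 * MULTIPLIER
    // inside the 24-bit RGB range (0xFFFFFF / 127 = 132104).
    static final int MULTIPLIER = 0xFFFFFF / 127;

    // Draws one pixel per character, left-to-right, top-down.
    static BufferedImage encode(String message, int width) {
        int height = Math.max(1, (message.length() + width - 1) / width);
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int i = 0; i < message.length(); i++) {
            char c = message.charAt(i);
            if (c > 127) {
                throw new IllegalArgumentException("non-ASCII character at index " + i);
            }
            int value = c * MULTIPLIER;
            // Split the value into three base-256 digits: red, green, blue.
            int red = (value >> 16) & 0xFF;
            int green = (value >> 8) & 0xFF;
            int blue = value & 0xFF;
            image.setRGB(i % width, i / width, (red << 16) | (green << 8) | blue);
        }
        return image;
    }

    // Reverses encode(); the message length is passed in explicitly (an assumption)
    // so that unused pixels in the last row are not decoded.
    static String decode(BufferedImage image, int length) {
        StringBuilder sb = new StringBuilder(length);
        int width = image.getWidth();
        for (int i = 0; i < length; i++) {
            int rgb = image.getRGB(i % width, i / width) & 0xFFFFFF; // drop the alpha byte
            sb.append((char) (rgb / MULTIPLIER));
        }
        return sb.toString();
    }
}
```

Calling `TextImageCodec.decode(TextImageCodec.encode(text, 512), text.length())` should return `text` unchanged for ASCII input, because each character maps to a distinct exact multiple of `MULTIPLIER`.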

The algorithm is a little convoluted right now. @Thumbnail suggested a far better way on Code Review, but I haven't implemented it yet. Since the results are the same though, that shouldn't make a difference for the question.

Carcigenicate
  • Even though the answer was somewhat obvious, I still enjoyed reading about your findings. It is always fun coming across stuff like that. – Luke Joshua Park Jan 23 '17 at 06:26

1 Answer


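Portable Network Graphics (PNG) is a raster graphics file format that supports lossless data compression (from https://en.wikipedia.org/wiki/Portable_Network_Graphics). In other words, the image data is compressed when it is stored as a .png file, so the file on disk can be much smaller than the 4 bytes per pixel that BufferedImage uses in memory.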
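
To see the effect, one could compare the in-memory size of an image with the size of the PNG that Java writes for it. The sketch below is only an illustration: the class `PngSizeDemo`, its method names, and the four-color test pattern are made up for this example, and it assumes a PNG writer is available through `ImageIO` (as it is in standard JDKs).

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

class PngSizeDemo {
    // Size of the pixel data in memory, assuming one 4-byte int per pixel.
    static long rawBytes(BufferedImage image) {
        return 4L * image.getWidth() * image.getHeight();
    }

    // Size of the same image once it has been written out as a PNG.
    static int pngBytes(BufferedImage image) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(image, "png", out)) {
                throw new IllegalStateException("no PNG writer available");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    // A deliberately repetitive image: text-derived images also reuse a small
    // set of colors, which is what makes them compressible.
    static BufferedImage repetitiveImage(int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        int[] palette = {0x1F3A5C, 0x7A2E9D, 0xC04410, 0x22AA66};
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                image.setRGB(x, y, palette[(x + y) % palette.length]);
            }
        }
        return image;
    }

    public static void main(String[] args) {
        BufferedImage image = repetitiveImage(256, 256);
        System.out.println("raw bytes: " + rawBytes(image));
        System.out.println("png bytes: " + pngBytes(image));
    }
}
```

The exact PNG size depends on the encoder, but for repetitive content like this it should come out well below the raw pixel size.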

thebjorn
  • Doh. Well that's obvious in retrospect. Thanks. – Carcigenicate Jan 22 '17 at 22:40
  • It's probably worth mentioning that PNG uses zlib/Deflate compression (and that you would likely get better results using Deflate directly on the text file). – Harald K Jan 23 '17 at 09:13
  • @haraldK that would be my expectation as well. In addition to Deflate, PNG does a filtering pre-pass that helps the compression of "real" images (where a pixel, statistically, looks very much like its neighboring pixels). You could probably take advantage of this by choosing colors that are close to each other (thus getting better PNG compression). Unrelatedly, it's probably feasible to make images with smaller dimensions by encoding more than one character per pixel (truecolor + alpha PNG images can use up to 64 bits per pixel, at 16 bits per channel); this will most likely not compress as well, though. – thebjorn Jan 23 '17 at 09:31
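
As a rough way to try the suggestion from the comments, Deflate can be applied to the text bytes directly with `java.util.zip`. This is a minimal sketch, not a measurement of clojure.core: the class name `DeflateTextDemo` and the sample text are hypothetical, and the compression level is an arbitrary choice.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class DeflateTextDemo {
    // Compresses the input with zlib/Deflate, the same family of compression PNG uses.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restores the original bytes, to confirm that nothing was lost.
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && inflater.needsInput()) {
                    break; // input ended early; stop instead of looping forever
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("(defn foo [x] (inc x))\n"); // hypothetical sample text
        }
        byte[] text = sb.toString().getBytes(StandardCharsets.US_ASCII);
        byte[] packed = deflate(text);
        System.out.println(text.length + " bytes of text, " + packed.length + " bytes deflated");
    }
}
```

Running the same two methods over the real source file and comparing the result with the size of the PNG would show whether skipping the image step actually helps for that input.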