I'm working with images using Pillow. I'm trying to capture the result directly as a bytes string so that I can save it to a database without first saving to the local filesystem. The code works for many images but fails on one of them.

I've tried logging as much information as I can about the bytes string, but I cannot see anything obviously wrong with the bytes. One difference is that the image it fails on contains only white pixels. I'm not sure why that would make it un-encodable, though.

I've looked at "Python / Django fails at decoding file encoded as base64 by javascript", but the answer there didn't help.

What should I be looking for as a cause of the problem? What information should I log? Any idea what might cause the problem?

Code below:

from PIL import Image
import numpy as np
import io
import base64

def remove_lines(data):
    im = Image.open(io.BytesIO(data)).convert("L")

    array = np.asarray(im)
    # <do stuff to array>

    im = Image.fromarray(array, 'L')
    print(f"\narray shape: {array.shape}\n")
    print(f"type of array: {type(array)}\n")

    image_bytes = io.BytesIO()
    im.save(image_bytes, format='PNG')
    image_bytes = image_bytes.getvalue()

    print(f"start of image_bytes:\n{image_bytes[:30]}\n")
    print(f"end of image_bytes:\n{image_bytes[-30:]}\n")

    # test the encoding by trying to decode
    string = base64.b64decode(image_bytes)
    print("success\n")

Output:

page number: 0
2019-01-15 21:51:50,930 STREAM b'IHDR' 16 13
2019-01-15 21:51:50,930 STREAM b'pHYs' 41 9
2019-01-15 21:51:50,930 STREAM b'IDAT' 62 8192

array shape: (1651, 1275)

type of array: <class 'numpy.ndarray'>

start of image_bytes:
b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x04\xfb\x00\x00\x06s\x08\x00\x00\x00\x00\xa8'

end of image_bytes:
b'OD,R\xfbD\xc4\xa2\xff\x01\xf6)\xc6\x7fj\xbfU\xe8\x00\x00\x00\x00IEND\xaeB`\x82'

success

page number: 1
2019-01-15 21:52:09,142 STREAM b'IHDR' 16 13
2019-01-15 21:52:09,143 STREAM b'pHYs' 41 9
2019-01-15 21:52:09,143 STREAM b'IDAT' 62 8192

array shape: (1754, 1241)

type of array: <class 'numpy.ndarray'>

start of image_bytes:
b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x04\xd9\x00\x00\x06\xda\x08\x00\x00\x00\x00\xdd'

end of image_bytes:
b'!d>\xa8l\x84\x90\xf9\xf8?\xf1\xab\x9a-\xf3\xf7\x94\xbd\x00\x00\x00\x00IEND\xaeB`\x82'

success

page number: 2
2019-01-15 21:52:20,700 STREAM b'IHDR' 16 13
2019-01-15 21:52:20,700 STREAM b'pHYs' 41 9
2019-01-15 21:52:20,701 STREAM b'IDAT' 62 8192

array shape: (1754, 1241)

type of array: <class 'numpy.ndarray'>

start of image_bytes:
b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x04\xd9\x00\x00\x06\xda\x08\x00\x00\x00\x00\xdd'

end of image_bytes:
b'\x03z\x9c\r\xe8q6\xa0\xe7\x02Dl\x0e\xb3\x8f\xebQ,\x00\x00\x00\x00IEND\xaeB`\x82'

Traceback (most recent call last):
  File "aex_image_cleaner.py", line 54, in <module>
    main()
  File "aex_image_cleaner.py", line 34, in main
    data = cleaner.remove_lines(base64.b64decode(image_data))
  File "cleaner.pyx", line 39, in cleaner.remove_lines
    string = base64.b64decode(image_bytes)
  File "/home/neil/anaconda3/lib/python3.7/base64.py", line 87, in b64decode
    return binascii.a2b_base64(s)
binascii.Error: Invalid base64-encoded string: length cannot be 1 more than a multiple of 4
    Your string simply is not in Base64 format, what made you think that it was? Passing garbage to `b64decode()` may or may not produce an error, since there's no internal consistency check in the format. – jasonharper Jan 15 '19 at 20:12
    I think it's because the data you get with `image_bytes = image_bytes.getvalue()` following the `im.save(image_bytes, format='PNG')` is not base64 encoded; it's the binary data representing the image formatted as a PNG file, so decoding it fails. – martineau Jan 15 '19 at 20:14
  • Ah ohkay. The reason that I assumed it was is that b64 encoding is what is usually used when saving to a DB. I suppose that was silly in hindsight though. Thanks. – Neil Jan 15 '19 at 20:16
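
What the comments describe can be checked with the standard library alone. Below is a minimal sketch, assuming the bytes come straight from `image_bytes.getvalue()`. With its default `validate=False`, `base64.b64decode` discards every byte outside the base64 alphabet before it checks the length, so raw PNG data either decodes "successfully" or raises depending only on how many of its bytes happen to fall inside that alphabet. The helper name `looks_like_base64` is made up for illustration.

```python
import base64
import binascii


def looks_like_base64(data: bytes) -> bool:
    """Return True only if data is strictly valid base64 (hypothetical helper)."""
    try:
        # validate=True rejects bytes outside the base64 alphabet
        # instead of silently discarding them.
        base64.b64decode(data, validate=True)
        return True
    except binascii.Error:
        return False


# Default (lenient) mode: the three non-alphabet bytes are dropped, leaving b"abcd".
lenient = base64.b64decode(b"\x00\x01abcd\xff")

# Only one alphabet byte survives here, so even the lenient mode raises.
try:
    base64.b64decode(b"\x00a\xff")
    lenient_failed = False
except binascii.Error:
    lenient_failed = True

png_signature = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of any PNG file
print(looks_like_base64(png_signature))                    # raw PNG bytes
print(looks_like_base64(base64.b64encode(png_signature)))  # genuine base64
```

Logging `looks_like_base64(image_bytes)` before the decode step would have flagged all three pages, not only the one that happened to raise.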

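
As for getting the image into the database, here is a rough sketch, assuming base64 is only wanted because the target column holds text. If the column is binary, the bytes returned by `getvalue()` can probably be stored as they are, with no encoding step at all. The function names below are illustrative, and the PNG signature stands in for the real `image_bytes`.

```python
import base64


def to_base64_text(raw: bytes) -> str:
    """Encode raw binary (e.g. PNG bytes) as ASCII text for a text column."""
    return base64.b64encode(raw).decode("ascii")


def from_base64_text(text: str) -> bytes:
    """Decode text produced by to_base64_text back to the original bytes."""
    # validate=True makes malformed input fail loudly instead of being trimmed.
    return base64.b64decode(text, validate=True)


# Stand-in for the bytes produced by im.save(...) followed by getvalue().
image_bytes = b"\x89PNG\r\n\x1a\n"

encoded = to_base64_text(image_bytes)   # what would go into a text column
restored = from_base64_text(encoded)    # what comes back out
print(restored == image_bytes)
```

The round trip to test is encode then decode. Calling `b64decode` on bytes that were never encoded tests nothing about the image itself.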