I'm working with images using Pillow. I'm trying to capture the result as a bytes string directly, so that I can save it to a database without first writing it to the local filesystem. The code works for many images but fails on one of them.
I've tried logging as much information as I can about the bytes string, but I cannot see anything obviously wrong with the bytes. One difference is that the image it fails on contains only white pixels. I'm not sure why that would make it un-encodable, though.
I've looked at "Python / Django fails at decoding file encoded as base64 by javascript", but the answer there didn't help.
What should I be looking for as a cause of the problem? What information should I log? Any idea what might cause the problem?
Code below:
from PIL import Image
import numpy as np
import io
import base64


def remove_lines(data):
    im = Image.open(io.BytesIO(data)).convert("L")
    array = np.asarray(im)

    # <do stuff to array>

    im = Image.fromarray(array, 'L')

    print(f"\narray shape: {array.shape}\n")
    print(f"type of array: {type(array)}\n")

    image_bytes = io.BytesIO()
    im.save(image_bytes, format='PNG')
    image_bytes = image_bytes.getvalue()

    print(f"start of image_bytes:\n{image_bytes[:30]}\n")
    print(f"end of image_bytes:\n{image_bytes[-30:]}\n")

    # test the encoding by trying to decode
    string = base64.b64decode(image_bytes)
    print("success\n")
Output:
page number: 0
2019-01-15 21:51:50,930 STREAM b'IHDR' 16 13
2019-01-15 21:51:50,930 STREAM b'pHYs' 41 9
2019-01-15 21:51:50,930 STREAM b'IDAT' 62 8192
array shape: (1651, 1275)
type of array: <class 'numpy.ndarray'>
start of image_bytes:
b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x04\xfb\x00\x00\x06s\x08\x00\x00\x00\x00\xa8'
end of image_bytes:
b'OD,R\xfbD\xc4\xa2\xff\x01\xf6)\xc6\x7fj\xbfU\xe8\x00\x00\x00\x00IEND\xaeB`\x82'
success
page number: 1
2019-01-15 21:52:09,142 STREAM b'IHDR' 16 13
2019-01-15 21:52:09,143 STREAM b'pHYs' 41 9
2019-01-15 21:52:09,143 STREAM b'IDAT' 62 8192
array shape: (1754, 1241)
type of array: <class 'numpy.ndarray'>
start of image_bytes:
b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x04\xd9\x00\x00\x06\xda\x08\x00\x00\x00\x00\xdd'
end of image_bytes:
b'!d>\xa8l\x84\x90\xf9\xf8?\xf1\xab\x9a-\xf3\xf7\x94\xbd\x00\x00\x00\x00IEND\xaeB`\x82'
success
page number: 2
2019-01-15 21:52:20,700 STREAM b'IHDR' 16 13
2019-01-15 21:52:20,700 STREAM b'pHYs' 41 9
2019-01-15 21:52:20,701 STREAM b'IDAT' 62 8192
array shape: (1754, 1241)
type of array: <class 'numpy.ndarray'>
start of image_bytes:
b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x04\xd9\x00\x00\x06\xda\x08\x00\x00\x00\x00\xdd'
end of image_bytes:
b'\x03z\x9c\r\xe8q6\xa0\xe7\x02Dl\x0e\xb3\x8f\xebQ,\x00\x00\x00\x00IEND\xaeB`\x82'
Traceback (most recent call last):
  File "aex_image_cleaner.py", line 54, in <module>
    main()
  File "aex_image_cleaner.py", line 34, in main
    data = cleaner.remove_lines(base64.b64decode(image_data))
  File "cleaner.pyx", line 39, in cleaner.remove_lines
    string = base64.b64decode(image_bytes)
  File "/home/neil/anaconda3/lib/python3.7/base64.py", line 87, in b64decode
    return binascii.a2b_base64(s)
binascii.Error: Invalid base64-encoded string: length cannot be 1 more than a multiple of 4
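
For reference, here is a minimal sketch of the round-trip check that the comment "test the encoding by trying to decode" seems to be aiming for. It assumes the intent is to encode the PNG bytes as base64 and then confirm they decode back unchanged. The helper names check_base64_roundtrip and looks_like_base64 are hypothetical, and png_like is only a stand-in for the value of image_bytes.getvalue(). One thing that may be worth logging: by default base64.b64decode silently discards bytes outside the base64 alphabet, so passing it raw binary data can succeed or fail depending on how many of the bytes happen to be alphabet characters. Passing validate=True makes it reject such input outright.

```python
import base64
import binascii


def check_base64_roundtrip(raw: bytes) -> str:
    """Encode raw bytes as base64 text and confirm decoding restores them."""
    encoded = base64.b64encode(raw)      # binary -> ASCII base64
    decoded = base64.b64decode(encoded)  # ASCII base64 -> binary
    if decoded != raw:
        raise ValueError("base64 round trip changed the data")
    return encoded.decode("ascii")


def looks_like_base64(data: bytes) -> bool:
    """Return True only if data is strictly valid base64 (validate=True)."""
    try:
        base64.b64decode(data, validate=True)
        return True
    except binascii.Error:
        return False


# Stand-in for the PNG bytes returned by image_bytes.getvalue() (assumption).
png_like = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"

text = check_base64_roundtrip(png_like)
print(looks_like_base64(png_like))               # raw binary input
print(looks_like_base64(text.encode("ascii")))   # base64-encoded input
```

Logging the result of looks_like_base64 for image_bytes on each page would show whether the value being passed to b64decode is base64 text at all, or raw PNG data.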