
My javascript frontend is sending the base64 encoded string:

data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAM8AAADkCAIAAACwiOf9AAAAA3NCSVQICAjb4U/gAAAgAElEQVR4nO...

I need just the base64 data, that is, iVBORw0KGgoAAAANSUhEUgAAAM8AAADkCAIAAACwiOf9AAAAA3NCSVQICAjb4U/gAAAgAElEQVR4nO... In other words, the data:image/png;base64, prefix needs to be removed. Is there a standard Python library I can use for this, or must I roll my own regexes?

The base64 library just offers support for encoding/decoding, which is not what I need: I want to keep the base64 encoded data, just without the prefix.
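A standard-library-only sketch of the prefix removal, without regexes, assuming the input is a `str` whose header declares `;base64`: `str.partition` splits at the first comma (the format is `data:[<mediatype>][;base64],<data>`), and `base64.b64decode(..., validate=True)` is used only to check the payload, which is returned still encoded. The name `strip_data_uri_prefix` is hypothetical.

```python
import base64


def strip_data_uri_prefix(data_uri: str) -> str:
    """Return the base64 payload of a data URI, still encoded.

    Hypothetical helper: assumes the header ends at the first comma
    and that the header declares ";base64".
    """
    if not data_uri.startswith("data:"):
        raise ValueError("not a data URI")
    # Everything before the first comma is the header, the rest is data.
    header, sep, payload = data_uri.partition(",")
    if not sep:
        raise ValueError("data URI has no comma separating header and data")
    if not header.endswith(";base64"):
        raise ValueError("data URI is not base64-encoded")
    try:
        # Decode only to validate; the decoded bytes are thrown away.
        base64.b64decode(payload, validate=True)
    except ValueError as exc:  # binascii.Error is a subclass of ValueError
        raise ValueError("payload is not valid base64") from exc
    return payload


print(strip_data_uri_prefix("data:image/png;base64,iVBORw0KGgo="))  # iVBORw0KGgo=
```

Dropping the `b64decode` call makes it a pure string operation; keeping it rejects malformed payloads before they travel further.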

blueFast
  • It seems to me that it would be straightforward to just remove it yourself. – Hans Dec 11 '13 at 11:13
  • It would be straightforward if the data is simple enough. What format is that? Is it standard? Will it always follow that pattern? Can I just split on the comma and be sure that it will always work? That, and more, I hoped would be taken care of by a library, which may or may not exist. – blueFast Dec 11 '13 at 11:16
  • I don't have much experience with base64 encoded images, but I think they always start with "data:image/png;base64,". https://developer.mozilla.org/en/docs/data_URIs – Hans Dec 11 '13 at 11:21
  • @Hans: Yes! That was the keyword I was looking for! data URIs, or RFC 2397. There is a library for that! https://pypi.python.org/pypi/rfc2397 (but I'm not sure whether it is overkill; I hoped this would be built into Python). If you post your comment as an answer, I will accept it! – blueFast Dec 11 '13 at 11:25
  • Can the library remove the stuff at the front of the string? It seems that it's for encoding images according to the data URL scheme. – Hans Dec 11 '13 at 11:33
  • Indeed, that library is not what I need. But anyway, now that I know what we are talking about, I can look for it. Thanks! – blueFast Dec 11 '13 at 11:36
  • No problem and good luck! I'd consider just taking a substring from the end of ";base64," to the end of the string, though. – Hans Dec 11 '13 at 11:39
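Hans's substring suggestion can be sketched like this, using `str.find` to locate `;base64,` and slicing after it. The helper name `payload_after_marker` is hypothetical; it raises `ValueError` when the marker is missing, because `find` returns `-1` in that case and slicing from `-1 + len(marker)` would silently return a wrong substring.

```python
def payload_after_marker(data_uri: str, marker: str = ";base64,") -> str:
    """Return everything after the first occurrence of marker.

    Hypothetical helper illustrating the substring approach.
    """
    index = data_uri.find(marker)
    if index == -1:
        # Fail loudly instead of slicing with a bogus offset.
        raise ValueError(f"{marker!r} not found in data URI")
    return data_uri[index + len(marker):]


print(payload_after_marker("data:image/png;base64,iVBORw0KGgo="))  # iVBORw0KGgo=
```

If the exact prefix is known in advance, `str.removeprefix` (Python 3.9+) does the same job in one call, but it returns the string unchanged when the prefix does not match.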

1 Answer


For others' reference, I have prepared a small library for this:

import logging
import re

log = logging.getLogger(__name__)

# Compiled once at import time.
# The media type may carry parameters (e.g. text/plain;charset=US-ASCII),
# so it is matched up to the first comma, not the first semicolon.
_compiled1 = re.compile(r'^data:(?P<mediatype>[^,]*?);base64,(?P<data>.*)', re.DOTALL)
_compiled2 = re.compile(r'^data:(?P<mediatype>[^,]*),(?P<data>.*)', re.DOTALL)

def clean_data_uri(data_in):
    # Clean base64 data coming from the frontend
    #     data:image/png;base64,iVBORw0KGgoAAAA... -> iVBORw0KGgoAAAA...
    # As specified in RFC 2397
    # http://stackoverflow.com/q/20517429/647991
    # http://en.wikipedia.org/wiki/Data_URI_scheme
    # http://tools.ietf.org/html/rfc2397
    #   Format is : data:[<mediatype>][;base64],<data>
    try:
        # match() returns None on failure, so m.group raises AttributeError
        m         = _compiled1.match(data_in)
        success   = True
        is_base64 = True
        mediatype = m.group('mediatype')
        data      = m.group('data')
    except Exception:
        try:
            m         = _compiled2.match(data_in)
            success   = True
            is_base64 = False
            mediatype = m.group('mediatype')
            data      = m.group('data')
        except Exception as e:
            log.warning('clean_data_uri > Not possible to parse data_in > %s', e)
            success   = False
            is_base64 = False
            mediatype = None
            data      = None
    if not success:
        log.error('clean_data_uri > Problems splitting data')
    return success, mediatype, is_base64, data
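RFC 2397 allows parameters in the media type (for example `text/plain;charset=US-ASCII`), so a media-type pattern that stops at the first semicolon rejects such URIs, while one that stops at the first comma accepts them. The two pattern names below are hypothetical and only illustrate that difference.

```python
import re

# Media type stops at the first ';', so it cannot hold parameters.
SEMICOLON_BOUNDED = re.compile(r"^data:(?P<mediatype>[^;]*);base64,(?P<data>.*)", re.DOTALL)
# Media type stops at the first ','; the lazy *? leaves ';base64' for the literal.
COMMA_BOUNDED = re.compile(r"^data:(?P<mediatype>[^,]*?);base64,(?P<data>.*)", re.DOTALL)

uri = "data:text/plain;charset=US-ASCII;base64,SGVsbG8="

print(SEMICOLON_BOUNDED.match(uri))  # None
m = COMMA_BOUNDED.match(uri)
print(m.group("mediatype"))  # text/plain;charset=US-ASCII
print(m.group("data"))       # SGVsbG8=
```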
blueFast
  • Though this is an older post, I am facing the same issue, and what I have found is that base64 images are always in the format "data:image/;base64,". So even splitting on "," should always work. Why, then, is the regex processing above needed when it can be done in one line? – Rahul Shelke May 05 '14 at 12:22
  • The code is not very Pythonic: `match` returns `None` when unsuccessful, which causes an `AttributeError` on `m.group`, which is then suppressed by the `except` block; and `success=False` should be an exception instead. – Antti Haapala -- Слава Україні Feb 18 '16 at 12:59
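Combining the two comments above, a sketch that splits once on the first comma (as Rahul Shelke suggests) and raises `ValueError` instead of returning a success flag (as Antti Haapala suggests) could look like this. `parse_data_uri` and `DataURI` are hypothetical names, and the sketch assumes a lowercase `data:` scheme and `;base64` marker.

```python
from typing import NamedTuple


class DataURI(NamedTuple):
    mediatype: str
    is_base64: bool
    data: str


def parse_data_uri(data_in: str) -> DataURI:
    """Split a data URI of the form data:[<mediatype>][;base64],<data>.

    Hypothetical helper: raises ValueError instead of returning a
    success flag. The header ends at the first comma.
    """
    if not data_in.startswith("data:"):
        raise ValueError("not a data URI: missing 'data:' scheme")
    header, sep, data = data_in[len("data:"):].partition(",")
    if not sep:
        raise ValueError("not a data URI: missing ',' before the data")
    is_base64 = header.endswith(";base64")
    if is_base64:
        # Strip the ';base64' marker so only the media type remains.
        header = header[: -len(";base64")]
    return DataURI(mediatype=header, is_base64=is_base64, data=data)


print(parse_data_uri("data:image/png;base64,iVBORw0KGgo="))
# DataURI(mediatype='image/png', is_base64=True, data='iVBORw0KGgo=')
```

Per RFC 2397, an empty media type means `text/plain;charset=US-ASCII`; the sketch returns it as an empty string and leaves defaulting to the caller.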