4

I'm using Python's base64 module, and I receive a string that may or may not be base64-encoded. I would like to do something like:

if isEncoded(s):
   output = base64.decodestring(s)
else:
   output = s

Any ideas?

Guy

4 Answers

11

In general, it's impossible: if you receive the string 'MjMj', for example, how could you possibly know whether it's already decoded and should be used as is, or still needs to be decoded into '23#'?
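To make the ambiguity concrete, here is a minimal sketch using only the standard base64 module. It shows that 'MjMj' is a perfectly valid encoding, even though it could just as well be a plain string; nothing in the data itself says which one the sender meant.

```python
import base64

s = b"MjMj"

# The input decodes cleanly, so it *could* be base64 for b"23#"...
decoded = base64.b64decode(s, validate=True)
print(decoded)  # b'23#'

# ...and re-encoding the result round-trips to the same input,
# so a round-trip check cannot settle the question either.
roundtrips = base64.b64encode(decoded) == s
print(roundtrips)  # True
```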

Alex Martelli
  • exactly: you can only test that there aren't forbidden chars and that the length is divisible by four. – giorgian Oct 07 '09 at 16:09
  • it's also worth noting that, because it's impossible, attempting to decode it "when necessary" can be used as an attack vector for XSS and similar attacks by crafting seemingly-encoded data that your system does bad things with after it's decoded. – rmeador Oct 07 '09 at 16:54
5

You could just try it, and see what happens:

import base64
import binascii

def decode_if_necessary(s):
    try:
        return base64.decodestring(s)
    except binascii.Error:  # raised on malformed input, e.g. bad padding
        return s

But you have to ask yourself: what if the original message was in fact a syntactically valid base64 string, but not meant to be one? Then "decoding" it will succeed, but the result is not the required output. So I have to ask: is this really what you want?

Edit: Note that decodestring is deprecated; it was removed in Python 3.9, where base64.decodebytes or base64.b64decode should be used instead.
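On current Python versions, the same try-it-and-see idea might look roughly like the sketch below. It assumes the input arrives as bytes and swaps in base64.b64decode with validate=True; the type hints and sample inputs are illustrative and not part of the original answer.

```python
import base64
import binascii

def decode_if_necessary(s: bytes) -> bytes:
    """Return the decoded bytes if s parses as base64, else s unchanged."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        return base64.b64decode(s, validate=True)
    except binascii.Error:
        return s

print(decode_if_necessary(b"aGVsbG8="))  # b'hello'
print(decode_if_necessary(b"hello!"))    # b'hello!' (not valid base64, returned as is)
print(decode_if_necessary(b"spam"))      # valid base64 by accident, so it gets "decoded"
```

The last call shows the caveat raised above: an ordinary word that happens to be syntactically valid base64 is decoded into unrelated bytes without any error.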

Stephan202
  • you're saying that if s isn't decoded, decodestring() raises an exception? – Guy Oct 07 '09 at 16:05
  • He's saying that the chances of a string you want to use being validly base64 encoded are slim, and when you call `decodestring` on an invalidly base64 encoded string, `decodestring` raises an exception. This looks to me like a reasonable, simple approach. +1 – Dominic Rodger Oct 07 '09 at 16:08
  • I actually tried something like that, and when the string that was not decoded did not throw an exception, I got gibberish. – Guy Oct 07 '09 at 16:08
  • Then the input you supplied was in fact a valid base64 encoding. This demonstrates the issue at hand. – Stephan202 Oct 07 '09 at 16:09
5

You could check whether a string may be base64 encoded. In general, the function below can predict with 75%+ accuracy whether the data is encoded.

import re

def isBase64(s):
    return len(s) % 4 == 0 and bool(re.match('^[A-Za-z0-9+/]+[=]{0,2}$', s))
brianegge
0

You can use the argument validate=True, something like:

import base64
import binascii

def decode_strict(input_string):  # function name is illustrative
    try:
        # Convert the input string to bytes
        input_bytes = input_string.encode('utf-8')

        # Decode the Base64 encoded bytes
        decoded_bytes = base64.b64decode(input_bytes, validate=True)

        return decoded_bytes
    except binascii.Error:
        print("Error: Invalid Base64 string")

The validate=True argument makes base64.b64decode() reject any characters outside the Base64 alphabet instead of silently discarding them; incorrectly padded input is rejected either way. If the input string is not valid, a binascii.Error exception is raised, which we catch and handle accordingly.
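A quick way to see what validate=True changes is to feed the same inputs through both modes. The sketch below uses only the standard library; the helper name is_rejected and the sample inputs are made up for illustration.

```python
import base64
import binascii

def is_rejected(data: bytes, validate: bool) -> bool:
    """Return True if b64decode raises binascii.Error for this input."""
    try:
        base64.b64decode(data, validate=validate)
        return False
    except binascii.Error:
        return True

noisy = b"aGVs!bG8="   # base64 for b"hello" with a stray '!' inserted
unpadded = b"aGVsbG8"  # the same data with its '=' padding removed

# Default mode silently drops the '!' and still decodes.
lenient = base64.b64decode(noisy)
print(lenient)  # b'hello'

# Strict mode rejects the stray character.
print(is_rejected(noisy, validate=True))  # True

# Bad padding is rejected in both modes.
print(is_rejected(unpadded, validate=False))  # True
print(is_rejected(unpadded, validate=True))   # True
```

Note that even in strict mode this only checks syntax; it cannot tell whether a syntactically valid string was actually meant as base64.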

Ayushi Jain