Given a string, how do I know if it needs decoding

Question

I'm using Python's base64 module and I get a string that may or may not be encoded. I would like to do something like:

if isEncoded(s):
   output = base64.decodestring(s)
else:
   output = s

Any ideas?

score 11 · Accepted Answer · answered Oct 07 '09 at 16:07

In general, it's impossible. If you receive the string 'MjMj', for example, how could you possibly know whether it is already decoded and should be used as is, or whether it should be decoded into '23#'?
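
To make the ambiguity concrete, here is a minimal sketch (assuming Python 3, where base64.b64decode accepts an ASCII string and returns bytes). The variable names are only illustrative:

```python
import base64

s = "MjMj"

# Reading 1: s is base64 text, so decoding it yields the bytes b'23#'.
decoded = base64.b64decode(s)

# Reading 2: s is already the plain message "MjMj" and must be left alone.
as_is = s

# Both readings are self-consistent: encoding b'23#' gives back "MjMj",
# so nothing in the string itself says which one the sender intended.
reencoded = base64.b64encode(decoded).decode("ascii")
print(decoded, as_is, reencoded)
```

Only information from outside the string (a flag, a header, a known protocol) can settle which reading is correct.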

Alex Martelli

exactly: you can only test that there aren't forbidden chars and that the length is divisible by four. – giorgian Oct 07 '09 at 16:09
It's also worth noting that, because it's impossible, attempting to decode "when necessary" can be used as an attack vector for XSS and similar attacks: an attacker crafts seemingly encoded data that your system does bad things with after it's decoded. – rmeador Oct 07 '09 at 16:54

score 5 · Answer 2 · answered Oct 07 '09 at 16:04

You could just try it, and see what happens:

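import base64
import binascii

def decode_if_necessary(s):
    try:
        # The original answer called base64.decodestring, removed in Python 3.9
        return base64.b64decode(s)
    except (binascii.Error, ValueError):
        # Not decodable as base64 (bad padding or non-ASCII text): return as is
        return s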

But you have to ask yourself: what if the original message was in fact a syntactically valid base64 string, but not meant to be one? Then "decoding" it will succeed, but the result is not the required output. So I have to ask: is this really what you want?

Edit: Note that decodestring is deprecated. It was removed in Python 3.9; use base64.decodebytes or base64.b64decode instead.

Stephan202

you're saying that if s isn't decoded, decodestring() raises an exception? – Guy Oct 07 '09 at 16:05
He's saying that the chances of a string you want to use being validly base64 encoded are slim, and when you call `decodestring` on an invalidly base64 encoded string, `decodestring` raises an exception. This looks to me like a reasonable, simple approach. +1 – Dominic Rodger Oct 07 '09 at 16:08
I actually tried something like that, and when the string that was not decoded did not throw an exception, I got gibberish. – Guy Oct 07 '09 at 16:08
Then the input you supplied was in fact a valid base64 encoding. This demonstrates the issue at hand. – Stephan202 Oct 07 '09 at 16:09

score 5 · Answer 3 · answered Oct 19 '09 at 02:26

You could check to see if a string may be base64 encoded. In general, the function below can predict with 75%+ accuracy whether the data is encoded.

import re

def isBase64(s):
    return len(s) % 4 == 0 and re.match(r'^[A-Za-z0-9+/]+={0,2}$', s) is not None
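
The check above only tests length and alphabet, so ordinary words can pass it. A small sketch of that limitation, assuming Python 3. The helper name looks_like_base64 and the word list are made up for illustration and are not part of the answer:

```python
import base64
import re

def looks_like_base64(s):
    """Hypothetical helper: the same length-and-alphabet test as above."""
    return len(s) % 4 == 0 and re.fullmatch(r'[A-Za-z0-9+/]+={0,2}', s) is not None

plain_words = ["test", "data", "password", "hello"]

# "test", "data" and "password" have lengths divisible by four and use only
# base64 characters, so they pass even though they are ordinary words.
passing = [w for w in plain_words if looks_like_base64(w)]

# Decoding the false positives succeeds, but the resulting bytes are meaningless.
decoded = {w: base64.b64decode(w) for w in passing}
print(passing)
print(decoded)
```

This suggests the test is best read as "could be base64", never as "is base64".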

brianegge

This function was pretty much useless for me, avoid. – SleepyCal Feb 13 '14 at 22:19
Elaborate please? – htellez Jul 14 '21 at 18:16

score 0 · Answer 4 · answered Jun 23 '23 at 07:20

You can use the argument validate=True, something like:

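import base64
import binascii

# The function name is illustrative; the original snippet had no enclosing function
def decode_base64_strict(input_string):
    try:
        # Convert the input string to bytes
        input_bytes = input_string.encode('utf-8')

        # Decode the Base64 encoded bytes, rejecting non-alphabet characters
        decoded_bytes = base64.b64decode(input_bytes, validate=True)

        return decoded_bytes
    except binascii.Error:
        print("Error: Invalid Base64 string")
        return None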

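The validate=True argument makes base64.b64decode() reject characters outside the Base64 alphabet instead of silently discarding them; incorrect padding raises an error either way. If the input string is not valid, a binascii.Error exception is raised, which we catch and handle accordingly. Note that this only checks that the string is well formed; it cannot tell whether a well-formed string was actually meant to be Base64.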
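
As a rough illustration of what validate changes, the sketch below (assuming Python 3) compares the default lenient mode with validate=True on an input that contains characters outside the Base64 alphabet. The sample strings and variable names are made up for the example:

```python
import base64
import binascii

noisy = "MjMj!!"  # valid base64 followed by two non-alphabet characters

# Default mode: non-alphabet characters are discarded before decoding,
# so this decodes as if the input were just "MjMj".
lenient = base64.b64decode(noisy)

# validate=True: the same input is rejected with binascii.Error.
try:
    base64.b64decode(noisy, validate=True)
    strict_rejected = False
except binascii.Error:
    strict_rejected = True

# Bad padding is rejected even in the default mode.
try:
    base64.b64decode("MjM")
    padding_rejected = False
except binascii.Error:
    padding_rejected = True

print(lenient, strict_rejected, padding_rejected)
```

Neither mode resolves the underlying ambiguity: a well-formed string such as "MjMj" is accepted in both.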
