
Why are %e9 and %fd invalid input for JavaScript's decodeURIComponent?

These characters appear in the middle of a string and I can't understand where the problem is. They are valid hexadecimal characters.

Full string (this is part of a string that the client app sent to the server and that was being blocked by modsec):

%61%e9%3d%36%7f%00%00%01%00%00%43%fd%a1%5a%00%00%00%43

Sample to decode:

decodeURIComponent("%61%e9%3d%36%7f%00%00%01%00%00%43%fd%a1%5a%00%00%00%43")

Error:

VM222:1 Uncaught URIError: URI malformed
    at decodeURIComponent (<anonymous>)
    at <anonymous>:1:1

I am using these two functions to encode to base64 and decode from base64 (taken from Mozilla's documentation):

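function c64(t) {
    // Percent-encode as UTF-8, then turn each %XX escape into one byte-valued character for btoa.
    return btoa(encodeURIComponent(t).replace(/%([0-9A-F]{2})/g,
        (match, p1) => String.fromCharCode(parseInt(p1, 16))));
}

function d64(t) {
    // Turn each byte-valued character back into a %XX escape, then decode the escapes as UTF-8.
    return decodeURIComponent(atob(t).split('').map(function (c) {
        return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
    }).join(''));
}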

The original string is in base64:

d64("Yek9Nn8AAAEAAEP9oVoAAABDYek9Nn8AAAEAAEP9oVoAAABD")

returns:

...js:1 Uncaught URIError: URI malformed
    at decodeURIComponent (<anonymous>)
  • I think URL encoding is supposed to use UTF-8 encoding, and those may not be valid UTF-8 values. – Barmar Jan 20 '22 at 22:16
  • Please post the whole string as well as the code that generated it. – Bergi Jan 20 '22 at 22:18
  • Right. The original RFC only allowed ASCII characters (0x00 to 0x7F). There is a suggested extension to allow UTF-8 encoding, and that's commonly used. Arbitrary binary data is not an option. – Tim Roberts Jan 20 '22 at 22:25
  • Thanks for the update! "*the original string is in base64*" - what was the string that you passed to `c64` to generate that string? – Bergi Jan 21 '22 at 09:35
  • "*was being blocked by modsec*" - maybe modsec is right, this is actually an invalid request. Are you certain it originated within your app? – Bergi Jan 21 '22 at 18:04

1 Answer


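decodeURIComponent interprets the percent-escaped bytes as UTF-8, and your byte sequence is not valid UTF-8. "%e9" (or "%E9") is the Unicode code point of 'é' (U+00E9), not its UTF-8 encoding. As a UTF-8 byte, 0xE9 announces a three-byte sequence and must be followed by two continuation bytes in the range 0x80 to 0xBF, but in your string it is followed by 0x3D. The byte 0xFD never appears in valid UTF-8 at all.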
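One way to see which escapes decodeURIComponent rejects is to convert them to raw bytes and test whether those bytes are well-formed UTF-8. The sketch below is only an illustration: percentToBytes and isValidUtf8 are hypothetical helper names, and it assumes an environment where TextDecoder supports the fatal option (current browsers and recent Node.js).

```javascript
// Hypothetical helper: turn a string made of %XX escapes into raw bytes.
function percentToBytes(s) {
  const hex = s.match(/%[0-9a-fA-F]{2}/g) || [];
  return Uint8Array.from(hex, (h) => parseInt(h.slice(1), 16));
}

// Hypothetical helper: true if the bytes form well-formed UTF-8.
function isValidUtf8(bytes) {
  try {
    // With fatal: true, decode() throws a TypeError on malformed input
    // instead of substituting U+FFFD.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (e) {
    return false;
  }
}

const bad = percentToBytes("%61%e9%3d%36");     // 0xE9 followed by 0x3D
const good = percentToBytes("%61%c3%a9%3d%36"); // 0xC3 0xA9 is UTF-8 for 'é'

console.log(isValidUtf8(bad));  // false
console.log(isValidUtf8(good)); // true
```

If the payload really is arbitrary binary data, keeping it as a byte array like this, rather than passing it through decodeURIComponent, avoids the error altogether.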

To see this, type "\u00e9" (or "\u00E9") in the console. That is your "%e9" with "%" replaced by "\u00". You will get:

'é'

You can verify this with the legacy escape function, which works on code points rather than UTF-8 bytes:

escape('é') // "%E9"

Now run

encodeURIComponent('é')

and you will get "%C3%A9", not "%E9". This is because encodeURIComponent encodes the character as UTF-8 and percent-escapes each resulting byte. A character that takes 2 bytes in UTF-8 gives %xx%yy, and one that takes 3 bytes gives %xx%yy%zz.
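The relationship between UTF-8 bytes and the output of encodeURIComponent can be reproduced by hand. This is a minimal sketch, not how the engine implements it; utf8PercentEncode is a hypothetical name, and unlike encodeURIComponent it escapes every byte, so the two only agree for characters that encodeURIComponent escapes (such as non-ASCII ones).

```javascript
// Hypothetical helper: percent-escape every UTF-8 byte of a string.
function utf8PercentEncode(str) {
  const bytes = new TextEncoder().encode(str); // TextEncoder always produces UTF-8
  return Array.from(
    bytes,
    (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

console.log(utf8PercentEncode("é"));  // %C3%A9     (2 bytes)
console.log(utf8PercentEncode("日")); // %E6%97%A5  (3 bytes)
console.log(utf8PercentEncode("é") === encodeURIComponent("é")); // true
```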

Try this with "€". First run:

escape("€")

You will get '%u20AC', which corresponds to "\u20AC".

To get its UTF-8 bytes in hexadecimal, run:

encodeURIComponent("€")

and you will get '%E2%82%AC'.

The example in Wikipedia's 'UTF-8' article explains in detail how '%E2%82%AC' is calculated. It is the hexadecimal form of the bytes 11100010 10000010 10101100.
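The bit arithmetic behind those three bytes can be sketched as follows. encodeThreeByte is a hypothetical helper that only handles the three-byte range U+0800 to U+FFFF, and for brevity it does not reject surrogate code points.

```javascript
// Hypothetical helper: encode one code point in U+0800..U+FFFF as three UTF-8 bytes.
function encodeThreeByte(codePoint) {
  if (codePoint < 0x800 || codePoint > 0xffff) {
    throw new RangeError("only the three-byte range is handled in this sketch");
  }
  return [
    0xe0 | (codePoint >> 12),         // 1110xxxx: top 4 bits
    0x80 | ((codePoint >> 6) & 0x3f), // 10xxxxxx: middle 6 bits
    0x80 | (codePoint & 0x3f),        // 10xxxxxx: low 6 bits
  ];
}

const euroBytes = encodeThreeByte("€".codePointAt(0)); // U+20AC

console.log(euroBytes.map((b) => b.toString(2).padStart(8, "0")).join(" "));
// 11100010 10000010 10101100

console.log(euroBytes.map((b) => "%" + b.toString(16).toUpperCase()).join(""));
// %E2%82%AC
```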

myf
ibrahim tanyalcin