ド(U+30C9) vs ド(U+30C8U+3099)
Fyi, the situation is
- a user uploaded a file with name containing
ド(U+30C8U+3099)
to AWS s3 from a web app. - Then, the website sent a POST request containing the file name without url encoding to AWS lambda function for further processing using Python. The name when arrived in Lambda became
ド(U+30C9)
. Python then failed to access the file stored in s3 because of the difference in unicode.
I think the solution would be to do url encoding on frontend before sending the request and do url decoding using urllib.parse.unquote
to have the same unicode.
My questions are
would url encoding solve that issue? I can't reproduce the same issue probably because I am on a different OS from the user's OS.
How exactly did it happen since both requests (uploading to s3 and sending the 2nd request to lambda) happened on the user's machine?
Thank you.