Detect malicious code or text inside base64 dataURL image

Question

I have the following three data URL images. All of them show the same image when opened as a URL, but two of them have PHP code and JavaScript code, respectively, appended at the end.

How can I remove this malicious code from Base64 data URL images that come from users I don't trust?

Base64 data URL image 1 (safe):

data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAMCAgMCAgMDAwMEAwMEBQgFBQQEBQoHBwYIDAoMDAsKCwsNDhIQDQ4RDgsLEBYQERMUFRUVDA8XGBYUGBIUFRT/2wBDAQMEBAUEBQkFBQkUDQsNFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBT/wAARCAA2AFwDASIAAhEBAxEB/8QAFwABAQEBAAAAAAAAAAAAAAAAAAIBA//EACcQAAEDAwQBBAMBAAAAAAAAAAABAgMTUVIREhSRYQQjQWIxMkJy/8QAFwEBAQEBAAAAAAAAAAAAAAAAAAEHCP/EABgRAQEBAQEAAAAAAAAAAAAAAAABEUEx/9oADAMBAAIRAxEAPwCeNJZRxpLKRVddRVddTFeuvZ4pfTSYqpnHkwUlZHO/pTN7sl7IL48mCjjyYKRvdkvZqPci/lewOnGkso40llIquuoquuoF8aSymO9NIn8qpNV11MWRy/KgVx5MFC+nmX8b2/5doRvdkvY3uyXsDvugxUboMVI4z7DjPsXpPFufBi4nWDF3ZKwPT41FGTFxBWsGLuxrDg7smjJi4yjJgoHXdBio3QYqRxn2HGfYC90GKhzoMV7I4z7BfTPRP1UDdYMXdjWDF3ZNGTFwoyYuAVn5KKz8lOm6GyjdDZSnHKq9flTKzrqdVWFfhTPY+xFc6zrqKz8lOnsfYez9gJrPyUVn5KdN0NlG6Gygc6z8lMdM9U/ZTruhspirCuQHKs66is66nT2PsPY+wHOn5FPyACSMVuia6mABcgbtABkbT8in5ABkKfkxW6eQAZGAAGR//9k=

Base64 data URL image 2 (PHP code injected):

data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAMCAgMCAgMDAwMEAwMEBQgFBQQEBQoHBwYIDAoMDAsKCwsNDhIQDQ4RDgsLEBYQERMUFRUVDA8XGBYUGBIUFRT/2wBDAQMEBAUEBQkFBQkUDQsNFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBT/wAARCAA2AFwDASIAAhEBAxEB/8QAFwABAQEBAAAAAAAAAAAAAAAAAAIBA//EACcQAAEDAwQBBAMBAAAAAAAAAAABAgMTUVIREhSRYQQjQWIxMkJy/8QAFwEBAQEBAAAAAAAAAAAAAAAAAAEHCP/EABgRAQEBAQEAAAAAAAAAAAAAAAABEUEx/9oADAMBAAIRAxEAPwCeNJZRxpLKRVddRVddTFeuvZ4pfTSYqpnHkwUlZHO/pTN7sl7IL48mCjjyYKRvdkvZqPci/lewOnGkso40llIquuoquuoF8aSymO9NIn8qpNV11MWRy/KgVx5MFC+nmX8b2/5doRvdkvY3uyXsDvugxUboMVI4z7DjPsXpPFufBi4nWDF3ZKwPT41FGTFxBWsGLuxrDg7smjJi4yjJgoHXdBio3QYqRxn2HGfYC90GKhzoMV7I4z7BfTPRP1UDdYMXdjWDF3ZNGTFwoyYuAVn5KKz8lOm6GyjdDZSnHKq9flTKzrqdVWFfhTPY+xFc6zrqKz8lOnsfYez9gJrPyUVn5KdN0NlG6Gygc6z8lMdM9U/ZTruhspirCuQHKs66is66nT2PsPY+wHOn5FPyACSMVuia6mABcgbtABkbT8in5ABkKfkxW6eQAZGAAGR//9k8P3BocCBlY2hvICJIZWxsbyBXb3JsZCI7ID8+Cg==

Base64 data URL image 3 (JavaScript code injected):

data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAMCAgMCAgMDAwMEAwMEBQgFBQQEBQoHBwYIDAoMDAsKCwsNDhIQDQ4RDgsLEBYQERMUFRUVDA8XGBYUGBIUFRT/2wBDAQMEBAUEBQkFBQkUDQsNFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBT/wAARCAA2AFwDASIAAhEBAxEB/8QAFwABAQEBAAAAAAAAAAAAAAAAAAIBA//EACcQAAEDAwQBBAMBAAAAAAAAAAABAgMTUVIREhSRYQQjQWIxMkJy/8QAFwEBAQEBAAAAAAAAAAAAAAAAAAEHCP/EABgRAQEBAQEAAAAAAAAAAAAAAAABEUEx/9oADAMBAAIRAxEAPwCeNJZRxpLKRVddRVddTFeuvZ4pfTSYqpnHkwUlZHO/pTN7sl7IL48mCjjyYKRvdkvZqPci/lewOnGkso40llIquuoquuoF8aSymO9NIn8qpNV11MWRy/KgVx5MFC+nmX8b2/5doRvdkvY3uyXsDvugxUboMVI4z7DjPsXpPFufBi4nWDF3ZKwPT41FGTFxBWsGLuxrDg7smjJi4yjJgoHXdBio3QYqRxn2HGfYC90GKhzoMV7I4z7BfTPRP1UDdYMXdjWDF3ZNGTFwoyYuAVn5KKz8lOm6GyjdDZSnHKq9flTKzrqdVWFfhTPY+xFc6zrqKz8lOnsfYez9gJrPyUVn5KdN0NlG6Gygc6z8lMdM9U/ZTruhspirCuQHKs66is66nT2PsPY+wHOn5FPyACSMVuia6mABcgbtABkbT8in5ABkKfkxW6eQAZGAAGR//9k8c2NyaXB0PmFsZXJ0KCdoZWxsbycpOzwvc2NyaXB0Pgo=

You can see the embedded text by decoding the Base64 with an online tool such as https://www.base64decode.org/.

I let users upload images to my server, and I convert each image to a Base64 data URL.

All three data URLs above render the same image, but their Base64 strings differ because of the text embedded in the image.

I use Go in the backend to save the image, and the following HTML to convert the image to a Base64 data URL:

<input type='file' onchange="readURL(this);" />
<img id="blah" src="#" alt="your image" />
<script>
function readURL(input) {
  if (input.files && input.files[0]) {
    var reader = new FileReader();
    reader.onload = function (e) {
      document.getElementById("blah").src = e.target.result;
    };
    reader.readAsDataURL(input.files[0]);
  }
}
</script>

My concern is that text which does not belong inside an image should not be there: the data URLs above return the same image, yet their Base64 differs because of the extra data.

I want to recover the actual image's Base64 from the two malicious data URLs above.

For example, suppose user B uploads an image and I get data URL 3; I want the original image's data URL (data URL 1) from the uploaded file instead.

How can this be done?

Anything can be malicious in the right context. Just don’t try to execute your images as PHP or JavaScript and you’ll be fine. — Ry-, Aug 08 '19 at 21:25
I didn't say "harmful"; I said "malicious code". If a giant elephant is sleeping in your bedroom, it is malicious: it is not supposed to be there. The same goes for an image: anything that should not be there, should not be there. — John Cargo, Aug 08 '19 at 21:27
OK, then do this: save data URL 2 (the one with PHP code inserted) and include it in your PHP code like ``. Don't ask me why I want to do this; just assume that in some scenario you need to. What would you do now to include the image safely? — John Cargo, Aug 08 '19 at 21:34
No, why you want to do this is essential to answering the question. There's no logical reason to try to execute an image as PHP. So why do you think this is a problem? — Jonathan Hall, Aug 08 '19 at 21:34
Because I am a mad programmer and I want to do this: my code, my choice, my server, everything is mine. Since this can be done, I will do it. What other options do I have to run it safely now? — John Cargo, Aug 08 '19 at 21:36
The answers about decoding and reencoding don’t guarantee safety. The real answer is to not do the incredibly stupid and never useful thing that is ` — Ry-, Aug 08 '19 at 23:11
@CeriseLimón Thanks for the hint. I think it's better to redraw (re-encode) the image. — John Cargo, Aug 09 '19 at 00:47
@Ry- Can you explain why you think so? Why and how does redrawing the image not guarantee safety? — John Cargo, Aug 09 '19 at 00:48
@JohnCargo There may be valid images that will be interpreted as code by PHP. If an attacker knows how the images are re-encoded, it may be possible for an attacker to craft a source image that will reencode to an image containing malicious code. I deleted my previous comment because it's not a good suggestion. — Charlie Tumahai, Aug 09 '19 at 01:07
OK, so I think it's better to use `convert -strip img.jpg img2.jpg` to remove the malicious code. — John Cargo, Aug 09 '19 at 01:23
Voting is intentionally anonymous here. You've also gotten _many_ comments which could explain downvotes. Your reaction has been to attack. — Jonathan Hall, Aug 09 '19 at 13:12
Responding to the substance of your comment: If my goal is to waste space by bloating images, I won't be doing it with PHP or Javascript comments. Even if I did, that wouldn't be dangerous--it would only be a waste of space. — Jonathan Hall, Aug 09 '19 at 13:12
FWIW, I disagree that we need to know why JohnCargo wants to remove the code. He doesn't want it there - that's good enough. Also, yes downvoting is anonymous, but without adding a comment the downvote is not helpful. Finally, in regards to the original question - I receive image uploads from untrustworthy sources as well, and I do something similar to what @wp78de suggested. I also don't make the final destination of the file available to the user who uploaded it, luckily that's not required in my case. — Mike Willis, Aug 09 '19 at 13:32
@MikeWillis, what @wp78de suggested (commercial software) is beyond my means. I was about to re-encode the image myself but later found that `convert -strip img.jpg img2.jpg` is a safer choice. I'm waiting 24 hours to see whether someone has a better solution before accepting an answer. — John Cargo, Aug 09 '19 at 13:42
@MikeWillis: The wide variety of answers and partial answers in comments proves the question is not well focused, and that understanding the OP's goals is important. So far we've heard everything from "remove EXIF data" to don't "execute jpegs as code". Depending on the goal, either of these could be correct. But without knowing the goal, there's no possible way to know which is correct. Further, this looks a lot like an XY Problem, or worse: a completely imagined problem. Knowing the goal will also clear up that ambiguity. Regardless, the combative attitude is the more serious problem. — Jonathan Hall, Aug 09 '19 at 13:54

vgel · Accepted Answer · score 3 · 2019-08-08T21:48:52.357

ImageMagick convert -strip <in> <out> will do it. It will also remove other extraneous data (EXIF, embedded thumbnails, etc.), so make sure that behavior is what you want.

$ xxd img.jpg | tail -n 3
00000280: 647f ffd9 3c73 6372 6970 743e 616c 6572  d...<script>aler
00000290: 7428 2768 656c 6c6f 2729 3b3c 2f73 6372  t('hello');</scr
000002a0: 6970 743e 0a                             ipt>.

$ convert -strip img.jpg img2.jpg

$ xxd img2.jpg | tail -n 3       
00000260: 383a 2ebd 4c00 32c8 1ba4 0064 6d3f 229f  8:..L.2....dm?".
00000270: 9001 90a7 e4c8 a1d3 eff9 0019 1800 0647  ...............G
00000280: ffd9

Regardless, if you don't try to execute the images, nothing will happen. But if nothing else, it's wasted space in your image files.

To do this from Go without running a system command, use the Go ImageMagick (MagickWand) bindings and call StripImage.

edited Aug 08 '19 at 21:48 · answered Aug 08 '19 at 21:31 by vgel


... and if you do try to execute the images, you'll get errors, because the images themselves aren't valid JS or PHP. This whole question is about a non-problem. – Jonathan Hall Aug 08 '19 at 21:34
Well, stripping EXIF is a nice privacy boost for your users, if nothing else. And if a malicious script gets onto your server some other way, image files with embedded data could be used as a command-delivery mechanism... though it's more likely the attacker would just use a pastebin. – vgel Aug 08 '19 at 21:38
@vgel, can you explain more about `convert -strip`? It requires running a system-level command, which is not possible in my scenario. Is there anything that doesn't require a system-level command? – John Cargo Aug 08 '19 at 21:40
@JohnCargo yeah, you can use the Go bindings. I added info to the answer. – vgel Aug 08 '19 at 21:49
I think the security issue is not only about executing the images as JS or PHP; it is also about not sharing or passing around images that contain malicious code. If user A of your application uploads an image containing malicious code, user B may later be affected by using that image somewhere else. – Bochen Lin Jul 20 '23 at 03:29

score 2 · Answer 2 · edited Aug 09 '19 at 08:22

Yes, there is a world where 'Hacking with Pictures' (often called Stegosploits) is a thing. The industry approach here is to use Content Disarm & Reconstruction (CDR) software. Quoting from Wikipedia:

[CDR] is a computer security technology for removing potentially malicious code from files. Unlike malware analysis, CDR technology does not determine or detect malware's functionality but removes all file components that are not approved within the system's definitions and policies.

If this is mission critical to you, you probably want to look into some of the commercial solutions available (the article lists a number of them; I cannot give a recommendation here).

For a home-grown solution, re-encoding the image might be sufficient.

You may want to give Go's native image library a try; see also https://stackoverflow.com/a/12434107/8291949
There are also ImageMagick MagickWand API bindings for Go that provide the strip feature mentioned above.
