What are the -1 & -2 shown by the IntelliJ debugger in a UTF-8 string, and how do I remove them?

Question

When parsing a string from a file on Windows 10, I end up with two leading characters that whitespace trims and the like do not remove.

Here is evidence of the culprit.

This somewhat screws up my regex ^(\w+) because it appears there is a whitespace character in the string. When I copy the value of the string (screenshot) into RegExr, for example, I see that a whitespace character has been added, and that is why my regex will not work.

I already googled for "-1 -2 in UTF-8 string" but was not able to find anything, so I am super confused by this.

These screenshots are useless unless you're okay with a wild goose chase. Copy+paste the problematic string into your question. — MonkeyZeus, Dec 07 '20 at 19:03

score 2 · Accepted Answer · answered Dec 07 '20 at 19:09

Your debugger is being silly for showing them as -1 and -2 respectively (Java bytes are signed, so 0xFF displays as -1 and 0xFE as -2), but it's clear enough that you're dealing with the UTF-16 BOM (not UTF-8 as you claim in the question; that one is a 3-byte marker that's completely different).

Feel free to check for their presence and remove them if you encounter them at the beginning of a file, though ideally you should save your file without the BOM in the first place.
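As a rough sketch of that check, assuming the file contents have already been decoded into a Java `String` (in which case a BOM shows up as the single character U+FEFF at index 0), something like the following could work. The class and method names (`BomStripper`, `stripBom`) are made up for illustration and are not from the question.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not from the original post.
class BomStripper {
    // U+FEFF is the character a decoded byte order mark turns into.
    private static final char BOM = '\uFEFF';

    // Returns the text without a leading BOM character, if one is present.
    static String stripBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        Pattern firstWord = Pattern.compile("^(\\w+)"); // the regex from the question
        String raw = "\uFEFFhello world";               // sample input, assumed
        String clean = stripBom(raw);

        System.out.println(firstWord.matcher(raw).find());   // false
        System.out.println(firstWord.matcher(clean).find()); // true
    }
}
```

Note that `String.trim()` does not remove U+FEFF, which would explain why trimming had no effect.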

Blindy

65,249
10
91
131

So it's either `254 255` or `255 254` to check for I assume? – xetra11 Dec 07 '20 at 19:15
Yeah, depending on your system's endianness. – Blindy Dec 07 '20 at 19:15
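
To connect `254 255` with the `-2 -1` the debugger displays: Java's `byte` is signed, so masking with `0xFF` gives the unsigned value. The sketch below checks the first two bytes of a raw byte array for either UTF-16 BOM order; which order appears depends on the byte order the file was written in. The names `BomBytes`, `unsigned`, and `utf16Bom` are illustrative assumptions, not part of the original thread.

```java
// Hypothetical helper, not from the original post.
class BomBytes {
    // Converts Java's signed byte (-128..127) to its unsigned value (0..255).
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    // Names the UTF-16 byte order indicated by the first two bytes, or "none".
    static String utf16Bom(byte[] data) {
        if (data.length < 2) {
            return "none";
        }
        int b0 = unsigned(data[0]);
        int b1 = unsigned(data[1]);
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE"; // 254 255
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE"; // 255 254
        return "none";
    }

    public static void main(String[] args) {
        byte[] sample = { -1, -2, 'h', 0 }; // assumed sample: BOM followed by "h" in UTF-16LE
        System.out.println(unsigned(sample[0]) + " " + unsigned(sample[1])); // 255 254
        System.out.println(utf16Bom(sample));                                // UTF-16LE
    }
}
```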
