I'm currently developing a theme for Textual IRC, and I want to compare the "Topic is ..." messages to the topic displayed in the channel's topic bar so I can delete them if they are the same.

The topic that causes problems has both Umlaute and a URI in it and looks like the following:

++ Frische Austern ++ Nächste Sitzung: https:/some/uri/that/can/contain/umlaute" ++

When I print both the old topic and the new topic, they look exactly the same, down to the trailing and leading whitespaces (I used trim() to eliminate them).

The comparison is done with

if(oldTopic === newTopic){
    // do stuff
}

What I've tried so far

Typecheck

I used typeof to make sure both of them are of type string and not Object.

Umlaut elimination

I used replace(/ä/g, 'ae') to eliminate the Umlaute

URL Elimination

I used replace(/\//g, '_') to get rid of the forward slashes. I also used escape() to escape non-ASCII characters.

Unfortunately, none of it worked. Again, if I use console.log to show the two strings, they look exactly the same. I suspected a Unicode issue, since ä can be represented in different ways, but replacing it didn't help either.
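
For reference, the "ä can be represented in different ways" suspicion can be checked directly. This is a minimal sketch, assuming an environment that supports `String.prototype.normalize` (ES2015); the variable names are made up for illustration:

```javascript
// 'ä' as a single precomposed code point (U+00E4) ...
const composed = '\u00E4';
// ... and as 'a' followed by a combining diaeresis (U+0308).
const decomposed = 'a\u0308';

// Both render as "ä", but they are different strings.
const rawEqual = composed === decomposed;

// Normalizing both sides to the same form (NFC here) makes them compare equal.
const normalizedEqual =
  composed.normalize('NFC') === decomposed.normalize('NFC');

console.log(rawEqual, normalizedEqual); // false true
```

A literal `replace(/ä/g, 'ae')` only matches whichever of the two forms was typed into the regular expression, so it cannot rule this cause in or out on its own.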

I've tried, but I've reached the limit of my JavaScript knowledge. I really have no idea why it's not working. The code has worked on other topics that involved neither Umlaute nor a URL.

If any of you happens to know an answer I'd be very thankful.

Kind regards and thanks in advance!

JHolub
  • 2
    We can't help you with a hypothetical we can't see. Fundamentally, the strings are not equal. If they were, `===` would result in `true`. So they aren't. You've already tried the first thing I'd've suggested (making sure they're both primitives, not objects). The second thing would be to double-check their lengths and loop through them and find out at what character they're different: `if (str1.length !== str2.length) { console.log("lengths are different"); } for (var n = 0; n < str1.length; ++n) { if (str1[n] !== str2[n]) { console.log(n + " different: '" + str1[n] + "' !== '" + str2[n] + "'"); } }` – T.J. Crowder Sep 25 '16 at 16:11
  • Damn why didn't I think of that, suddenly I feel like a total beginner again :D So it is very strange. If I compare their length, they are not equal. It's resolved to `false`. However if I print their length, they are both `111` characters long. The different character is the whitespace following the `:` – JHolub Sep 25 '16 at 16:22
  • :-) Probably best to just go ahead and delete the question, now you know where they're different... – T.J. Crowder Sep 25 '16 at 16:26
  • Thank you very much for helping. Well I know where they are different, but not really why. It's both just a whitespace. And I don't know the general underlying problem. I might fix it for this special case, but topics can be rather arbitrary. – JHolub Sep 25 '16 at 16:36
  • There are multiple different kinds of whitespace, `"foo bar" !== "foo\tbar"` and `"foo bar" != "foo bar"` (note the second one has a non-breaking space). – T.J. Crowder Sep 25 '16 at 16:42
  • Oh I thought `\t` was actually a tab, not a whitespace character. What is the difference in the last two ones? Cause I don't see either of them breaking, and how would one catch them with a regular expression? – JHolub Sep 25 '16 at 16:55
  • 1
    @JHolub: I think you're misunderstanding the term "whitespace". Tabs are one example of a whitespace character. Other examples include spaces, newlines, carriage returns, form feeds, non-breaking spaces, . . . – ruakh Sep 25 '16 at 17:08
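
The character-by-character check suggested in the comments can be sketched as a small helper. `firstDifference` is a hypothetical name, and the two sample strings are invented stand-ins for the real topics, assuming the odd character is a non-breaking space:

```javascript
// Hypothetical helper: returns the first index where two strings differ,
// together with the UTF-16 code units found there, or null if they are equal.
function firstDifference(a, b) {
  const limit = Math.max(a.length, b.length);
  for (let i = 0; i < limit; i++) {
    if (a[i] !== b[i]) {
      return { index: i, codeA: a.charCodeAt(i), codeB: b.charCodeAt(i) };
    }
  }
  return null;
}

// Invented sample data, not the real topics.
const oldTopic = 'Sitzung: https';      // regular space (U+0020)
const newTopic = 'Sitzung:\u00A0https'; // non-breaking space (U+00A0)

const diff = firstDifference(oldTopic, newTopic);
console.log(diff); // { index: 8, codeA: 32, codeB: 160 }
```

Printing the code units rather than the characters themselves is what makes the difference visible, since both kinds of space look the same in the console.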

1 Answer

So in the end, the whitespace right before the https part was a different type of whitespace than all the others.

It was not a tab, and I tried various regular-expression escapes to match it (\f, \r and so on), but none of them worked.

What worked in the end was using replace(/\s/g, ''). \s also covers tabs, but I assume there probably won't be a topic change whose only difference is a space turned into a tab.

Just keep in mind that if the difference between tabs and spaces matters in your case, this solution won't work.
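
A short sketch of why `\s` succeeded where `\f` and `\r` did not. It assumes the odd character was a non-breaking space (U+00A0), which the comments suggest but which was never confirmed; `stripWhitespace` is a hypothetical helper name and the sample strings are invented:

```javascript
const nbsp = '\u00A0'; // assumed culprit: non-breaking space

// The specific escapes each match exactly one character, none of them U+00A0.
const matchesTab = /\t/.test(nbsp);
const matchesFormFeedOrCR = /[\f\r]/.test(nbsp);

// \s is a character class that covers all whitespace, including U+00A0.
const matchesWhitespaceClass = /\s/.test(nbsp);

// Hypothetical helper mirroring the replace(/\s/g, '') approach above.
function stripWhitespace(s) {
  return s.replace(/\s/g, '');
}

const sameAfterStrip =
  stripWhitespace('Sitzung: https') === stripWhitespace('Sitzung:\u00A0https');

console.log(matchesTab, matchesFormFeedOrCR, matchesWhitespaceClass, sameAfterStrip);
// false false true true
```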

JHolub
  • 1
    You may find `charCodeAt` helpful; see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/charCodeAt for documentation. – ruakh Sep 25 '16 at 17:05
  • I'd suggest using `.replace(/\s/g, ' ')` (note the space in the replacement string), so that `"foo bar"` and `"foobar"` aren't considered the same. – T.J. Crowder Sep 25 '16 at 17:09
  • And again: Suggest just deleting the question (and answer), they won't be useful to others in the future; too specific to your situation. – T.J. Crowder Sep 25 '16 at 17:10
  • 1
    Comparing two strings is too specific? I see how the way I asked the question is too specific, but I think a lot of other people might run into this problem, too. – JHolub Sep 25 '16 at 20:59
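
The variant suggested in the comments, replacing every whitespace character with a plain space instead of removing it, might look like this. `normalizeWhitespace` and `topicsMatch` are hypothetical names, not part of Textual or any library:

```javascript
// Map every whitespace character (tab, non-breaking space, ...) to U+0020,
// so word boundaries survive the normalization.
function normalizeWhitespace(s) {
  return s.replace(/\s/g, ' ');
}

// Hypothetical comparison helper built on top of it.
function topicsMatch(a, b) {
  return normalizeWhitespace(a.trim()) === normalizeWhitespace(b.trim());
}

const nbspVsSpace = topicsMatch('foo\u00A0bar', 'foo bar'); // differ only in kind of space
const spaceVsNone = topicsMatch('foo bar', 'foobar');       // differ in actual content

console.log(nbspVsSpace, spaceVsNone); // true false
```

Compared with stripping whitespace entirely, this keeps "foo bar" and "foobar" distinct while still ignoring which kind of whitespace character was used.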