I've just encountered something that I don't quite understand. I received a document (an administrative memo from my employer) containing a web address. The address is not a clickable hyperlink; it is just text.

What is interesting is that when the address is copied and pasted into a web browser's address bar, the browser attempts to contact a different web address than the one the pasted text contains. The address text initially appears to be pasted correctly into the address bar, but as soon as I hit Enter the text instantly changes to something else.

Please note that this is not a matter of simple web site redirection. I know this because if I manually type in the same address (instead of copy & pasting it from the original document), the "correct" address is loaded. It is only following the copy/paste/load process that text appears to be magically changing.

I have also noticed that if I first copy and paste the address into a Notepad text file, save the text file, close it, re-open it, and then copy and paste from there into the web browser, the "correct" site loads. Of note, when I save, Notepad warns that there are characters in Unicode format which will be lost. So I assume that there is some hidden Unicode text that is being stripped out when I save as plain text.

But, in Notepad if I enable the "Show Unicode Control Characters" option, I see nothing. So what could be going on here?

To get really specific, the domain transforms like this: http://www.aaaaaaaaaa-usa.com/bbbbb/ddddddtools.html ==> www.xn--aaaaaaaaaausa-km6g.com. (The browser of course reports that it cannot find the IP address of the server)

phog2
  • It would be helpful if you included the *full* original URL. –  Mar 27 '18 at 19:53
  • There is no way a web browser would convert `www.abcd-co.com` to `www.xn--abcdco--km6g.com`. The former consists of only ASCII characters and as such is a valid DNS hostname as-is. `www.xn--abcdco--km6g.com` is actually the IDN encoded form of `www.ab㞽cdco-.com` instead, which means you have a copy/paste issue. – Remy Lebeau Mar 27 '18 at 19:56

1 Answer

For compatibility, domain names in DNS should consist of ASCII text only, so there is a standard (IDN, internationalized domain names) for converting other characters to ASCII. Each converted label is marked with the prefix `xn--`: two letters followed by two dashes.
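
To make the conversion concrete, here is a minimal sketch that reverses it for the hostname given in the question and lists any non-ASCII characters it finds. It assumes Python's built-in `punycode` codec is an acceptable stand-in for what the browser does; the helper names `decode_label` and `non_ascii_chars` are made up for this example, and no IDNA validation or mapping is performed.

```python
import unicodedata

ACE_PREFIX = "xn--"  # marker placed in front of every converted (IDN) label


def decode_label(label: str) -> str:
    """Return the Unicode form of one DNS label; plain ASCII labels pass through."""
    if label.lower().startswith(ACE_PREFIX):
        # Strip the prefix and let the stdlib Punycode codec do the decoding.
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def non_ascii_chars(text: str) -> list:
    """List (index, code point, Unicode name) for every non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


host = "www.xn--aaaaaaaaaausa-km6g.com"  # hostname as reported in the question
decoded = ".".join(decode_label(part) for part in host.split("."))
print(non_ascii_chars(decoded))  # [(14, 'U+2010', 'HYPHEN')]
```

For the anonymized hostname in the question, this reports a single U+2010 (HYPHEN) in the place where the hand-typed address has the ordinary ASCII hyphen-minus (U+002D). The real, un-anonymized URL may of course decode differently.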

Additionally, there have been phishing attacks that used letters from other alphabets which look like Latin letters in order to deceive users. Some browsers therefore choose to display the ASCII name instead of the intended name. (The behavior varies from browser to browser, and usually applies only to selected look-alike characters.)
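
As a rough illustration of why the ASCII form is the safer thing to display, the sketch below encodes a label whose first letter is a Cyrillic look-alike. The helpers `to_ascii_label` and `looks_mixed` are hypothetical names for this example, the encoding uses Python's built-in `punycode` codec without any IDNA mapping or validation, and the "mixed" check is a deliberately crude heuristic, not the policy any particular browser applies.

```python
def to_ascii_label(label: str) -> str:
    """Punycode-encode a label that contains non-ASCII characters."""
    if label.isascii():
        return label  # already a valid ASCII label, nothing to convert
    return "xn--" + label.encode("punycode").decode("ascii")


def looks_mixed(label: str) -> bool:
    """Crude heuristic: ASCII letters and non-ASCII characters in the same label."""
    has_ascii_letter = any(ch.isascii() and ch.isalpha() for ch in label)
    has_non_ascii = any(not ch.isascii() for ch in label)
    return has_ascii_letter and has_non_ascii


lookalike = "\u0430pple"  # first letter is CYRILLIC SMALL LETTER A, not Latin "a"
print(to_ascii_label(lookalike))  # xn--pple-43d
print(looks_mixed(lookalike))     # True
```

On screen the two spellings can be indistinguishable, but the `xn--` form makes it obvious that the label is not the plain ASCII name.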

Giacomo Catenazzi
  • Interesting. I wonder how the person at my company who created the memo ended up with a couple of random Unicode characters in the URL. – phog2 Mar 27 '18 at 20:20
  • Without knowing the URL it is difficult to say. It could be a different dash (word processors replace the minus sign with other dashes) or a ligature (although ligatures should be handled by the font, not by Unicode). Or maybe your company name is usually written with a different letter, and that letter was also used in the URL [think of an `e` written differently, which is not unusual in some company names]. – Giacomo Catenazzi Mar 28 '18 at 05:31