2

I have this pattern in a JS file:

var emailPattern = /^[^\W_](\.{0,1}[^<>(){\}[\]\\.,;:%\s@\"]+)*@([a-zA-Z0-9-]{1,}\.)+[a-zA-Z]{2,6}$/;

When the page is loaded, I see this in view source:

var emailPattern = /^[^＼W_](＼.{0,1}[^<>(){＼}[＼]＼＼.,;:%＼s@＼"]+)*@([a-zA-Z0-9-]{1,}＼.)+[a-zA-Z]{2,6}$/;

(This is a Japanese-specific page, but the JS is in plain English.)

Is there any way to prevent \ (the normal backslash, U+005C) from becoming ＼ (the full-width backslash, U+FF3C, i.e. &#65340;)?
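The difference can be reproduced in a browser console or in Node. This is only a minimal sketch: it simulates the view-source output by swapping every backslash in the pattern source for U+FF3C, and `brokenSource` and `brokenPattern` are hypothetical names used for illustration.

```javascript
// Pattern as written in the .js file, with ASCII backslashes (U+005C).
var emailPattern = /^[^\W_](\.{0,1}[^<>(){\}[\]\\.,;:%\s@\"]+)*@([a-zA-Z0-9-]{1,}\.)+[a-zA-Z]{2,6}$/;

// Simulate what view source shows: every backslash replaced by U+FF3C.
// The full-width character is an ordinary literal, so \W, \. and \s
// lose their meaning as escapes.
var brokenSource = emailPattern.source.replace(/\\/g, '\uFF3C');
var brokenPattern = new RegExp(brokenSource);

console.log(emailPattern.test('abc@abc.com'));  // true
console.log(brokenPattern.test('abc@abc.com')); // false
```

The second pattern still parses, but it now requires a literal ＼ in the input, which is why the test fails silently instead of throwing.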

VisioN
Arindam Paul
  • Does this happen in the entire file? – Evandro Silva Aug 24 '12 at 10:04
  • Yes.. I don't have any other backslashes .. so yes. – Arindam Paul Aug 24 '12 at 10:06
  • Which browser? What kind of HTTP server is serving the JS? Is there a test URL we can look at? What character encoding are you using to serve the page? We need a lot more details to help out here... – Gijs Aug 24 '12 at 10:09
  • It's Chrome, and it's internal so I can't share a link, but you can try this in the Chrome console: remove the var part, paste the first pattern into the console and try emailPattern.test("abc@abc.com"); it will return true. Then do the same for the second and it will return false. – Arindam Paul Aug 24 '12 at 10:12
  • A suggestion, check with a byte editor and see if the backslashes are ok in the js file. If it is so then the problem is how your server handles the js document – Gabber Aug 24 '12 at 10:18
  • @Arindam: What Gabber said. It's obvious the second pattern won't work; it's not obvious why the server/browser/editor is botching your document. It looks like there's some kind of filter on the server/editor that "prettifies" things (maybe also turning straight " quotes into curly ones, etc.). But without knowing more about the server, there's no way to tell exactly what's going on. – Gijs Aug 24 '12 at 10:21
  • @Gijs: I do agree with you. Let me see if I can provide more information. – Arindam Paul Aug 24 '12 at 10:32
  • Yes, I'd put my money on @Gijs being correct. And if it's not to "prettify", it could ironically be to make JavaScript less likely to work (to block XSS attacks). – Jon Hanna Aug 24 '12 at 15:54
  • @ArindamPaul eh, so, you wrote below in a comment on Stefan's answer that you "do some encoding thing in perl". Can you put that perl up on a pastebin and link it? It sounds like that's the problem... – Gijs Aug 25 '12 at 13:47

3 Answers

2

This seems to be a character encoding issue.

For example, when editing Shift-JIS encoded files in SubEthaEdit, backslashes appear as \ (0x5C) in the editor but are actually inserted as ＼ (0xFF3C) in the file.

Copying the source code from the editor converts it back to UTF-8 and gives me a "normal" backslash in the clipboard.

A workaround in SubEthaEdit would be to enter the ¥ character, which is 0x5C in Shift-JIS.

All this is specific to SubEthaEdit but maybe you're facing a similar problem.
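To check whether a script really contains full-width backslashes, the text can be inspected character by character. This is a rough sketch only; `findFullWidthBackslashes` and `toHexCodes` are hypothetical helper names, and it assumes the script text is already available as a JavaScript string.

```javascript
// Hypothetical helper: indexes at which U+FF3C (full-width backslash) occurs.
function findFullWidthBackslashes(text) {
  var positions = [];
  for (var i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) === 0xFF3C) {
      positions.push(i);
    }
  }
  return positions;
}

// Hypothetical helper: each character as a U+XXXX label, for eyeballing.
function toHexCodes(text) {
  return Array.from(text).map(function (ch) {
    var hex = ch.codePointAt(0).toString(16).toUpperCase();
    while (hex.length < 4) {
      hex = '0' + hex;
    }
    return 'U+' + hex;
  });
}

// '\\' is one ASCII backslash; '\uFF3C' is one full-width backslash.
var sample = 'a\\b\uFF3Cc';
console.log(findFullWidthBackslashes(sample)); // [ 3 ]
console.log(toHexCodes(sample).join(' '));     // U+0061 U+005C U+0062 U+FF3C U+0063
```

If the positions array is empty for the file on disk but not for the text the browser receives, the conversion happens somewhere between the server and the page rather than in the editor.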

Stefan
  • This is gr8 info .. we are trying it out. – Arindam Paul Aug 24 '12 at 10:58
  • But as Stefan said, this is for SubEthaEdit. It's not working in our case. The problem is that, to display the Japanese characters in the browser, we do some encoding in Perl, and now this regular expression is affected. Is there any way to specify the encoding for a regex? – Arindam Paul Aug 24 '12 at 12:16
  • Are you sure that your file is UTF-8 encoded and that the backslash is encoded as `0x5C`? – Stefan Aug 24 '12 at 12:41
  • Yes, the encoding of the page is UTF-8 for sure. The character we are talking about is http://www.fileformat.info/info/unicode/char/ff3c/index.htm; the normal (keyboard) backslash is converted to this full-width one, as described at that link. – Arindam Paul Aug 25 '12 at 09:00
0

Think about your editor and browser fonts:

  • First line : Consolas 10
  • Second line : MS Mincho 10
Gabber
xavier Z
  • No, it is not that; the font is the same, everything is the same. In fact, if you copy the character and search for it, it is not a backslash at all. – Arindam Paul Aug 24 '12 at 10:08
0

Copy your code string to notepad, then copy it back to your code from there. This will destroy your unicode characters and convert the backslashes to real backslashes.
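The same conversion can be done programmatically instead of through a round trip via Notepad. This is a minimal sketch under the assumption that the only damage is U+FF3C standing in for U+005C; `normalizeBackslashes` is a hypothetical helper name.

```javascript
// Hypothetical helper: turn full-width backslashes (U+FF3C) back into
// ASCII backslashes (U+005C). All other characters are left untouched.
function normalizeBackslashes(text) {
  return text.replace(/\uFF3C/g, '\\');
}

// A pattern source as it might arrive after the unwanted conversion:
// full-width backslash followed by "d+", anchored at both ends.
var damagedSource = '^\uFF3Cd+$';
var repairedSource = normalizeBackslashes(damagedSource);

console.log(new RegExp(damagedSource).test('123'));  // false
console.log(new RegExp(repairedSource).test('123')); // true
```

This only repairs the text after the fact; it does not remove whatever step in the editor or on the server is rewriting the backslashes in the first place.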

tomsv