25

I'm going through a code review and I'm curious if it's better to convert strings to upper or lower case in JavaScript when attempting to compare them while ignoring case.

Trivial example:

var firstString = "I might be A different CASE";
var secondString = "i might be a different case";
var areStringsEqual = firstString.toLowerCase() === secondString.toLowerCase();

or should I do this:

var firstString = "I might be A different CASE";
var secondString = "i might be a different case";
var areStringsEqual = firstString.toUpperCase() === secondString.toUpperCase();

It seems like either "should" or would work with limited character sets like only English letters, so is one more robust than the other?

As a note, MSDN recommends normalizing strings to uppercase, but that is for managed code (presumably C# & F# but they have fancy StringComparers and base libraries):

http://msdn.microsoft.com/en-us/library/bb386042.aspx

Audwin Oyong
Josh R
  • Since most strings will contain more lowercase characters, converting to lowercase will at least have fewer characters to process, but other than that... Is there any difference? – Félix Adriyel Gagnon-Grenier Nov 12 '14 at 01:08
  • 1
    I'm not sure if there is any other difference in JavaScript, the MSDN link says there are some characters that can't make a round trip - "Strings should be normalized to uppercase. A small group of characters, when they are converted to lowercase, cannot make a round trip. To make a round trip means to convert the characters from one locale to another locale that represents character data differently, and then to accurately retrieve the original characters from the converted characters." - But I'm not sure if that is unique to .Net or if it applies to all/most programming languages. – Josh R Nov 12 '14 at 01:12
  • 1
    I'm guessing it's browser dependent how those two methods work internally, but that they both probably iterate over the characters and check and convert them, so it doesn't matter. In real life, it certainly doesn't matter. – adeneo Nov 12 '14 at 01:12
  • 2
    http://jsperf.com/upper-or-lower – adeneo Nov 12 '14 at 01:15

3 Answers

30

Revised answer

It has been quite a while since I answered this question. While the cultural issues still hold true (and I don't think they will ever go away), the development of the ECMA-402 standard has made my original answer... outdated (or obsolete?).

The best solution for comparing localized strings seems to be the localeCompare() function with appropriate locales and options:

var locale = 'en'; // that should be somehow detected and passed on to JS
var firstString = "I might be A different CASE";
var secondString = "i might be a different case";
if (firstString.localeCompare(secondString, locale, {sensitivity: 'accent'}) === 0) {
    // do something when equal
}

This compares the two strings case-insensitively but accent-sensitively (for example ą != a).
If this is not sufficient for performance reasons, you may want to use either
toLocaleUpperCase() or toLocaleLowerCase(), passing the locale as a parameter:

if (firstString.toLocaleUpperCase(locale) === secondString.toLocaleUpperCase(locale)) {
    // do something when equal
}

In theory there should be no differences. In practice, subtle implementation details (or lack of implementation in the given browser) may yield different results...
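
If many comparisons are needed, the localeCompare() approach above can also be expressed with a reusable Intl.Collator, which is part of the same ECMA-402 API. This is only a minimal sketch: the helper name equalsIgnoreCase is hypothetical, and the hard-coded 'en' locale is an assumption that should be replaced with whatever locale you detect.

```javascript
// Assumption: the locale is known up front; 'en' is just a placeholder.
const locale = 'en';

// sensitivity: 'accent' ignores case differences but keeps accent differences.
const collator = new Intl.Collator(locale, { sensitivity: 'accent' });

// Hypothetical helper: true when both strings are equal ignoring case.
function equalsIgnoreCase(a, b) {
  return collator.compare(a, b) === 0;
}

console.log(equalsIgnoreCase("I might be A different CASE", "i might be a different case")); // true
console.log(equalsIgnoreCase("a", "ą")); // false, the accent still counts
```

Creating the collator once avoids re-resolving the locale and options on every call, which is usually the reason to prefer it over repeated localeCompare() calls.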

Original answer

I am not sure if you really meant to ask this question in Internationalization (i18n) tag, but since you did...
Probably the most unexpected answer is: neither.

There are tons of problems with case conversion, which inevitably lead to functional issues if you convert the character case without indicating the language (as is the case in JavaScript). For instance:

  1. Many natural languages have no concept of upper- and lowercase characters. There is no point in trying to convert them (although the call will work).
  2. There are language-specific rules for converting strings. The German sharp S character (ß) is converted into two uppercase S letters (SS).
  3. Turkish and Azerbaijani (or Azeri, if you prefer) have a "very strange" concept of two i characters: dotless ı (which converts to uppercase I) and dotted i (which converts to uppercase İ; this font may not present it correctly, but it really is a different glyph).
  4. The Greek language has many "strange" conversion rules. One particular rule concerns the uppercase letter sigma (Σ), which, depending on its position in a word, has two lowercase counterparts: regular sigma (σ) and final sigma (ς). There are also other conversion rules regarding "accented" characters, but they are commonly omitted when the conversion function is implemented.
  5. Some languages have title-case letters, e.g. ǈ (U+01C8), which should be converted to things like Ǉ (U+01C7) or, less appropriately, the two-letter sequence LJ. The same may apply to ligatures.
  6. Finally there are many compatibility characters that may mean the same as what you are trying to compare to, but be composed of completely different characters. To make it worse, things like "ae" may be the equivalent of "ä" in German and Finnish, but equivalent of "æ" in Danish.
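
A few of the points above can be observed directly in a JavaScript console. This is only an illustrative sketch: the variable names are made up for the example, and the Turkish line assumes an engine that implements the ECMA-402 locale-aware conversions.

```javascript
// German sharp s: uppercasing expands one character into two.
const sharpS = 'ß'.toUpperCase();
console.log(sharpS, sharpS.length); // SS 2

// Greek sigma: two lowercase forms share a single uppercase form.
const finalSigma = 'οδος'; // ends with final sigma ς (U+03C2)
const plainSigma = 'οδοσ'; // ends with regular sigma σ (U+03C3)
const sigmaEqualUpper = finalSigma.toUpperCase() === plainSigma.toUpperCase();
const sigmaEqualLower = finalSigma.toLowerCase() === plainSigma.toLowerCase();
console.log(sigmaEqualUpper); // true
console.log(sigmaEqualLower); // false

// Turkish i: the locale-unaware call always maps I to dotted i.
const defaultLower = 'I'.toLowerCase();
console.log(defaultLower); // i
// With a Turkish locale the result should be dotless ı (U+0131),
// assuming the engine supports locale-aware case conversion.
const turkishLower = 'I'.toLocaleLowerCase('tr');
console.log(turkishLower);
```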

I am trying to convince you that it is really better to compare user input literally, rather than converting it. If it is not user-related, it probably doesn't matter, but case conversion will always take time. Why bother?

stakx - no longer contributing
Paweł Dyda
  • 1
    Users don't always enter text with the proper case, so testing input literally will produce a poor UX. There are occasions when case should be significant (e.g. passwords), but most of the time being pedantic about it will cause unnecessary frustration. – Barmar Oct 09 '18 at 17:19
  • What about when you do not know the locale? For instance, a web page open to all the world asking for the customer name and address? The page could be in English, but the customer is German with a sharp s in her address, or Greek with a final sigma? – Jonathan Rosenne Nov 06 '21 at 20:39
  • I was trying to convince you, that you actually **know** the locale. User agent sends "Accept-Language" header along with request. There are tons of server-side scripts to parse it. Then you can simply set the locale globally for all JS code and use it. The other question is, why in the world you want to convert character case in form fields... This does not seem like a good idea. – Paweł Dyda Nov 11 '21 at 09:57
6

Some other options have been presented, but if you must use toLowerCase, or toUpperCase, I wanted some actual data on this. I pulled the full list of two byte characters that fail with toLowerCase or toUpperCase. I then ran this test:

let pairs = [
[0x00E5,0x212B],[0x00C5,0x212B],[0x0399,0x1FBE],[0x03B9,0x1FBE],[0x03B2,0x03D0],
[0x03B5,0x03F5],[0x03B8,0x03D1],[0x03B8,0x03F4],[0x03D1,0x03F4],[0x03B9,0x1FBE],
[0x0345,0x03B9],[0x0345,0x1FBE],[0x03BA,0x03F0],[0x00B5,0x03BC],[0x03C0,0x03D6],
[0x03C1,0x03F1],[0x03C2,0x03C3],[0x03C6,0x03D5],[0x03C9,0x2126],[0x0392,0x03D0],
[0x0395,0x03F5],[0x03D1,0x03F4],[0x0398,0x03D1],[0x0398,0x03F4],[0x0345,0x1FBE],
[0x0345,0x0399],[0x0399,0x1FBE],[0x039A,0x03F0],[0x00B5,0x039C],[0x03A0,0x03D6],
[0x03A1,0x03F1],[0x03A3,0x03C2],[0x03A6,0x03D5],[0x03A9,0x2126],[0x0398,0x03F4],
[0x03B8,0x03F4],[0x03B8,0x03D1],[0x0398,0x03D1],[0x0432,0x1C80],[0x0434,0x1C81],
[0x043E,0x1C82],[0x0441,0x1C83],[0x0442,0x1C84],[0x0442,0x1C85],[0x1C84,0x1C85],
[0x044A,0x1C86],[0x0412,0x1C80],[0x0414,0x1C81],[0x041E,0x1C82],[0x0421,0x1C83],
[0x1C84,0x1C85],[0x0422,0x1C84],[0x0422,0x1C85],[0x042A,0x1C86],[0x0463,0x1C87],
[0x0462,0x1C87]
];

let upper = 0, lower = 0;
for (let pair of pairs) {
   let row = 'U+' + pair[0].toString(16).padStart(4, '0') + ' ';
   row += 'U+' + pair[1].toString(16).padStart(4, '0') + ' pass: ';
   let s = String.fromCodePoint(pair[0]);
   let t = String.fromCodePoint(pair[1]);
   if (s.toUpperCase() == t.toUpperCase()) {
      row += 'toUpperCase ';
      upper++;
   } else {
      row += '            ';
   }
   if (s.toLowerCase() == t.toLowerCase()) {
      row += 'toLowerCase';
      lower++;
   }
   console.log(row);
}
console.log('upper pass: ' + upper + ', lower pass: ' + lower);

Interestingly, one of the pairs fails with both. But based on this, toUpperCase is the best option.
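
Since one pair fails with both methods, chaining the two conversions is one possible workaround. The following is a rough sketch, not part of the test above: foldCase and equalsFolded are hypothetical helper names, and uppercasing followed by lowercasing only approximates Unicode case folding and remains locale-unaware.

```javascript
// Hypothetical helper: uppercase first, then lowercase, so that characters
// which only map in one direction still end up in a common form.
function foldCase(s) {
  return s.toUpperCase().toLowerCase();
}

// Hypothetical helper: compare two strings after folding.
function equalsFolded(a, b) {
  return foldCase(a) === foldCase(b);
}

// U+03D1 (theta symbol) vs U+03F4 (capital theta symbol):
console.log('\u03D1'.toUpperCase() === '\u03F4'.toUpperCase()); // false
console.log('\u03D1'.toLowerCase() === '\u03F4'.toLowerCase()); // false
console.log(equalsFolded('\u03D1', '\u03F4'));                  // true

// U+00E5 (a with ring) vs U+212B (angstrom sign):
console.log(equalsFolded('\u00E5', '\u212B'));                  // true
```

The cost is two conversions per string instead of one, so this trades some speed for coverage.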

Zombo
-3

It does not depend on the browser, as only JavaScript is involved. Both methods perform according to the number of characters that need to be changed (whose case is flipped).

var areStringsEqual = firstString.toLowerCase() === secondString.toLowerCase();
var areStringsEqual = firstString.toUpperCase() === secondString.toUpperCase();

If you use the test prepared by @adeneo, you may get the impression that it is browser dependent, but try some other test inputs like:

"AAAAAAAAAAAAAAAAAAAAAAAAAAAA"

and

"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

and compare.

JavaScript performance depends on the browser when a DOM API or any DOM manipulation/interaction is involved; otherwise, plain JavaScript will give the same performance.
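
To try the comparison suggested above on your own engine, a rough timing loop is enough. This is only a sketch: timeConversion and the iteration count are made up for the example, it assumes performance.now() is available globally (browsers and recent Node.js), and micro-benchmarks like this are noisy, so treat the numbers as indicative at best.

```javascript
// Hypothetical helper: time repeated calls of a case-conversion method.
function timeConversion(input, method, iterations = 100000) {
  let sink = 0; // accumulate lengths so the work is not optimized away
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    sink += input[method]().length;
  }
  const elapsed = performance.now() - start;
  return { elapsed, sink };
}

// The two inputs suggested above.
const upperInput = 'AAAAAAAAAAAAAAAAAAAAAAAAAAAA';
const lowerInput = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';

for (const input of [upperInput, lowerInput]) {
  for (const method of ['toLowerCase', 'toUpperCase']) {
    const { elapsed } = timeConversion(input, method);
    console.log(method + ' on "' + input[0] + '..." took ' + elapsed.toFixed(2) + ' ms');
  }
}
```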

Audwin Oyong
KanhuP2012