9

I was wondering if anyone when why the decision was made in Excel (or maybe it was made before Excel and Excel later adopted it) that strings are compared case-insensitively and strings are always considered different than numbers. For example:

  • ="a"="A" is true in Excel
  • ="1"=1 is false in Excel
  • =1+"1" is 2 in Excel (so it auto-casts on the + operator but not comparisons? -- this seems to be 'newer' behavior)

Is this feature considered useful? I find the case-insensitive 'default' is often very useful for sorting purposes; but the fact that a number is not equivalent to a string-encoding of a number often an annoyance. Is there a general consensus on whether this behavior is useful or not (and if not, is it just preserved for historical reasons)?


Here is the closest reference I could find: https://support.microsoft.com/en-gb/office/calculation-operators-and-precedence-in-excel-48be406d-4975-4d31-b2b8-7af9e0e2878a#:~:text=How%20Excel%20converts%20values%20in%20formulas. This mentions that arithmetic and concatenation operands will coerce types, but does not say anything about comparison (though those are never coerced it seems).

David542
  • 104,438
  • 178
  • 489
  • 842
  • 7
    We cannot know for sure why they did this. One valid guess is that it just so happened. Another is that the cells are untyped in Excel, so if a column contained both `1`s and `"1"`s, it would be impossible to find a `1` or a `"1"` specifically if the comparison operator did not distinguish them. – GSerg Jun 30 '23 at 00:16
  • @GSerg I see, thanks for the two possibilities here. For yourself, do you find this a useful feature? not useful? or just irrelevant in day-to-day stuff and doesn't matter to you. – David542 Jun 30 '23 at 00:23

3 Answers3

11

Is this feature considered useful?

Yes. Because it enables all scenarios the end users may want, in an accessible fashion.

Excel has only one comparison operator, as opposed to some languages that have two (strict and non-strict), and Excel's cells are not strictly typed.

If the comparison operator deemed strings and numbers the same, it would be really difficult for the end user to look up a string specifically or a number specifically on a sheet in situations where the string may look like a number (e.g. a phone number). A simple lookup formula such as:

=VLOOKUP(A1, B1:C10, 2, 0)

would need to become an array formula along the lines of:

{=VLOOKUP(A1,IF(TYPE(B1:B10)=TYPE(A1),B1:C10,0),2,0)}

... except that it wouldn't work, because the TYPE function natively accepts multicell ranges, so it would unhelpfully return a single value, 64, for the entire range, signifying that it's an array.

And naturally, you want your VLOOKUP and your = to use same rules to avoid inconsistent and confusing results, so you cannot decree that VLOOKUP and the such should use strict comparison but = should remain loose.

With the comparison operator distinguishing strings and numbers you can both easily lookup a value in a type-wise fashion (which also easily detects a data type mismatch when that vlookup does not find anything):

=VLOOKUP(A1, B1:C10, 2, 0)

and in a non-typewise fashion:

{=VLOOKUP(A1&"",B1:C10&"",2,0)}

It gives you control over whether to coerce strings to numbers or numbers to strings during the comparison, too.

And this is the same type of explicit control that you exert when using the + operator: by choosing the + you are saying that you want to coerce strings to numbers for the purpose of your operation, so =1 + "1" becomes 2. If you wanted to coerce numbers to strings, you would use =1 & "1" instead, resulting in "11". So no confusion arises here either: the + does not convert things to numbers inconsistently or unexpectedly, it does so because it's a +, and it offers a counterpart that does the opposite.

GSerg
  • 76,472
  • 17
  • 159
  • 346
  • 2
    I think the equal (`=`) comparison should be explained in the context of its basic operation, i.e. comparing to values, what was Excel's intent in not accepting a transformation? That probably has to do with the origin of the programming language and follows a similar practice, not in the context of its application in a high-level function as `VLOOKUP` this is a side effect I would say. The thing is that it is difficult to find this information, what they had in mind when Excel was created and the product decision they made such as this one or blank is not equal to `=""`. – David Leal Jun 30 '23 at 14:18
  • 1
    @DavidLeal `what was Excel's intent in not accepting a transformation` - because if it made that choice for you, you'd have no way to perform your comparison with a different choice applied, which is what is laid out in the answer? `That probably has to do with the origin of the programming language` - there isn't one. – GSerg Jun 30 '23 at 18:55
  • 1
    I explained in my answer, based on the most common scenario: equal comparison doesn't transform the data, compare them as they are, which is logical. If users want something different, they need to explicitly transform the data before invoking `=`, so it is allowed, just Excel made it more verbose/harder (`VALUE("1") = 1` or `0+"1" = 1`), leaving the shorter option (equal type and value) the default one. – David Leal Jun 30 '23 at 20:19
6

I was wondering if anyone when why the decision was made in Excel (or maybe it was made before Excel and Excel later adopted it) that strings are compared case-insensitively and strings are always considered different than numbers

Human beings need to know when two things are really equal, not just equivalent, this is a human need. Product or programming languages are enablers for this need.

Products and programming languages offer at least strict equality (truly equal by value and data type), i.e. comparison with no transformation, but there are others that also offer non-strict equality (by value but not by data type, i.e. equivalence). For example just take two programming languages: Javascript and Python (both considered dynamically typed languages).

Javascript

You have ==and === (Strict Equality, i.e. value and type must be same). The operator == does the implicit conversion, i.e. "5" == 5 returns true, but "5" === 5 returns false. Check this link for more information.

Python

You have only == which doesn't do any conversion, in this case: "5" == 5 returns false. Check this link for more information.

Excel

Equal Operator

Only compares for equality, through a single operator (=) which behaves similarly to Python, i.e. the type and the value must be the same, strict equality, i.e. just comparison with no transformation. This behavior ensures that comparisons are performed accurately and consistently based on the data types involved. It helps prevent unintended and potentially erroneous comparisons that could occur if data types were automatically coerced or converted during the comparison process.

The above ensure fundamental principles of data types and comparisons in programming languages, such as: Type Safety and Explicitness, Predictability and Consistency, and Data Integrity.

Therefore:

"1" = 1     -> FALSE
  0 = FALSE -> FALSE 

The above examples are considered the most common use case scenarios, however, you can compare different data type, but you need to do explicitly the transformation (more verbose formula, since the is no other way to compare). For example:

VALUE("1") = 1 -> TRUE
0 + "1" = 1    -> TRUE

Why Excel made those product decisions? It is difficult to say unless they publish something around that. My guess is based on the most common scenario and user target audience back in 85: users wanted to know if two things are equal by type and value, which is fundamental principles of data types and comparisons in programming languages. If this is the most common scenario, then the decision was to specify the equal sign in a way it forces it. If you don't want this behavior, then it results in a more verbose formula, because you need to convert the value first (shown it before).

Case-insensitive string comparison

Probably the reason back in the 80s was based on compatibility with other existing competitors such as Multiplan(82), Lotus 1-2-3(83) to easy get/migrate their users. Previous programming languages in the 50s and 60s, such as BASIC, FORTRAN, or COBOL were all case-insensitive too. Another reason user expectations at that time Excel target users widely spread in various industries and by a diverse range of users, many of whom may not have programming backgrounds. For such users, having case-insensitive comparisons and searches can be more intuitive and aligned with their expectations.

Python "A"=="a" returns false, but Excel for the equivalent expression returns TRUE. Excel is case insensitive for text comparison, except if you want to explicitly state it, then you need to use EXACT function. For example:

=EXACT("aaa","Aaa") -> FALSE

It is true, such behavior is not explicitly documented, it doesn't specify the meaning of the equal sign (=), based on the previous explanation it means same type and value, and for text values comparison it is case insensitive. ExcelJet states the same = is not case sensitive.

The bottom line is there is no universal practice around this, Excel product designers made that decision because they thought it is better for their product, more user-friendly, following fundamental principles of data types in programming languages, and in the end more money.

David Leal
  • 6,373
  • 4
  • 29
  • 56
  • You have simply described the behaviour that is already described in the question, however you have not mentioned why it exists and whether it is useful. – GSerg Jun 30 '23 at 08:11
  • 1
    It exists because human beings want to know if two things are the same. Products and programming languages just provide a tool to satisfy this need. Microsoft decided to have **only** one equal comparison operator. If there is only one, it should behave like strict equality, i.e. truly equal by data type nature and value. The user wants to know at first when two things are equal, not equivalent (the same but not the same, for example `1`Euro is equal to `$1.09` today, but they are different currencies, not the same) I was trying to respond to the question in the title. – David Leal Jun 30 '23 at 13:18
  • 1
    @GSerg thanks for your feedback, I added more information, but I could not find any confirmation on why they made that decision. In my understanding what they considered at that time was the most common scenario and to make a product decision that simplifies the syntax based on the most common scenario, penalizing with additional function required for a non-standard behavior. – David Leal Jun 30 '23 at 16:07
2

I would like to share my thoughts on this interesting topic, which in my view has nothing to do with Excel or any programming language but with something much more general:
the human need for knowledge and exploration.
Yes, this question took my thoughts that far. I will not list any links and programming examples, the topic is anthropophilosophical.

The "=" or "==" or "===" or "is" comparison operator, however it appears, is about our insatiable curiosity:

WHAT IS?

It does not measure the quantity of some entity but WANTS TO ADVANCE OUR KNOWLEDGE - of the "IS" of something.

It has to do with two poles: TRUTH and LIES or on a more general scale, what is RIGHT and what is INCORRECT. And I remember in the "logic" lessons we did at school: "Nothing fits between IS and NOT IS..." But real life holds many things: "opinions".

So let's get down to the facts:

  1. It is "a"="A" => TRUE Excel tells us. Here both factors ARE OF THE SAME TYPE, they are "String". Yes, someone will say, but one member is small and the other capital, right. Exactly here the decision was made by the designers of Excel, with the thought that: WE ARE NOT WRONG, it could be FALSE, but we have this opinion, having the knowledge of what the alphabet is. In short the result "a"="A" => TRUE is a safe VIEW, because it could be FALSE without anyone complaining. Obviously it serves practical reasons.

  2. Excel tells us that "1"= 1 => FALSE. Here each factor IS OF A DIFFERENT TYPE, one is: "String" and the other eg: "Integer". Here the designers of Excel are parsimonious...they don't want to mislead us about knowing something IS. They left whatever "view" fits, to us, because they may not be of the same type but if the Integer is converted to a String then we will see that they are of the same content. They left the post-conversion comparison view to us: 1&"" = "1" => TRUE, in my opinion correct, since they don't have another operator like "===" in js.

  3. 1+"1" makes 2 in Excel. What is going on here? do we have two measures and two weights? Here we have two members with different types and a QUANTITATIVE ACT is performed between them. We have no act of knowledge of IS... Our knowledge of WHAT each member IS is not affected, but we very loosely ask for a quantitative measure of the addition of two dissimilar entities, IF IT IS POSSIBLE. So the manufacturers of Excel have the opinion that they should give us such a result, IF IT IS POSSIBLE and they give it to us and they do it very correctly in my opinion.

The result of a quantitative operation is given to us directly, while the result through comparison about: what a thing is, can have many levels of knowledge. A cat is not a dog but a cat is an animal and a dog is an animal and a hen is an animal but bipedal...

In summary, people's temperament to attain the ultimate and supreme knowledge directs them in how they implement the KNOWLEDGE operators as opposed to the QUANTITATIVE operators. No matter how much liberties are given for consumption and welcome, we will always need "===" and "1"= 1 => FALSE.