6

This question is based on an someone else's question that I answered. You can find the original question here, if desired. (And, maybe provide a better answer than I have.) However, everything you need to answer this question is right here.

Background

Given the following data in A1:C5

+-------------+--------+----------+
| Invoice Nbr |  Type  |  Status  |
+-------------+--------+----------+
| A0001       | Credit | Paid     |
| A0002       | Credit | Not Paid |
| B0001       | Debit  | Paid     |
| B0002       | Debit  | Not Paid |
+-------------+--------+----------+

The goal was to find a particular value in the table using an array formula that evaluated two columns. That is, give me the first Invoice (Column A) where Type is Debit and Status is Not Paid.

My first answer was an attempt to fix what I thought to be a problem with the OP's conditional. I put an AND wrapper around the two conditions as follows:

{=INDEX($A$2:$A$5,
        SMALL(IF(AND($B$2:$B$5 = "Debit", 
                     $C$2:$C$5 = "Not Paid"),
                 ROW($A$2:$A$5)-ROW($A$2)+1),
              1)
       )}

However, that didn't work.

I ended up suggesting this answer, because it actually works:

{=INDEX($A$2:$A$5,
        SMALL(IF($B$2:$B$5 & $C$2:$C$5 = "DebitNot Paid",
                 ROW($A$2:$A$5)-ROW($A$2)+1),
              1)
       )}

My Question

Array formulae in Excel are sometimes so much voodoo to me. It seems like either one should provide the result B0002. In fact, only the second formula gives the desired result. What principle or evaluation process is Excel (2013) following that makes this so? Or, in other words, what am I not understanding about how Excel manages array formulae?

StoneGiant
  • 1,400
  • 1
  • 9
  • 21
  • 1
    Personally I would use: `=INDEX(A:A,AGGREGATE(15,7,ROW($A$2:$A$5)/(($B$2:$B$5=Debit)*($C$2:$C$5="Not Paid")),1)` Not an answer just an observation. Array formulas do not like `AND` or `OR` use `*` or `+` respectively. – Scott Craner Sep 25 '18 at 18:18
  • 1
    For starters, there are **two types** of array formulas: Array formulas that perform several calculations to generate a **single result**, and others that calculate **multiple results** from one formula. Some worksheet functions return arrays of values, or require an array of values as an argument. More from [Chip Pearson](//cpearson.com/excel/arrayformulas.aspx) and the [guidelines](//support.office.com/article/guidelines-and-examples-of-array-formulas-7d94a64e-3ff3-4686-9372-ecfd5caa57c7) and [how-to](//support.office.com/article/create-an-array-formula-e43e12e0-afc6-4a12-bc7f-48361075954d). – ashleedawg Sep 25 '18 at 18:21
  • @ScottCraner, in a way you do answer the question. "Array formulas do not like `AND` or `OR`..." It would be nice to dig deeper, but the bottom line is "Don't use them." – StoneGiant Sep 25 '18 at 18:35
  • I saw once someone explain the why. I do not know the why. That is why I did not answer. There is a specific reason why. – Scott Craner Sep 25 '18 at 18:36
  • @ashleedawg, the Chip Pearson resource is very good. Thanks. I have read the Microsoft guides before and found them to be lacking, but the page from Chip was enlightening. – StoneGiant Sep 25 '18 at 18:36
  • I think the final answer is that `AND()` and `OR()` always return a scalar result, even in array formulas. `AND({TRUE,TRUE,TRUE},{TRUE,FALSE,TRUE})` is always `FALSE` and not `{TRUE,FALSE,TRUE}` as one might hope. However, I cannot find an official source for that. – StoneGiant Sep 25 '18 at 21:47

3 Answers3

4

@StoneGiant is right. There are functions that can't be used inside array formulas because their sole "reason for being" is to turn an array into a scalar value. And sometimes you actually want those functions to return the single value inside an array formula, so how would the formula execution algorithm know which way you want it.

AND() and OR() are kind of good examples of this. =AND(array of values) returns a single value. MIN() is another example. Take the array formula

=SUM((A1:A10="Bob")*(Min(B1:B10))

MIN() is supposed to take a bunch of values and return the lowest. How would program MIN() to return an array? Where do you decide what's lowest in what context? You can't. That's why it doesn't work in array formulas.

Take SMALL() next. It can return an array formula because it's not only supposed to return the lowest value, but the nth lowest. Because of that nth argument, you can now return an array

=SMALL(A1:A10,{2,3})

That will return an array with two elements. MS could have programmed small to not work as an array formula, but in general I've found they've included array functionality wherever it was practical.

I said AND() and OR() are kind of good examples. MS could have programmed them to return an array like @StoneGiant's comment. It would have been a strange design decision to me, but they could have done it. But they already have AND and OR in array formulas. Multiply for AND and add for OR.

=MAX((B2:B5="Debit)*(C2:C5="Not Paid")*(ROW(B2:B5))

That will return the row number of the last line that's a debit and not paid. (You're example problem wanted the first one, but MIN introduces problems that complicate this example). The multiplications between the equalities is the same as AND. If you wanted the last row that's a debit and where column C says Not Paid or Not Sure, you could do that with addition.

=MAX((B2:B5="Debit)*((C2:C5="Not Paid")+(C2:C5="Not Sure"))*(ROW(B2:B5))

The plus sign is an OR between those two sets of parentheses. And the results of that OR is included against the other parentheses as an AND (multiplied).

You can put those in an INDEX

=INDEX(A2:A5, MAX((B2:B5="Debit)*(C2:C5="Not Paid")*(ROW(B2:B5)))

So to answer your question, I think: Generally, functions whose purpose is to turn a bunch of values into one value don't work inside array formulas. AND() and OR() are somewhat special because addition and subtraction already do that inside array formulas.

See more about arrays here http://dailydoseofexcel.com/archives/2004/04/05/anatomy-of-an-array-formula/

Dick Kusleika
  • 32,673
  • 4
  • 52
  • 73
  • 1
    And now I see that @StoneGiant is also the OP. Sorry I missed that. Please read those parts of my answer like I'm not an idiot. – Dick Kusleika Sep 25 '18 at 22:34
  • LOL! Noted. Thanks for your detailed answer. – StoneGiant Sep 25 '18 at 23:35
  • I think MATCH is another one of those functions that are not compatible with CSE. Indeed, you can see the behavior is different in the Evaluate Formula tool with this formula: `{=MATCH($A$1:$A$10;$A$1:$A$10;0)}`, the first argument is not evaluated as an array, but only as the first cell of the range. The formula actually returns the correct results because I think when non-CSE-compliant function are used in a multiple cells array formula, Excel calculates the formula in each cells, the same way it works when you type `=$A$1:$A$10` in `$C$1:$C$10` (non CSE formula with CTRL+ENTER only) – hymced Apr 24 '19 at 13:21
  • The problem with MATCH in a single cell is that it returns an array, but Excel doesn't know how to present that to you. So it does an implicit intersection and shows you only the result for the row you're on. If you're formula is in row 1, it shows the first result. If row 5, the fifth result. But take your formula `=MATCH($A$1:$A$10;$A$1:$A$10;0)`, go into the formula bar, highlight everything except the equal sign, and press Ctrl+Enter. You get something like `={1;2;3;4;5;6;7;8;9;10}`. So it's returning an array. Put a `SUM` around that and you have a proper array formula. – Dick Kusleika Apr 24 '19 at 23:03
  • oh you are right, in my case it was INDEX that cannot return an array: `=SUM(INDEX({1;2;3;4;5;6;7;8;9;10},{1;2;3;4;5;6;7;8;9;10}))` in a single cell array formula, evaluates to `=SUM(1)`, then returns 1 instead of 55. – hymced Apr 25 '19 at 09:06
  • hehe, found the way around it to force INDEX returning an array, look here https://stackoverflow.com/questions/47187863/can-excels-index-function-return-array – hymced Apr 25 '19 at 13:03
2

AND and OR usually don't work well with array formulas because they return a single result (not an array).

This part of your formula:

AND($B$2:$B$5 = "Debit", $C$2:$C$5 = "Not Paid")

Will return a single TRUE (not an array of TRUE/FALSE values) if $B$2:$B$5 are all Debit and if $C$2:$C$5 are all Not Paid (and will return FALSE otherwise).

This part of your formula returns what you desire (an array of TRUE/FALSE values):

$B$2:$B$5 & $C$2:$C$5 = "DebitNot Paid"

Personally I don't really like this method though, I would instead do something like:

($B$2:$B$5="Debit")*($C$2:$C$5="Not Paid")

SIDE NOTE

Although in this particular problem it makes no difference, the problem with concatenating strings and comparing to a result is that you may get a false positive if you are not careful.

Simple example:

(A1="ab")*(B1="cd")

Will return a "slightly" different result than:

A1&B1 = "abcd"

Because if for example A1 is abc and B1 is d, the top formula returns FALSE but the bottom formula returns TRUE.

But then again... depending on the application, maybe this is what you want. Just be careful in concatenating strings and comparing the result for this reason.

Bottom line... Generally you should avoid AND and OR in array formulas because they, well, don't return arrays.

ImaginaryHuman072889
  • 4,953
  • 7
  • 19
  • 51
1

I'm not sure which part of array formulas you're having issue with -- but to be fair, it took me a long time to wrap my head around them, mainly because I rarely had a need for them.

A big part of the reason is that in Excel, there are often several ways to accomplish the same task. For example in the case of your example, I would have just inserted a hidden "helper column" to concatenate Type and Status, to the left of the Invoice # column, and do a basic VLOOKUP in that range.

An array formula is a formula that can perform multiple calculations on one or more of the items in an array . You can think of an array as a row of values, a column of values, or a combination of rows and columns of values. Array formulas can return either multiple results or a single result.

For example,

  • you can create an array formula in a range of cells and use the array formula to calculate a column or row of subtotals.
  • You can also place an array formula in a single cell and calculate a single amount.

An array formula that includes multiple cells is called a multi-cell formula, and an array formula in a single cell is called a single-cell formula.
(Source)


As for your example question, perhaps following the steps will shed some light.

This assumes that you already know the basics of using Match/Index functions to perform a lookup.

The formula without all of the $ absolute reference markers is:

=INDEX(B3:B6,MATCH(F3&G3,C3:C6&D3:D6,0))

breaking it down:

  • F3&G3 concatenates (combines) the Type and Status we're looking for, into a single string, like DebitPaid
    • the array we're looking in is C3:C6&D3:D6 which is sort of like:
    • C3&D3 and C4&D4 and C5&D5 and C6&D6
    • The 0 indicates that we want an exact match. (See the docs for MATCH.)

The MATCH portion of the formula alone in H3 would return 3 since the match is found in the 3rd subset of the array (aka, the 3rd row down from the top of our array.)

Then wrapping INDEX around it, we can make Excel return the "3rd value down" from the Invoice # column.

Adding the $ dollar signs to indicate absolute references, we can then copy or 'drag' the formula down as far as needed.

Remember that in order to enter an array formula you need to use Ctrl+Shift+Enter.

See my other answer here for more explanation and examples of Index/Match and Absolute Relative references.


More Information:

ashleedawg
  • 20,365
  • 9
  • 72
  • 105
  • While I appreciate the very long and involved answer, it does not answer my question. I am not trying to solve a particular problem. That was given only as an example. I am trying to understand why `AND()` and `OR()` do not work as expected in array formulas. – StoneGiant Sep 25 '18 at 21:44
  • Did you mean to use a "middle-finger" emoji? – SierraOscar Sep 25 '18 at 22:31