2

I've searched far and wide for a solution to my problem... over several long weeks now. I've come up a partially working solutions, which I'll include at the bottom for those who might know how to modify/extend them to resolve the problem.

Here's what I'm trying to accomplish (the following descriptions are in reference to this screen capture https://i.stack.imgur.com/1Z8M1.jpg):

enter image description here

1) I am looking to match values from Column G with Column A ($G2 with $A$2:$A$10).

2) If the values match, then extract unique values (excluding blanks) from adjacent Columns B, C, and D, and return them across columns (H through M).

So, as shown in the 'desired results 1' table (G1:M10), if G2 (Vancouver) is found in A2:A10, then extract indexed values from columns B,C, and D where Vancouver is a match (i.e., Blue, ,Green,Green,Red,Yellow,Teal,Green, , , ,Pink,Pink,Pink, ,) and return only the unique values (excluding blanks) across columns H through L (blue,Green,Red,Yellow,Teal,Pink).

I've been looking for a formula solution that can be popped into H2 and then dragged horizontally to M2, to reveal the results. I'm not married to this solution, however. An alternative that I've also considered is to return the unique values in comma delimited format rather than across columns(see 'desired Results 2'). If this is easier to accomplish, I'm all for it.

Note: I will be running this formula on 10,000+ rows. A lean/efficient solution is desirable if possible.

PARTIAL SOLUTIONS I'VE PUT TOGETHER:

1) {Array formula 1}, results can be seen here:

=IFERROR(INDEX($B$2:$B$10, SMALL(IF(COUNTIF($G2,$A$2:$A$10), MATCH(ROW($B$2:$B$10), ROW($B$2:$B$10)), ""), COLUMN(A1))),"")

This formula is only able to index column B, but it successfully matches and returns values across columns. Unfortunately, it doesn't extract unique values and returns blank cells.

2) Re: Justin's comment about a likely VBA solution, figure I should pop up one of the VBA solutions I came across.

Function UNIQUE_PH(Lookupvalue As String, LookupRange As Range, ColumnNumber As Integer)
Dim i As Long
Dim Result As String
For i = 1 To LookupRange.Columns(1).Cells.Count
  If LookupRange.Cells(i, 1) = Lookupvalue Then
    For J = 1 To i - 1
    If LookupRange.Cells(J, 1) = Lookupvalue Then
      If LookupRange.Cells(J, ColumnNumber) = LookupRange.Cells(i, ColumnNumber) Then
        GoTo Skip
      End If
    End If
    Next J
    Result = Result & " " & LookupRange.Cells(i, ColumnNumber) & ","
Skip:
  End If
Next i
UNIQUE_PH = Left(Result, Len(Result) - 1)
End Function

'It takes 3 arguments as inputs:

'1. Lookupvalue – A string that we need to look-up in a range of cells.
'2. LookupRange  – An array of cells from where we need to fetch the data 
'3. ColumnNumber – It is the column number of the table/array from which matching value is to be returned (e.g. 2 for second column).

This basically does what the above formula accomplishes, with the exception that it successfully identifies unique values. It presents the results in one cell as comma delimited (similar to 'desired Results 2'). This isn't able to search through multiple columns and doesn't remove blank cells. I also tried to run it on 10,000 rows of data and it was very slow. I know very little about VBA, so I'm not sure what could be contributing to the lag.

Community
  • 1
  • 1
sociologix
  • 69
  • 1
  • 8
  • I think you'll need to go full-VBA to do this. But I'd love to see a formula answer because Excel formulas are my little darlings. – Justin Burgard Mar 30 '17 at 22:20

2 Answers2

3

Sample Data solution

'in G2
=A2
'in H2 as an array formula with CSE
=IFERROR(INDEX($B$2:$B$16, MATCH(0, IF($A$2:$A$16=$G2, IF(SIGN(LEN($B$2:$B$16)), COUNTIF($G2:G2, $B$2:$B$16), 1), 1), 0), 1),
 IFERROR(INDEX($C$2:$C$16, MATCH(0, IF($A$2:$A$16=$G2, IF(SIGN(LEN($C$2:$C$16)), COUNTIF($G2:G2, $C$2:$C$16), 1), 1), 0), 1),
 IFERROR(INDEX($D$2:$D$16, MATCH(0, IF($A$2:$A$16=$G2, IF(SIGN(LEN($D$2:$D$16)), COUNTIF($G2:G2, $D$2:$D$16), 1), 1), 0), 1),
 TEXT(,))))
'in G19
=G2
'in H19
=TEXTJOIN(",", TRUE, H2:N2)

Fill H2 right then G2:N2 down as appropriate. Fill G19:H19 down to collate the values above. See footnote ¹ if you get a #NAME! error on the TEXTJOIN function.

enter image description here

10K Rows of Data solution

Array formulas chew up calculations exponentially the referenced ranges increase in size. The only way to do this as an array formula is to only reference the rows with Vancouver when you are creating a unique list for Vancouver and only reference the rows with Seattle when you are creating a unique list for Seattle. In other words, don't reference rows 2:10000 for Vancouver when Vancouver's data is in rows 2:6.

Note: you will have to sort your data with column A as the primary key. This operation requires it. Unsorted data (even grouped) will not allow the second MATCH function to locate the terminating row.

The rows of data in column B containing Vancouver in column A can be referenced with this.

INDEX(B:B, MATCH("vancouver", A:A, 0)):INDEX(B:B, MATCH("vancouver", A:A))

Now all you have to do is replace all occurrences of $B$2:$B$16 with the above. Adjust the formula and make the replacements for $C$2:$C$16 and $D$2:$D$16 as well. The bonus is that you can get rid of the check to see IF($A$2:$A$16=$G2, ... since by definition, you are only referencing rows with the appropriate city in column A. Make sure you reference G2 and do not hardcode in Vancouver.

'in H2 as an array formula with CSE
=IFERROR(INDEX(INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A))), 1), 0)),
 IFERROR(INDEX(INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A))), 1), 0)),
 IFERROR(INDEX(INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A))), 1), 0)),
 TEXT(,))))

You should be able to live with the calculation time on that array formula even through 10K rows.

enter image description here

I'm pretty sure that this is as far as native worksheet functions can go. Further improvements would be using variant memory arrays.

Addendum²

There is one further optimization. A worksheet IF only processes the part of the formula that is true. If you look at the names of the cities and only process the array portion of the formula when the city names change, copying directly from above when they are the same you should be able to limit the calculations further.

'in H2 as an array formula with CSE
=IF($G2=$G1, H1, 
    IFERROR(INDEX(INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A))), 1), 0)),
    IFERROR(INDEX(INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A))), 1), 0)),
    IFERROR(INDEX(INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A))), 1), 0)),
    TEXT(,)))))

Addendum³

There was a problem with single city entries and this amendment takes care of that.

You may receive a Circular Reference warning with this. Technically, the warning is true but you wil never actually get a circular reference due to the nested IF structure. In other words, the circular reference is only valid when an IF is false. Excel reports this because it does not examine the conditions; only the possibility of a circular reference which technically is true but can never actually occur.

'in H2 as an array formula with CSE
=IF($G2=$G1, H1, IF(COUNTIF($A:$A, $G2)=1,
    IFERROR(INDEX(INDEX($B:$D, MATCH($G2, $A:$A, 0), 0), MATCH(0, IF(INDEX($B:$D, MATCH($G2, $A:$A, 0), 0)<>"", COUNTIF($G2:G2, INDEX($B:$D, MATCH($G2, $A:$A, 0), 0)), 1), 0)), TEXT(,)),
    IFERROR(INDEX(INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A))), 1), 0)),
    IFERROR(INDEX(INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A))), 1), 0)),
    IFERROR(INDEX(INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A))), 1), 0)),
    TEXT(,))))))

¹ If your Excel version does not support the TEXTJOIN function, search this site for [excel][textjoin] to find alternatives in ragged-length, delimited string concatenation.

Community
  • 1
  • 1
  • This works phenomenally as well, thank you so much for the solution and all the additional commentary and explanation @Jeeped! This formula is like poetry, and, as someone with mild OCD with formulas, the symmetry is very pleasing. I can't thank you and @scottcraner enough. Seriously, I've searched probably 300 web pages and examples to try to figure this out. I'm going to visit all the relevant sites I bookmarked and reply with these answers! Many will benefit! – sociologix Mar 31 '17 at 03:15
  • Thanks again @Jeeped. I'm working with the final formula you assembled (it's lightning fast vs. the first) for my large dataset. It appears to work for nearly all the instances with the exception of a few which don't seem to catch the value in B:B if there is a match (see row 7 'Regina' [here](http://imgur.com/w9Kf2WB)). Any ideas what could be causing this error? – sociologix Mar 31 '17 at 17:53
  • Hmmm... you are correct that single city entries cause problems even in the sorted data. A nested IF could take care of that but it ulgies the formula. I'll do some more investigation. –  Mar 31 '17 at 19:47
  • @sociologix - I think I covered the single city condition above. –  Mar 31 '17 at 20:34
  • Further to the efficiency of this formula. I adapted it to search for uniques across 5 columns (instead of the three in the example) and over 80,000 rows of data. Completed in roughly 5-10 minutes, which is pretty fantastic (i7-2600K CPU @ 3.40GHz; 16GB RAM; 64 bit WIN 10; Excel 2016) – sociologix Mar 31 '17 at 21:15
  • Thank you for the timer data. I was curious about that. Now that it is working, I'll bet a trip to [Code Review - Excel](http://codereview.stackexchange.com/questions/tagged/excel) would produce a parallel solution with 2-D variant arrays that would take seconds instead of minutes. –  Mar 31 '17 at 21:19
  • Hello @Jeeped, I'm adapting this formula for a situation where the first column that gets filled in is J (formula placed in J2 instead of H2). Is there a reason the Adendum3 formula only grabs the first match when I change H1 to J1 (in the first line), for example? I'm assuming this formula is set up in a way where the first results column must be directly to the right of the primary match column (i.e.G)? – sociologix Apr 01 '17 at 22:58
1

Try this UDF

Function UniquePh(Lookupvalue As String, LookupRange As Range, ValueRng As Range) As String
Dim dict As Object
Dim lkpArr() As Variant
Dim ValArr() As Variant
Set LookupRange = Intersect(LookupRange, LookupRange.Parent.UsedRange)
Set ValueRng = Intersect(ValueRng, ValueRng.Parent.UsedRange)
If LookupRange.Rows.Count <> ValueRng.Rows.Count Or LookupRange.Columns.Count > 1 Then Exit Function

Set dict = CreateObject("Scripting.Dictionary")
lkpArr = LookupRange.Value
ValArr = ValueRng.Value

For i = LBound(lkpArr, 1) To UBound(lkpArr, 1)
    If lkpArr(i, 1) = Lookupvalue Then
        For j = LBound(ValArr, 2) To UBound(ValArr, 2)
            If ValArr(i, j) <> "" Then
                On Error Resume Next
                    dict.Add ValArr(i, j), ValArr(i, j)
                On Error GoTo 0
            End If
        Next j
    End If
Next i

For Each itm In dict
    UniquePh = UniquePh & itm & ", "
Next itm
If Len(UniquePh) > 0 Then
    UniquePh = Left(UniquePh, Len(UniquePh) - 2)
Else: UniquePh = ""
End If

End Function

Put it in a modules attached to the workbook. Then call it from the sheet. Put this formula in H2:

=UniquePh(G2,$A$2:$A$10,$B$2:$D$10)

enter image description here

It should run a lot faster.

Scott Craner
  • 148,073
  • 10
  • 49
  • 81
  • 1
    I've found that using `set ValueRng = intersect(ValueRng, ValueRng.parent.usedrange)` allows full column references without penalty. The same would have to be done to `LookupRange`. Also see my addendum on reducing the repetitious calculation with an IF on the city name. –  Mar 31 '17 at 02:07
  • I will look into that. @Jeeped. That makes sense. – Scott Craner Mar 31 '17 at 02:20
  • This works brilliantly, thanks so much@ScottCraner! In my 10k file, there are some city names in column A that do not have adjacent values in columns B through D. I am getting #VALUE errors in the results column H for these instances. So, for example, if A11 had Z as value and B11,C11,D11 are blank, is there a way to amend the VBA with some sort of qualifier that skips over instances where B,C,D are empty? Could this also speed up the calculations? – sociologix Mar 31 '17 at 02:55
  • @sociologix see edit, I usually just wrap the formula in IFERROR to deal with it leaving the UDF to return errors so I know something is wrong. But I added the error check to the UDF – Scott Craner Mar 31 '17 at 14:43
  • ScottCraner & @jeeped, I'm now working with ~1 million rows (still the same number of columns). Do you happen to know if there is any way to tune up this VBA to handle that amount of data? – sociologix Aug 19 '17 at 18:09