0

long time reader, first time pos(t)er of questions.

I have an Excel 2013 worksheet of about 4,000 unique records (rows) of data. We'll call this the data dump. I've filtered the data dump using any one of about six different data elements (columns). After each filter I save the results to a new worksheet. I clear the filter to start over, and ultimately wound up with about six different worksheets.

I need to be able to account for each unique record in the data dump--each one should (in theory) appear on at least one of the filtered worksheets, and I need to identify any that don't.

My big problem is that the only way to uniquely identify each record is by concatenating a text string out of five consecutive cells in each row. I cannot add a column of concatenated text to these worksheets (for which reasons I'll presently spare you), so essentially I'm trying to build a formula that says the following:

For a given, unique, concatenated string of text of five consecutive cells from one record on this data dump worksheet, identify any exact matching strings from any of the other worksheets and return TRUE if found or FALSE if not.

I will, of course, have to apply this formula to every record in the data dump.

Thoughts or tips? Ultimately I think it comes down to a lot of small moving parts that I could manage individually, but that I'm not confident I could manage collectively.

Any help is appreciated and I'll be happy to clarify where needed. And forgiveness if a similar question has been asked previously--I searched pretty fruitlessly for an answer all afternoon.

Thank you!

2 Answers2

0

You could use Index to create a concatenated range that serves a lookup range to Match(). Match() can concatenate the lookup term. It then returns a number for a match or an error if no match is found. Wrap error trapping formulas around this for the TRUE/FALSE result. Along the lines of

=iferror(match(sheet1!A1&sheet1!B1&sheet1!C1&sheet1!d1&sheet1!e1,index(Sheet2!$a$1:$a$1000&Sheet2!$b$1:$b$1000&Sheet2!$c$1:$c$1000&Sheet2!$d$1:$d$1000&Sheet2!$e$1:$e$1000,0),0),FALSE)

Note that any match will return a number (which will evaluate to a boolean TRUE in summarising formulas) and a non-match will return a FALSE.

This will get you the row number of the match for the first row of original data on sheet1, where the first extract lives on Sheet2 in the first 1000 rows. Use the same principle for the other four sheets and wrap the five formulas into an OR() statement to arrive at a final TRUE or FALSE.

Note that the Index ranges should not encompass whole columns, but only the rows with data. Otherwise the formula will be very slow to recalculate, especially if you use it 4000 times.

teylyn
  • 34,374
  • 4
  • 53
  • 73
  • Oh ooops, I didn't see your answer when I was typing out. Looks like you got there first. – MacroMarc Oct 09 '15 at 00:24
  • Thank you kindly, teylyn! This is extremely helpful. Just to be sure I understand one point: "Index ranges should not encompass whole columns, but only rows with data." In other words, if one worksheet has 45 records and another has 230, my two index ranges will be 45 and 230 respectively. Correct? Edit: and thank you @MacroMarc for the feedback as well. The hard returns are really helpful for parsing this bad boy out. Curiously, that's not too far from the first formula I tried to write. Not sure where I screwed up, but hey--gotta start somewhere. – Messy Jesse Oct 09 '15 at 01:31
  • Yes, that is correct. If you use whole columns, the formula will take forever to calculate. You can speed up calculation by specifying only the few hundred (or thousand) rows with data instead of the more than a million rows in a whole column. – teylyn Oct 09 '15 at 08:41
  • Both suggestions have done exactly what I've needed to do. Last question: if I were now interested in finding out where the first matching string exists on the other worksheets, is there a quick and dirty way to do that (without using VLOOKUP)? I'm wondering, for example, if I can make use of whatever number was returned that was translated into the boolean TRUE. – Messy Jesse Oct 09 '15 at 14:09
  • you can adjust the formula I posted above to return the spreadsheet name. If the MATCH is a number, return a text string like "sheet1". Try along the lines of `=if(,"found on MySheetOne","")` – teylyn Oct 09 '15 at 15:54
0

Here is one way. If you have your datadump records from A1 downwards. And assuming you can have your filter sheets similarly. Then adjust your filter ranges so that the formula calls the fixed ranges properly.
You might be able to name them...
This formula need CSE for it to work

Edit by teylyn: This formula is an array formula and needs to be confirmed with Ctrl-Shift-Enter. It will not work if you only hit the Enter key after editing the formula.

Control-Shift-Enter is sometimes referred to as CSE. People also call it "array-entering" a formula. Excel will put curly braces around the formula, which you can see in the formula bar when the cell is selected.

=OR(
(IFERROR(MATCH(A1&B1&C1&D1&E1,FilterSheet1!$A$1:$A$200&FilterSheet1!$B$1:$B$200&FilterSheet1!$C$1:$C$200&FilterSheet1!$D$1:$D$200&FilterSheet1!$E$1:$E$200, 0), FALSE), 
(IFERROR(MATCH(A1&B1&C1&D1&E1,FilterSheet2!$A$1:$A$200&FilterSheet2!$B$1:$B$200&FilterSheet2!$C$1:$C$200&FilterSheet2!$D$1:$D$200&FilterSheet2!$E$1:$E$200, 0), FALSE), 
(IFERROR(MATCH(A1&B1&C1&D1&E1,FilterSheet3!$A$1:$A$200&FilterSheet3!$B$1:$B$200&FilterSheet3!$C$1:$C$200&FilterSheet3!$D$1:$D$200&FilterSheet3!$E$1:$E$200, 0), FALSE), 
(IFERROR(MATCH(A1&B1&C1&D1&E1,FilterSheet4!$A$1:$A$200&FilterSheet4!$B$1:$B$200&FilterSheet4!$C$1:$C$200&FilterSheet4!$D$1:$D$200&FilterSheet4!$E$1:$E$200, 0), FALSE), 
(IFERROR(MATCH(A1&B1&C1&D1&E1,FilterSheet5!$A$1:$A$200&FilterSheet5!$B$1:$B$200&FilterSheet5!$C$1:$C$200&FilterSheet5!$D$1:$D$200&FilterSheet5!$E$1:$E$200, 0), FALSE))

I have put hard returns so you can see what is going on better. Obviously you must collect up the formula

EDIT for new requirement: Ctrl+Shift+Enter required again

=CONCATENATE(
(IFERROR(MATCH(A1&B1&C1&D1&E1,FilterSheet1!$A$1:$A$200&FilterSheet1!$B$1:$B$200&FilterSheet1!$C$1:$C$200&FilterSheet1!$D$1:$D$200&FilterSheet1!$E$1:$E$200, 0) & " - FilterSheet1", ""), 
(IFERROR(MATCH(A1&B1&C1&D1&E1,FilterSheet2!$A$1:$A$200&FilterSheet2!$B$1:$B$200&FilterSheet2!$C$1:$C$200&FilterSheet2!$D$1:$D$200&FilterSheet2!$E$1:$E$200, 0) & " - FilterSheet2", ""), 
(IFERROR(MATCH(A1&B1&C1&D1&E1,FilterSheet3!$A$1:$A$200&FilterSheet3!$B$1:$B$200&FilterSheet3!$C$1:$C$200&FilterSheet3!$D$1:$D$200&FilterSheet3!$E$1:$E$200, 0) & " - FilterSheet3", ""), 
(IFERROR(MATCH(A1&B1&C1&D1&E1,FilterSheet4!$A$1:$A$200&FilterSheet4!$B$1:$B$200&FilterSheet4!$C$1:$C$200&FilterSheet4!$D$1:$D$200&FilterSheet4!$E$1:$E$200, 0) & " - FilterSheet4", ""), 
(IFERROR(MATCH(A1&B1&C1&D1&E1,FilterSheet5!$A$1:$A$200&FilterSheet5!$B$1:$B$200&FilterSheet5!$C$1:$C$200&FilterSheet5!$D$1:$D$200&FilterSheet5!$E$1:$E$200, 0) & " - FilterSheet5", ""))

My edit for the new requirement just takes the matches found, as @Messy Jesse said, and also appends the sheet name too. If no match is found in a sheet, then ZLS is added to the string. The total string is then concatenated...

MacroMarc
  • 3,214
  • 2
  • 11
  • 20
  • Did you get that to work? The lookup range for MATCH must be constructed with Index, as I showed above, or the formula must be array entered. Just thought that would be worth a mention. – teylyn Oct 09 '15 at 08:39
  • So, are YOU going to edit your post with instructions about CSE or shall I? – teylyn Oct 09 '15 at 10:20
  • I added some clarification. Not everybody is born with the details of what 'CSE' means. – teylyn Oct 09 '15 at 10:57