Keep array rows where a column value is found in a second flat array

Question

Edit: I have updated this question to show how I got my code to work using array_search.

I have an array, $arr1, with 5 columns, as follows:

 key  id   name  style  age  whim
 0    14   bob   big    33   no
 1    72   jill  big    22   yes
 2    39   sue   yes    111  yes
 3    994  lucy  small  23   no
 4    15   sis   med    24   no
 5    16   maj   med    87   yes
 6    879  Ike   larg   56   no
 7    286  Jed   big    23   yes

This array is in a cache, not a database.

I then have a second array containing a list of id values:

$arr2 = array(0=>14, 1=>72, 2=>8790);

How do I filter $arr1 so it returns only the rows with the id values in $arr2?

I got my code to work as follows:

$arr1 = new CachedStuff();  // get cache

$resultingArray = [];  // create an empty array to hold rows
$filter_function = function ($row) use ($arr2) {
    return (array_search($row['id'], $arr2));
};
$resultingArrayIDs = $arr1->GetIds($filter_function, $resultingArray);

This gives me two outputs, $resultingArray and $resultingArrayIDs, both of which represent the intersection of $arr1 and $arr2.

mickmackusa · Accepted Answer · score 3 · 2023-03-23T07:51:51.960

This whole task can be accomplished with just one slick, native function call -- array_uintersect().

Because the two parameters passed to the custom callback may come from either input array (a row from $arr1 or a scalar id from $arr2), try to access the id column and, if there isn't one, fall back to the parameter's value itself.

Under the hood, this function sorts the data while evaluating, as a means of improving execution time. I expect this approach to outperform iterated calls of in_array() purely from the standpoint of minimizing function calls.

Code:

var_export(
    array_uintersect(
        $arr1,
        $arr2,
        fn($a, $b) =>
            ($a['id'] ?? $a)
            <=>
            ($b['id'] ?? $b)
    )
);
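For a self-contained run of the same call, here is a sketch that assumes the question's rows are plain arrays (the question's CachedStuff object is not reproduced) and keeps only the id and name columns. array_uintersect() preserves the keys of $arr1, so array_keys() yields the matching positions and array_values() reindexes the matching rows, which covers the "two outputs" described in the question.

```php
<?php
// Sample rows from the question (only id and name kept for brevity).
$arr1 = [
    ['id' => 14,  'name' => 'bob'],
    ['id' => 72,  'name' => 'jill'],
    ['id' => 39,  'name' => 'sue'],
    ['id' => 994, 'name' => 'lucy'],
    ['id' => 879, 'name' => 'Ike'],
];
$arr2 = [14, 72, 8790]; // 8790 matches no row

// The callback may receive a row or a scalar id from either array,
// so read 'id' when it exists and fall back to the value itself.
$matches = array_uintersect(
    $arr1,
    $arr2,
    fn($a, $b) => ($a['id'] ?? $a) <=> ($b['id'] ?? $b)
);

$matchedKeys = array_keys($matches);   // original positions in $arr1
$matchedRows = array_values($matches); // matching rows, reindexed from 0

var_export($matchedKeys);
```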

edited Mar 23 '23 at 07:51

answered Nov 27 '17 at 04:41


@ian I am a late poster on this question, so I didn't earn any of the early upvote handouts. I would like to clarify, in case my answer doesn't explain well enough, that my answer is efficient because it doesn't perform array scans in a loop. DecentDabbler's and Erwin's solutions perform partial (potentially full) array scans on each iteration of their loop, and that can be costly. user2182349 was on the right track, but performed the filtration process on temporary values instead of temporary keys and thus had to call a `foreach()` loop to generate the desired output array. – mickmackusa Nov 27 '17 at 07:17
One final note is that if you don't care about the temporary keys, you can skip the final call of `array_values()` and move on to the next process in your code. – mickmackusa Nov 27 '17 at 07:18
@ian and mickmackusa: yes, this answer is considerably faster than my solution, especially when dealing with a large cache set and a large filter set. – Decent Dabbler Nov 27 '17 at 22:02
@mickmackusa and Decent Dabbler: Thanks to you both. I have edited my code above to show how I used array_search to fix my problem. BUT, I am interested if this method can be applied to make it more efficient. Can either of you tell by looking at my updated function? – ian Nov 28 '17 at 01:24
@ian the way you are using `array_search()` is not "zero-safe" because your condition is using a loose comparison on the result of the search, a (true) `0` return value will be incorrectly evaluated as false. You need to make a literal/strict check on `false` to use this function call properly. Furthermore, I am again clarifying that using an iterated `array_search()` or `in_array()` method is not as efficient as my method that does a single comparison on the two arrays. Please use my method for best efficiency. – mickmackusa Nov 28 '17 at 01:46
@ian if you want to have a separate array of just the intersecting keys, then you can/should remove the `array_values()` call from my method, and just call `array_keys()` on my result array and simply declare that result to a new variable. My answer truly is the right way to go about this. – mickmackusa Nov 28 '17 at 01:47
Thanks @mickmackusa. I believe you and I do have a large cache so am looking for efficient solutions. It's just I am fairly new to arrays so still trying to piece stuff together. – ian Nov 28 '17 at 01:59
@ian I am happy to explain as much as I can. When SO is treated like a classroom, everyone wins! Here is a demo that shows how your current method will yield unexpected results. http://sandbox.onlinephpfunctions.com/code/ec343b847710d511f444ba2d085bc41bcf0a83a6 – mickmackusa Nov 28 '17 at 02:00
@mickmackusa that's interesting. It does that because it cannot identify the 0 index? – ian Nov 28 '17 at 02:05
@ian here is a slightly modified version of my answer to create the two arrays that you desire: https://3v4l.org/4hR0C – mickmackusa Nov 28 '17 at 02:07
@ian There is a pink Warning box in the manual on this point: http://php.net/manual/en/function.array-search.php You MUST explicitly compare using `!==false` so that a returned key of zero is not mistreated ("type-juggled" by php). This same requirement exists elsewhere in php; off the top of my head, `strpos()` needs the same treatment. – mickmackusa Nov 28 '17 at 02:11
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/159949/discussion-between-ian-and-mickmackusa). – ian Nov 28 '17 at 02:12
@mickmackusa In the example above, I flipped $arr1 and swapped out `return (array_search($row['id'], $arr2));` with `return array_intersect_key(array_column($arr1,null,'id'),$arr2);` but that didn't get it. Is my existing code salvageable to use yours? Any idea how I would apply your intersection to the existing function? – ian Nov 28 '17 at 02:54
I have to say I am less comfortable with your function assigned to a variable -- I simply don't code with that style. It's not to say that it is wrong, I just find it harder to follow. I recommend the style that I have provided in my latest demo. I think future eyes on your code will find it simple to follow too. – mickmackusa Nov 28 '17 at 02:58
5 years later ... I have completely rewritten my answer with a more direct and elegant approach. – mickmackusa Aug 25 '22 at 10:34
@mickmackusa I notice that my freshly migrated answer is a duplicate of what you "completely rewrote" over last year. ^_^ Just as well, we have a broader spectrum of possibilities back in the view again. I'm wondering if there's any sort of penalty to "massive null-coalescence", which is a clever way to handle the intersection of two asymmetric arrays. – Markus AO Mar 19 '23 at 17:47
@Markus Well, making modern benchmarks and comparing potential techniques is probably the final frontier on Stack Overflow. Because there isn't much that hasn't been covered by this Q&A over more than a decade, we can only hope to sharpen the existing advice. I think your diligent posting style caters to this endeavor. Sometimes performance is the most important concern for researchers. I often strive for directness, lowest time complexity, and elegance (which can be at odds at times). Specific to this task, data distribution might make one time complexity algorithm better than another. – mickmackusa Mar 19 '23 at 22:04
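The zero-safety problem raised in these comments can be reproduced in a few lines. This is a minimal sketch with made-up values; array_filter() stands in for the question's CachedStuff::GetIds(), whose internals are not shown.

```php
<?php
$ids = [14, 72, 8790];

$position = array_search(14, $ids); // 14 sits at index 0

// Loose check: 0 is falsy, so a genuine match is treated as "not found".
$looseFound = (bool) $position;

// Strict check: only a literal false means "not found".
$strictFound = $position !== false;

var_dump($looseFound, $strictFound); // prints bool(false), then bool(true)

// The same strict check inside a filter callback.
$rows = [['id' => 14], ['id' => 39], ['id' => 72]];
$kept = array_filter($rows, fn($row) => array_search($row['id'], $ids) !== false);
```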

Decent Dabbler · Answer 2 · 2017-11-27T04:19:46.207

Something like this should do it, provided I've understood your question and data structure correctly:

$dataArray = [
  [ 'key' => 0, 'id' => 14  , 'name' => 'bob'  , 'style' => 'big'   , 'age' => 33  , 'whim' => 'no'  ],
  [ 'key' => 1, 'id' => 72  , 'name' => 'jill' , 'style' => 'big'   , 'age' => 22  , 'whim' => 'yes' ],
  [ 'key' => 2, 'id' => 39  , 'name' => 'sue'  , 'style' => 'yes'   , 'age' => 111 , 'whim' => 'yes' ],
  [ 'key' => 3, 'id' => 994 , 'name' => 'lucy' , 'style' => 'small' , 'age' => 23  , 'whim' => 'no'  ],
  [ 'key' => 4, 'id' => 15  , 'name' => 'sis'  , 'style' => 'med'   , 'age' => 24  , 'whim' => 'no'  ],
  [ 'key' => 5, 'id' => 16  , 'name' => 'maj'  , 'style' => 'med'   , 'age' => 87  , 'whim' => 'yes' ],
  [ 'key' => 6, 'id' => 879 , 'name' => 'Ike'  , 'style' => 'larg'  , 'age' => 56  , 'whim' => 'no'  ],
  [ 'key' => 7, 'id' => 286 , 'name' => 'Jed'  , 'style' => 'big'   , 'age' => 23  , 'whim' => 'yes' ]
];

$filterArray = [14, 72, 879];
$resultArray = array_filter( $dataArray, function( $row ) use ( $filterArray ) {
  return in_array( $row[ 'id' ], $filterArray );
} );


However, your question appears to suggest this data might be coming from a database; is that correct? If so, it may be more efficient to pre-filter the results at the database level, either by adding a field to the SELECT query that holds a boolean indicating whether a row matched your filter ids, or by simply not returning the other rows at all.

Thanks @Decent Dabbler, it is actually coming from a cache. – ian Nov 27 '17 at 13:30

score 1 · Answer 3 · answered Nov 27 '17 at 03:26

One way is to use a foreach loop with array_search(), checking strictly against false:

$result = [];
foreach ($arr1 as $value) {                            // Loop through $arr1
    if (array_search($value['id'], $arr2) !== false) { // Check if id is in $arr2
        $result[] = $value;                            // Push to result if true
    }
}

// print result
print_r($result);

score 1 · Answer 4 · answered Nov 27 '17 at 03:38

As @DecentDabbler mentioned, if the data is coming out of a database, using IN in your WHERE clause will allow you to retrieve only the relevant data.
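If the data did come from a database, a parameterised IN (...) clause for PDO could be built as in the minimal sketch below. The table name people and the helper buildInQuery() are hypothetical; the ids are bound as parameters rather than interpolated into the SQL, and the PDO calls appear only as comments because no connection is created here.

```php
<?php
/**
 * Hypothetical helper: builds "SELECT ... WHERE id IN (?, ?, ?)" with one
 * positional placeholder per id, so the ids are bound, not interpolated.
 * Assumes $ids is non-empty (an empty "IN ()" is invalid SQL).
 */
function buildInQuery(array $ids): array
{
    $placeholders = implode(', ', array_fill(0, count($ids), '?'));
    $sql = "SELECT * FROM people WHERE id IN ($placeholders)";
    return [$sql, array_values($ids)];
}

[$sql, $params] = buildInQuery([14, 72, 879]);
echo $sql, PHP_EOL; // SELECT * FROM people WHERE id IN (?, ?, ?)

// Usage with an existing PDO connection $pdo (not created here):
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
// $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
```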

Another way to filter is to use array functions:

array_column() extracts the values of the id column into a new array, keyed from 0
array_intersect() returns the ids found in both that column and $arr2, preserving their keys, which are the row indices in $arr1
array_flip() swaps keys and values, so the values of the resulting array are the indices of the matching rows in $arr1

$arr1 = [ [ 'id' => 14, 'name' => 'bob'],
        ['id' =>  72, 'name' => 'jill'],
        ['id' =>  39, 'name' => 'sue'],
        ['id' => 994, 'name' => 'lucy'],
        ['id' => 879, 'name'=> 'large']];

$arr2 = [ 14,72,879 ];

// Keys: matching ids; values: indices of the matching rows in $arr1
$intersection = array_flip(array_intersect(array_column($arr1, 'id'), $arr2));

foreach ($intersection as $i) {
    var_dump($arr1[$i]);
}
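A variation on the same idea swaps the flip-and-loop for array_intersect_key(). array_intersect() keeps the keys of its first argument, and array_column() without an index key numbers its output 0, 1, 2, ..., so those keys already are the row positions in $arr1. This sketch assumes $arr1 is a zero-based list, as in the sample.

```php
<?php
$arr1 = [
    ['id' => 14,  'name' => 'bob'],
    ['id' => 72,  'name' => 'jill'],
    ['id' => 39,  'name' => 'sue'],
    ['id' => 994, 'name' => 'lucy'],
    ['id' => 879, 'name' => 'large'],
];
$arr2 = [14, 72, 879];

// Keys of $matchingIds are positions in $arr1 (array_intersect keeps them).
$matchingIds = array_intersect(array_column($arr1, 'id'), $arr2);

// Keep the rows of $arr1 whose positions survived the intersection.
$result = array_intersect_key($arr1, $matchingIds);

print_r($result); // rows at positions 0, 1 and 4
```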

Markus AO · Answer 5 · 2023-03-19T17:39:34.560

This answer was migrated from a deleted duplicate. Revised to make sense independent of context.

Assume the following sample data (named $items and $select instead of $arr1 and $arr2 for clarity):

// Source data: A multidimensional array with named keys
$items = [
    ['id' => 1, 'name' => 'Foo'],
    ['id' => 3, 'name' => 'Bar'],
    ['id' => 5, 'name' => 'Maz'],
    ['id' => 6, 'name' => 'Wut'],
];

// Filter values: A flat array of scalar values
$select = [1, 5, 6];

Then, how do we extract $items with an id that matches one of the values in $select? And further, how do we do that in a manner that scales gracefully for larger datasets? Let's look at the possibilities and compare their weights.

1. Optimizing array_filter():

The answer using array_filter() certainly gets the job done. However, it makes an in_array() call at each iteration. With small datasets, this is hardly an issue. With larger datasets, repeated function calls in an iteration can cause a significant performance hit. So, for large loops, it's good practice where possible to "preprocess" the data so that a lighter operation using language constructs can replace the more expensive function calls.

How to avoid in_array() in loops?

You can "enable" simple index lookups with array_flip($select), i.e. by swapping keys and values, and then use isset() (a language construct, not a function!): isset($select[$id]). For larger datasets this performs significantly better than repeated in_array($id, $select) calls, not only because it avoids a function call, but because at each iteration in_array() scans the $select array for a match, over and over, whereas isset() is a direct key lookup. Optimized as follows:

$select = array_flip($select);
$selected_items = array_filter($items, function($item) use ($select) {
    return isset($select[$item['id']]);
});

Or, using an arrow function (PHP 7.4+), which captures the parent scope automatically and therefore doesn't need the use clause:

$select = array_flip($select);
$selected_items = array_filter($items, fn($item) => isset($select[$item['id']]));
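To make the lookup table concrete: after array_flip(), each id becomes a key, and its value is just the id's old position, which isset() ignores. A small sketch, assuming the ids are ints or strings (array_flip() only accepts those as values):

```php
<?php
$select = [1, 5, 6];
$lookup = array_flip($select); // [1 => 0, 5 => 1, 6 => 2]

// isset() checks the key directly; it never scans $select.
// The stored value for id 1 is 0, yet isset() is still true,
// because isset() tests for a non-null value, not a truthy one.
var_dump(isset($lookup[1])); // bool(true)
var_dump(isset($lookup[3])); // bool(false)
```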

2. Using Key Intersection

One elegant alternative to filtering is key intersection. First, we re-index the array by the desired lookup key using array_column(), with null as the column key (which returns the full rows instead of a specific column) and with id as the index key:

$items_by_id = array_column($items, null, 'id');

This gives you the same source array, but instead of being zero-indexed, it now uses the id column's value for the index key. Then, we're an array_intersect_key away from extracting the selection from the source array:

$selected_items = array_intersect_key($items_by_id, array_flip($select));

Here we flip $select so that its values become keys we can intersect on. Note that array_intersect_key() performs better than approaches using array_intersect(). (Keys are simple!) The result is as expected. Finally, here's a one-liner (formatted for easy reading) without the throw-away variable:

$selected_items = array_intersect_key(
    array_column($items, null, 'id'), 
    array_flip($select)
);

N.B. The resulting array keeps each item's actual id as its index key, instead of the default zero-based keys. Keep that in mind if you cross-reference the selected items with your source array later in your code, and perhaps consider indexing items by their proper ID from the beginning.
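A quick check of the keying behaviour described above, using the sample $items and $select. One extra caveat, assuming ids could repeat: array_column() with an index key keeps only the last row for each duplicate id, so this approach suits unique ids.

```php
<?php
$items = [
    ['id' => 1, 'name' => 'Foo'],
    ['id' => 3, 'name' => 'Bar'],
    ['id' => 5, 'name' => 'Maz'],
    ['id' => 6, 'name' => 'Wut'],
];
$select = [1, 5, 6];

$selected_items = array_intersect_key(
    array_column($items, null, 'id'),
    array_flip($select)
);

print_r(array_keys($selected_items));       // the keys are the ids 1, 5, 6
$reindexed = array_values($selected_items); // zero-based keys 0, 1, 2
```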

Comparing these approaches:

array_filter() incurs one iteration over $items with one (anonymous) function call per array member; and, if in_array() is used to compare the current item's ID against $select, one scan of $select per item on top of that. (Use key lookups instead.)

The answer using array_search() in a foreach loop carries the same weight: count($items) function calls, and a whole lot of redundant passes over the selection/filter array.

The array_intersect_key() method 1. iterates over $items once (simple reindexing); 2. iterates over $select once (key/value flip); and 3. walks the keys of the re-indexed items once for the intersection. array_intersect_key() checks each key of its first array with a hash lookup in the second, and as such is much more efficient than repeated array scans for each value. (This function exists specifically for getting intersections, i.e. finding overlaps, after all.)

3. Good Old Foreach Loop

Of course, a good old foreach loop will also work perfectly fine. Again, use array_flip() and isset() index lookups rather than in_array() or array_search(), as follows:

$select = array_flip($select);

$selected_items = [];
foreach ($items as $item) {
    if (isset($select[$item['id']])) {
        $selected_items[] = $item;
    }
}

I'd instinctively use this for large datasets (or long comparison lists) where "bare bones" performance is called for, going by "simpler is better". However, you likely won't see a big difference between this and the key intersection approach without massive data to process. (If someone has compared these methods for PHP 8.x, please share the benchmark results.)
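For anyone who wants to produce those numbers, here is a rough harness sketch. The data size and the 10% selection ratio are arbitrary assumptions, hrtime() needs PHP 7.3+ and the arrow functions need 7.4+. No timings are claimed here; the script only prints them and then checks that all five approaches return identical rows.

```php
<?php
// Hypothetical test data: 10,000 rows with even ids; every 10th row is selected.
// Raise $n to approximate "massive data"; the in_array() variant slows down fastest.
$n = 10000;
$items = [];
for ($i = 0; $i < $n; $i++) {
    $items[] = ['id' => $i * 2, 'name' => "item$i"];
}
$select = range(0, ($n - 1) * 2, 20); // 1,000 ids that all exist in $items

$approaches = [
    'array_filter + in_array' => function () use ($items, $select) {
        return array_values(array_filter($items, fn($item) => in_array($item['id'], $select)));
    },
    'array_filter + isset' => function () use ($items, $select) {
        $lookup = array_flip($select);
        return array_values(array_filter($items, fn($item) => isset($lookup[$item['id']])));
    },
    'array_intersect_key' => function () use ($items, $select) {
        return array_values(array_intersect_key(array_column($items, null, 'id'), array_flip($select)));
    },
    'foreach + isset' => function () use ($items, $select) {
        $lookup = array_flip($select);
        $out = [];
        foreach ($items as $item) {
            if (isset($lookup[$item['id']])) {
                $out[] = $item;
            }
        }
        return $out;
    },
    'array_uintersect' => function () use ($items, $select) {
        return array_values(array_uintersect(
            $items,
            $select,
            fn($a, $b) => ($a['id'] ?? $a) <=> ($b['id'] ?? $b)
        ));
    },
];

$results = [];
foreach ($approaches as $name => $run) {
    $start = hrtime(true);                      // nanoseconds
    $results[$name] = $run();
    $elapsedMs = (hrtime(true) - $start) / 1e6;
    printf("%-24s %8.2f ms\n", $name, $elapsedMs);
}

// Sanity check: every approach must return exactly the same rows.
$reference = reset($results);
foreach ($results as $name => $rows) {
    if ($rows !== $reference) {
        throw new RuntimeException("$name returned different rows");
    }
}
```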

@mickmackusa this answer has been migrated (and revised/enlarged) from the deleted duplicate. — Markus AO, Mar 19 '23 at 17:40
Just so you know, I didn't get pinged by the above comment -- I must be "present" on the actual/individual post to qualify for the ping. — mickmackusa, Mar 19 '23 at 22:05
