I have 100 large arrays, each with more than 250,000 elements. I want to find values that these arrays have in common. I know that no value will be found in all 100 arrays, but a small number of values will be found in multiple arrays (I suspect 10-30%). I want to find which values occur with the highest frequency across these arrays. (Side point: the arrays contain no duplicates.)
I know that I can loop through the arrays and eventually find them, but that will take a while. I also know about the np.intersect1d function, but that only gives values that are found in all of the arrays, whereas I'm looking for values that will only be in around 20 of the 100 arrays.
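To illustrate the limitation, here is a minimal sketch on three small made-up arrays (the names `a`, `b`, `c` and their values are assumptions for illustration, not my real data). Chaining np.intersect1d keeps only values present in every array, so values shared by just some of the arrays are lost:

```python
from functools import reduce

import numpy as np

# Toy arrays with made-up values (not the real data)
a = np.array([1.98, 2.33, 3.44])
b = np.array([1.26, 2.33, 4.14])
c = np.array([1.58, 3.44, 9.0])

# Chain np.intersect1d over all arrays: only values present in every array survive
common_to_all = reduce(np.intersect1d, [a, b, c])

# The result is empty, even though 2.33 and 3.44 each appear in two of the arrays
print(common_to_all)
```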
My best bet so far is to use the np.intersect1d function and loop through all possible combinations of the arrays, which would definitely take a while, but not as long as simply looping through all 250,000 x 100 values.
Example:
array_1 = array([1.98, 2.33, 3.44, ..., 11.1])
array_2 = array([1.26, 1.49, 4.14, ..., 9.0])
array_3 = array([1.58, 2.33, 3.44, ..., 19.1])
array_4 = array([4.18, 2.03, 3.74, ..., 12.1])
.
.
.
array_100 = array([1.11, 2.13, 1.74, ..., 1.1])
No value is in all 100 arrays. How can I find out whether there is a value that appears in, say, 30 different arrays?
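To make the desired output concrete, here is a minimal sketch on a few small made-up arrays. It relies on the arrays having no duplicates, so a value's count in the concatenated data equals the number of arrays that contain it. The toy values, the names `arrays`, `values`, `counts`, and `min_arrays`, and the use of exact floating-point equality are all assumptions for illustration; I have not checked how this behaves at the real size of 100 x 250,000 elements.

```python
import numpy as np

# Toy stand-ins for the 100 real arrays (made-up values, no duplicates within an array)
arrays = [
    np.array([1.98, 2.33, 3.44, 11.1]),
    np.array([1.26, 1.49, 4.14, 9.0]),
    np.array([1.58, 2.33, 3.44, 19.1]),
    np.array([4.18, 2.33, 3.74, 12.1]),
]

# Count how often each distinct value occurs in the combined data.
# Because no array repeats a value, each count is the number of arrays containing it.
values, counts = np.unique(np.concatenate(arrays), return_counts=True)

# Sort by descending count; the stable sort keeps ties in ascending value order
order = np.argsort(-counts, kind="stable")
top_values = values[order]
top_counts = counts[order]

# Keep only values found in at least `min_arrays` arrays (hypothetical threshold)
min_arrays = 2
frequent = values[counts >= min_arrays]

print(top_values[:2], top_counts[:2])
print(frequent)
```

With the real data, `min_arrays` would be something like 20 or 30. If values that should match differ by floating-point rounding, they would need to be rounded to a fixed number of decimals first, which is a further assumption about the data.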