I was wondering whether there is a way to compare every element of one numpy array against every element of another, regardless of position. I often work with arrays underlying pandas DataFrames, and I'd like to use those underlying numpy arrays for the comparison. I know I can do a fast element-wise comparison like this:
import numpy as np
import pandas as pd

dfarr1 = pd.DataFrame(np.arange(0, 1000))
dfarr2 = pd.DataFrame(np.arange(1000, 0, -1))
dfarr1.loc[(dfarr1.values == dfarr2.values).ravel()]  # .ravel() flattens the (1000, 1) mask to 1-D for .loc
# outputs the single matching row: value 500 at index 500
(The above is just a toy example, obviously.) What I'd actually like is the equivalent of two nested loops over all elements, but as fast as possible:
for ir in df.itertuples():
    for ir2 in country_df.itertuples():
        # ir[0] / ir2[0] are the row indices of the two DataFrames
        if df['city'][ir[0]] == country_df['Capital'][ir2[0]]:
            df['country'][ir[0]] = country_df['Country'][ir2[0]]
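For context, the all-pairs comparison I have in mind would, in pure NumPy terms, look something like this broadcast sketch (the small arrays here are made up just to illustrate the idea):

```python
import numpy as np

a = np.array([0, 1, 2, 3])
b = np.array([3, 1, 4])

# Compare every element of a against every element of b at once:
# pairs has shape (len(a), len(b)), with pairs[i, j] == (a[i] == b[j])
pairs = a[:, None] == b[None, :]

# Row/column indices of the matching pairs
ai, bi = np.nonzero(pairs)
# a[1] == b[1] (both 1) and a[3] == b[0] (both 3)
```

Note that this builds a full len(a) × len(b) boolean matrix, so memory could become a concern for very long arrays, which is part of why I'm asking what the idiomatic fast approach is.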
The thing is that my DataFrames contain many thousands of rows, and the above is simply too slow (not least because I'm sure I'll run similar operations on other, similarly long DataFrames in the future, so settling this once and for all would be good).

The concrete use case: I've parsed a few thousand files and extracted their geodata (= df above), and I have a fairly massive lookup file of cities and their corresponding countries (= country_df). I want to check whether the cities in df match those in the lookup, and if so, write the corresponding country into a new column of df, at the same row index as the parsed geodata.

Anyway, this is just one example of what I need to do at (ideally much) higher speed than the loop above. Many thanks!
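To make the lookup part concrete, here is a small sketch of the kind of vectorized replacement I'm hoping exists, using pandas' Series.map with the column names from my loop (the data itself is invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({'city': ['Paris', 'Oslo', 'Atlantis']})
country_df = pd.DataFrame({'Capital': ['Oslo', 'Paris'],
                           'Country': ['Norway', 'France']})

# Turn the lookup table into a Capital -> Country mapping,
# then apply it to every city at once; unmatched cities get NaN.
lookup = country_df.set_index('Capital')['Country']
df['country'] = df['city'].map(lookup)
```

This avoids the nested Python loops entirely, but I don't know whether map (or something like merge) is the fastest way for tables of this size, hence the question.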