
I am trying to add a count of all the matches between dataframes a and b:

df2['Count'] = len(set(a) & set(b))
df2.head(5)

But it only returns "0".

Data for a:

Result  Column1  Column2  Column3   D-level  R-level
numpy   de       LA       11060303  8        NaN
FRA     Paris    YouTube  56764332  1        4.0

Here is the data for b:

numpy

YouTube

import numpy

Desired output should be a total of matches between a and b appended to the dataframe:

Result  Column1  Column2  Column3   D-level  R-level  No of matches?
numpy   de       LA       11060303  8        NaN      (1 unique match)
FRA     Paris    YouTube  56764332  1        4.0      (1 unique match)

Best,

William_b

1 Answer


Let's consider the dataframe

import pandas as pd
df = pd.DataFrame([['a','c'],['a','b']])

Running set(df) results in {0, 1}, because iterating over a DataFrame yields its column labels, not its cell values. That is not the set of entries you want. What you need is a flattened list of the entries (see "How to make a flat list out of a list of lists?"):

def flatten_df_values(df):
    return [item for sublist in df.values for item in sublist]

Then, if you have a second dataframe

df2 = pd.DataFrame([['f','c'],['a','b']])

you can perform your operation:

set(flatten_df_values(df)) & set(flatten_df_values(df2))  # {'a', 'b', 'c'}
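
As a quick check of the point above, here is a minimal sketch that reuses the two example frames. It contrasts set(df) with the set of flattened values and takes len of the intersection to get a count. Flattening with to_numpy().ravel() is an alternative to the list comprehension, not part of the original answer, and the names labels, common and n_matches are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([['a', 'c'], ['a', 'b']])
df2 = pd.DataFrame([['f', 'c'], ['a', 'b']])

# Iterating over a DataFrame yields its column labels, not its cell values.
labels = set(df)  # {0, 1}

# Flatten the 2-D array of values into 1-D before building each set.
values_df = set(df.to_numpy().ravel())
values_df2 = set(df2.to_numpy().ravel())

# Distinct values that occur somewhere in both frames.
common = values_df & values_df2  # {'a', 'b', 'c'}
n_matches = len(common)          # 3
```

This is also a likely reason the original attempt returned 0: if a and b share no column names, the intersection of their label sets is empty.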

If you want to get the rows that appear in both dataframes, you can simply use merge with its default how='inner':

df.merge(df2,on=list(df.columns))  

This will result in a DataFrame containing the rows shared by both. In our example case:

   0  1
0  a  b

Note that you can modify the on parameter to include only the columns you want.
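
Coming back to the layout in the question, a per-row count could be sketched as follows. This is only one reading of the desired output: it assumes b is a single column of strings (the column name term is made up, since the question does not show b's header) and that a "match" means a cell in a row of a is exactly equal to some entry of b.

```python
import pandas as pd

a = pd.DataFrame({
    'Result': ['numpy', 'FRA'],
    'Column1': ['de', 'Paris'],
    'Column2': ['LA', 'YouTube'],
    'Column3': [11060303, 56764332],
    'D-level': [8, 1],
    'R-level': [float('nan'), 4.0],
})
# 'term' is a hypothetical column name used only for this illustration.
b = pd.DataFrame({'term': ['numpy', 'YouTube', 'import numpy']})

b_values = set(b['term'])

# For each row of a, count the distinct cell values that also appear in b.
a['Count'] = a.apply(lambda row: len(set(row) & b_values), axis=1)
```

With this sample data each row gets a count of 1, which lines up with the "(1 unique match)" entries in the desired output. Partial matches such as 'numpy' inside 'import numpy' are not counted under this assumption.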

Arnau