I have two dataframes. Here is dwpjp.head():

jp_number
0 25146315052147720191
1 57225427599900052634
2 86076681691411639833
3 50491824499499656478
4 95588382889227620465

and ct_data.head():

imjp_number imct_id
0 23605308039805192764 x1E5e3ukRyEFRT6SUAF6lg|d543d3d064da465b8576d87
1 57225427599900052634 aa0d2dac654d4154bf7c09f73faeaf62|-vf6738ee3bed
2 53733358271401869469 6FfHZRoiWs2VO02Pruk07A|__g3d877adf9d154637be26
3 50491824499499656478 __gbe204670ca784a01b7207b42a7e5a5d3|54e2c39cd3
4 82143248133286027306 __g1114a30c6ea548a2a83d5a51718ff0fd|773840905c

I want to split ct_data into two new dataframes, cct_data and dct_data. If a row's jp_number is present in the dwpjp dataframe, the row should go into cct_data; otherwise it should go into dct_data.

I tried this for common jp_number present in dwpjp:

cct_data = ct_data[ct_data.isin(dwpjp).any(1).values]

and for the other I negated the condition as follows:

dct_data = ct_data[~[ct_data.isin(dwpjp).any(1).values]]

but I am not getting the expected results, which are shown below.

cct_data

imjp_number imct_id
0 57225427599900052634 aa0d2dac654d4154bf7c09f73faeaf62|-vf6738ee3bed
1 50491824499499656478 __gbe204670ca784a01b7207b42a7e5a5d3|54e2c39cd3

and dct_data:

imjp_number imct_id
0 23605308039805192764 x1E5e3ukRyEFRT6SUAF6lg|d543d3d064da465b8576d87
1 53733358271401869469 6FfHZRoiWs2VO02Pruk07A|__g3d877adf9d154637be26
2 82143248133286027306 __g1114a30c6ea548a2a83d5a51718ff0fd|773840905c

Note: jp_number in dwpjp corresponds to imjp_number in ct_data.

Shaido
DKBOSS

1 Answer

I modified your expression as follows:

cct_data = ct_data[ct_data.imjp_number.isin(dwpjp.jp_number)]

and

dct_data = ct_data[~ct_data.imjp_number.isin(dwpjp.jp_number)]
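Here is a minimal runnable sketch of the same idea, using the sample rows from the question. It assumes the numbers are stored as strings (20-digit values do not fit in int64); adjust if your columns have a different dtype. The `mask` variable and the `reset_index` calls are my additions for illustration, so the mask is computed only once and the output index starts at 0 as in the expected result.

```python
import pandas as pd

# Sample rows copied from the question; keys are assumed to be strings.
dwpjp = pd.DataFrame({"jp_number": [
    "25146315052147720191",
    "57225427599900052634",
    "86076681691411639833",
    "50491824499499656478",
    "95588382889227620465",
]})
ct_data = pd.DataFrame({
    "imjp_number": [
        "23605308039805192764",
        "57225427599900052634",
        "53733358271401869469",
        "50491824499499656478",
        "82143248133286027306",
    ],
    "imct_id": [
        "x1E5e3ukRyEFRT6SUAF6lg|d543d3d064da465b8576d87",
        "aa0d2dac654d4154bf7c09f73faeaf62|-vf6738ee3bed",
        "6FfHZRoiWs2VO02Pruk07A|__g3d877adf9d154637be26",
        "__gbe204670ca784a01b7207b42a7e5a5d3|54e2c39cd3",
        "__g1114a30c6ea548a2a83d5a51718ff0fd|773840905c",
    ],
})

# Boolean Series: True where the row's imjp_number occurs in dwpjp.jp_number.
mask = ct_data["imjp_number"].isin(dwpjp["jp_number"])

# Rows with a matching key, and the complement; the index is reset
# so each result is numbered from 0.
cct_data = ct_data[mask].reset_index(drop=True)
dct_data = ct_data[~mask].reset_index(drop=True)

print(cct_data)
print(dct_data)
```

Note that `isin` is called on a single column here, not on the whole dataframe, so only the key columns are compared.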
Deven Ramani
  • This method takes too much time, as ct_data has 50 records and dwpjp has 3.5 M records. Is there a faster way to achieve it? – DKBOSS Mar 05 '21 at 06:20
  • It can be computed faster with `cct_data = pd.merge(dwpjp, ct_data, on='jp_number', how='inner')`, assuming that both dataframes have a column named **jp_number** in common. – Deven Ramani Mar 05 '21 at 06:55
  • Ramani, I got a `TypeError: unhashable type: 'list'` error. – DKBOSS Mar 05 '21 at 07:29
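
For reference, here is a hedged sketch of the merge-based variant from the comments, adapted to the column names shown in the question. The frames there use jp_number and imjp_number rather than a shared name, so this uses `left_on`/`right_on` instead of `on`. The `keys`, `merged`, and `found` names are only illustrative, the sample frames are shortened, and I have not measured whether this is actually faster than `isin` on 3.5 M rows.

```python
import pandas as pd

# Shortened sample frames; keys are assumed to be strings.
dwpjp = pd.DataFrame({"jp_number": [
    "25146315052147720191",
    "57225427599900052634",
    "50491824499499656478",
]})
ct_data = pd.DataFrame({
    "imjp_number": [
        "23605308039805192764",
        "57225427599900052634",
        "50491824499499656478",
    ],
    "imct_id": ["id_a", "id_b", "id_c"],  # placeholder ids for illustration
})

# Deduplicate the lookup keys so the merge cannot multiply rows of ct_data.
keys = dwpjp[["jp_number"]].drop_duplicates()

# A left merge keeps every ct_data row; indicator=True adds a "_merge"
# column that is "both" for matched rows and "left_only" otherwise.
merged = ct_data.merge(
    keys,
    left_on="imjp_number",
    right_on="jp_number",
    how="left",
    indicator=True,
)
found = merged["_merge"] == "both"

# Keep only the original ct_data columns in both outputs.
cols = list(ct_data.columns)
cct_data = merged.loc[found, cols].reset_index(drop=True)
dct_data = merged.loc[~found, cols].reset_index(drop=True)
```

Because of the `indicator` column, one merge yields both halves of the split, so no second pass is needed for dct_data.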