1

I want to remove rows in df1 which have common numbers in the "0" columns of both df's

df1 = pd.read_csv('Displacement.txt', sep="\s", header=None)
df2= pd.read_csv('overconstraint.txt', header=None)

df1 (>6million rows) looks like

    0   1   2   3   4   5   6
0   1   ,   1   ,   1.0 ,   1.8800E-01
1   2   ,   1   ,   1.0 ,   1.8491E-01

df2 (>1000 rows) looks like

    0
0   1119933
1   1119933
2   1122062

How to do this?

shoggananna
  • 545
  • 5
  • 9

2 Answers2

1

You need specify column instead DataFrame by Series.isin:

test=df1[~df1[0].isin(df2[0])]

#in some pandas versions is necessary convert Series to list
test=df1[~df1[0].isin(df2[0].astype(int).tolist())]

Solution with casting to integers:

test=df1[~df1[0].astype(int).isin(df2[0].astype(int))]

#in some pandas versions is necessary convert Series to list
test=df1[~df1[0].astype(int).isin(df2[0].astype(int).tolist())]
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

Make the difference between ref columns:

>>> df1[~df1['ref'].isin(df2['ref'])]
   ref  value1
0    2      76
1    4      23
5   12      78
6   14      34
Corralien
  • 109,409
  • 8
  • 28
  • 52