In Python pandas, I can easily drop duplicates in a DataFrame with:
df1.drop_duplicates(['Service Date', 'Customer Number'], inplace=True)
Is there anything in C# or Deedle that's this simple and fast? Or do I need to iterate over the entire frame (from a large CSV file) to drop duplicates?
The data I'm working with is imported from a large CSV file with about 40 columns and 12k rows. For each date, there are multiple entries for the same Customer Number. I need to eliminate the duplicate Customer Number rows, keeping only one row per customer for each date.
Here's some simplified data, with DATE and RECN as the columns to de-duplicate on:
NAME, TYPE, DATE, RECN, COMM
Kermit, Frog, 06/30/14, 1, 1test
Kermit, Frog, 06/30/14, 1, 2test
Ms. Piggy, Pig, 07/01/14, 2, 1test
Fozzy, Bear, 06/29/14, 3, 1test
Kermit, Frog, 07/02/14, 1, 3test
Kermit, Frog, 07/02/14, 1, 4test
Kermit, Frog, 07/02/14, 1, 5test
Ms. Piggy, Pig, 07/02/14, 2, 3test
Fozzy, Bear, 07/02/14, 3, 2test
Ms. Piggy, Pig, 07/02/14, 2, 2test
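
To make the intended result concrete, here is a minimal sketch in plain Python (standard library only, no pandas or Deedle) of the single-pass approach applied to the sample data above. It assumes the first row seen for each (DATE, RECN) pair is the one to keep, which matches the default `keep='first'` behavior of pandas `drop_duplicates`. The `drop_duplicates` helper and the `SAMPLE` string are illustrative names, not part of any library.

```python
import csv
import io

# Sample data from the question, inlined so the sketch is self-contained.
SAMPLE = """\
NAME, TYPE, DATE, RECN, COMM
Kermit, Frog, 06/30/14, 1, 1test
Kermit, Frog, 06/30/14, 1, 2test
Ms. Piggy, Pig, 07/01/14, 2, 1test
Fozzy, Bear, 06/29/14, 3, 1test
Kermit, Frog, 07/02/14, 1, 3test
Kermit, Frog, 07/02/14, 1, 4test
Kermit, Frog, 07/02/14, 1, 5test
Ms. Piggy, Pig, 07/02/14, 2, 3test
Fozzy, Bear, 07/02/14, 3, 2test
Ms. Piggy, Pig, 07/02/14, 2, 2test
"""


def drop_duplicates(rows, key_columns):
    """Keep the first row seen for each combination of key_columns.

    Hypothetical helper: one pass over the rows, remembering the keys
    already seen in a set, so the cost is linear in the number of rows.
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# skipinitialspace=True drops the blank that follows each comma,
# including in the header row.
reader = csv.DictReader(io.StringIO(SAMPLE), skipinitialspace=True)
rows = list(reader)

deduped = drop_duplicates(rows, ["DATE", "RECN"])

for row in deduped:
    print(row["NAME"], row["DATE"], row["RECN"], row["COMM"])
```

With 12k rows this is a single linear pass, so iterating over the whole frame should not be a performance concern in itself; the same seen-keys idea carries over to C#, for example with a `HashSet` of key tuples.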