I have the data-frame below and want to drop consecutive duplicate rows, i.e. a row whose 'People', 'Year' and 'Project' are all the same as in the row directly above it.
Rows that share the same 'People', 'Year' and 'Project' but are not consecutive should be kept. The original data-frame:
import pandas as pd
data = {'People' : ["David","David","David","David","John","John","John"],
'Year': ["2016","2016","2017","2016","2016","2017","2017"],
'Project' : ["TN","TN","TN","TN","DJ","DM","DM"],
'Earning' : [878,682,767,620,964,610,772]}
df = pd.DataFrame(data)
I tried this but it doesn't work:
df_1 = df.loc[(df['People', 'Year', 'Project'].shift() != df['People', 'Year', 'Project'])]
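The attempt above fails for two reasons: `df['People', 'Year', 'Project']` asks for a single column whose label is the tuple `('People', 'Year', 'Project')` (a KeyError on this frame), and comparing two DataFrames yields one boolean per cell rather than one per row. A minimal sketch of the same idea with both points fixed, assuming `df` is built from `data` as above: select the key columns with a list (double brackets) and collapse each row's comparisons with `.any(axis=1)`.

```python
import pandas as pd

data = {'People': ["David", "David", "David", "David", "John", "John", "John"],
        'Year': ["2016", "2016", "2017", "2016", "2016", "2017", "2017"],
        'Project': ["TN", "TN", "TN", "TN", "DJ", "DM", "DM"],
        'Earning': [878, 682, 767, 620, 964, 610, 772]}
df = pd.DataFrame(data)

cols = ['People', 'Year', 'Project']  # key columns, passed as a list

# Compare each row's key columns with the row directly above it.
# shift() puts NaN in the first row, and NaN != value is True, so row 0 is kept.
differs = df[cols].shift() != df[cols]

# Keep a row if ANY of its key columns differs from the previous row.
df_1 = df.loc[differs.any(axis=1)]
print(df_1)
```

This keeps rows 0, 2, 3, 4 and 5: rows 1 and 6 are dropped, while the non-consecutive "David, 2016, TN, 620" survives.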
Another attempt: this line removes the non-consecutive row "David, 2016, TN, 620", which should be kept:
df_1 = df.drop_duplicates(subset=['People','Year','Project'])
When I add 'Earning' to the subset, it keeps all the rows:
df_1 = df.drop_duplicates(subset=['People','Year','Project', 'Earning'])
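Why both `drop_duplicates` calls miss: a row counts as a duplicate if its key matches any earlier row, not just the one directly above, so position in the frame does not matter. A small sketch using `DataFrame.duplicated`, which flags exactly the rows that `drop_duplicates` removes with the same arguments, makes this visible (same `data` dict as above).

```python
import pandas as pd

data = {'People': ["David", "David", "David", "David", "John", "John", "John"],
        'Year': ["2016", "2016", "2017", "2016", "2016", "2017", "2017"],
        'Project': ["TN", "TN", "TN", "TN", "DJ", "DM", "DM"],
        'Earning': [878, 682, 767, 620, 964, 610, 772]}
df = pd.DataFrame(data)
cols = ['People', 'Year', 'Project']

# Flags a row if the same key appeared in ANY earlier row, however far back,
# so row 3 is flagged because it matches row 0.
dup_any = df.duplicated(subset=cols)

# With 'Earning' added every row is unique, so nothing is flagged.
dup_with_earning = df.duplicated(subset=cols + ['Earning'])

print(dup_any.tolist())           # [False, True, False, True, False, False, True]
print(dup_with_earning.tolist())  # all False
```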
What's the right way to do it? Thank you!
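An alternative sketch that keeps `drop_duplicates` but restricts it to consecutive rows: number each run of identical consecutive keys with a cumulative sum, then drop duplicates of that run number. The temporary column name `run` is a hypothetical choice for illustration, and the same `data` dict as above is assumed.

```python
import pandas as pd

data = {'People': ["David", "David", "David", "David", "John", "John", "John"],
        'Year': ["2016", "2016", "2017", "2016", "2016", "2017", "2017"],
        'Project': ["TN", "TN", "TN", "TN", "DJ", "DM", "DM"],
        'Earning': [878, 682, 767, 620, 964, 610, 772]}
df = pd.DataFrame(data)
cols = ['People', 'Year', 'Project']

# True where a new run of identical consecutive keys starts.
new_run = (df[cols] != df[cols].shift()).any(axis=1)

# Cumulative sum of the run starts gives each consecutive run its own number:
# 1, 1, 2, 3, 4, 5, 5 for this data.
run = new_run.cumsum()

# drop_duplicates on the run number keeps the first row of each run.
# 'run' is a hypothetical temporary column, dropped again afterwards.
df_1 = (df.assign(run=run)
          .drop_duplicates(subset='run')
          .drop(columns='run'))
print(df_1)
```

The result matches filtering with the boolean mask directly; the run number is mainly useful if consecutive duplicates should later be grouped rather than dropped.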