Here is the dataset I'm working on, which looks like this:
Basically, I want to delete duplicate rows. I know about the drop_duplicates method,
but I need some help using it.
Let me show you by sorting the data, so that you get a clearer picture:
by_streamed = data.sort_values(by='Streams', ascending=False)
by_streamed
When I take the top 10 most-streamed songs, the duplicates obviously interfere. If you look closely, though, the ranks of these duplicated songs are different.
I want to remove this type of duplicate row. Here's my code:
data = data.drop_duplicates(subset=['Artist', 'Title'], keep='first')
But this removes a lot of rows that weren't supposed to be removed.
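To make this reproducible, here is a minimal sketch with a small made-up frame (not my real data; the 'Rank' column name and all the values are assumptions for illustration). It suggests what may be happening: with subset=['Artist', 'Title'], only those two columns are compared, so any later row with the same artist and title is dropped, even if its other columns differ.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset. Only 'Artist', 'Title' and
# 'Streams' come from the question; 'Rank' and all values are assumed.
data = pd.DataFrame({
    "Rank":    [1, 2, 3, 4, 5],
    "Artist":  ["A", "A", "B", "B", "C"],
    "Title":   ["Song X", "Song X", "Song Y", "Song Y", "Song Z"],
    "Streams": [900, 900, 800, 300, 700],
})

# Same sort as in the question: most-streamed rows first.
by_streamed = data.sort_values(by="Streams", ascending=False)

# Only 'Artist' and 'Title' are compared here, so every later row with the
# same pair is dropped, whatever its Rank or Streams value.
deduped = by_streamed.drop_duplicates(subset=["Artist", "Title"], keep="first")
print(deduped)
```

In this toy frame the second "Song X" row is the kind of duplicate I want gone, but the "Song Y" row with 300 streams is removed as well, even though its stream count is different.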
There does seem to be an issue with how I'm using subset,
but I can't work out what it is. It would be great if you could help me figure it out. Thanks in advance.