Python: deleting specific duplicate rows in Pandas

Asked Mar 19 '21 at 08:40

Active Mar 19 '21 at 09:04

Viewed 51 times

Here is the data set I'm working on, which looks like this:

Basically, I want to delete specific duplicate rows. I know about the drop_duplicates method, but I need some help using it.

Let me show you by sorting the data, so that you get a clear understanding.

by_streamed = data.sort_values(by='Streams', ascending=False)
by_streamed

So when I get the top 10 streamed songs, the duplicates obviously interfere. If you look closely, though, the ranks of these songs are different.

I want to remove this type of duplicate row. Here's my code:

data = data.drop_duplicates(subset=['Artist', 'Title'], keep='first')

But this removes a lot of rows that weren't supposed to be removed.

There is indeed an issue with subset, but I can't work out what it is. It would be great if you could help me figure it out. Thanks in advance.

edited Mar 19 '21 at 09:04

asked Mar 19 '21 at 08:40

kirti purohit


`But this removes a lot of columns that weren't supposed to be.` Can you explain more in some small data sample with 5 rows? – jezrael Mar 19 '21 at 08:42
Do you mean `rows` and not `columns`? – Umar.H Mar 19 '21 at 08:49
So, you want to remove the duplicates based on Artist and Title, but not on any other columns? "But this removes a lot of columns that weren't supposed to be." This sentence in the question is misleading, because you are dropping rows, not columns. – ThePyGuy Mar 19 '21 at 08:57
I meant rows. I have changed it in the question. – kirti purohit Mar 19 '21 at 09:05
Would you be able to help me please? @ThePyGuy – kirti purohit Mar 19 '21 at 09:47
Could you give some example rows that weren't supposed to be removed? – Ynjxsjmh Mar 19 '21 at 09:51
It's in the question itself. With the first code, the original songs are sorted correctly, but with duplicates. With this code, `data=data.drop_duplicates(subset=['Artist','Title'],keep='first')`, the listing starts directly from BTS and the rows above it get removed. If you notice, the stream counts of the songs in the 2nd image are higher than the stream count (e.g. for BTS) in the 3rd image. – kirti purohit Mar 19 '21 at 09:53
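One possible reading of the behaviour described in the comments, offered as a sketch and not as a confirmed diagnosis: `keep='first'` keeps whichever duplicate comes first in the current row order. If `drop_duplicates` runs on the unsorted data, the surviving row for a song may be one with a low stream count, so the high-stream rows disappear from the top of the sorted view. Sorting by `Streams` before dropping duplicates would keep the most-streamed row for each `Artist`/`Title` pair. The sample values below are hypothetical, since the real data set is not shown in the thread, and the `Rank` column is assumed from the mention of ranks in the question.

```python
import pandas as pd

# Hypothetical sample rows; the real data set is not shown in the thread.
data = pd.DataFrame({
    "Rank":    [1, 2, 3, 4, 5],
    "Artist":  ["A", "B", "A", "C", "B"],
    "Title":   ["Song X", "Song Y", "Song X", "Song Z", "Song Y"],
    "Streams": [100, 900, 800, 300, 200],
})

# keep='first' keeps the first occurrence in the CURRENT row order,
# so on unsorted data the 100-stream copy of "Song X" survives
# and the 800-stream copy is dropped.
unsorted_result = data.drop_duplicates(subset=["Artist", "Title"], keep="first")

# Sorting first puts the most-streamed copy of each song at the top of its
# group, so keep='first' now retains the row with the highest stream count.
by_streamed = (
    data.sort_values(by="Streams", ascending=False)
        .drop_duplicates(subset=["Artist", "Title"], keep="first")
)

print(unsorted_result)
print(by_streamed)
```

With this sample, both results contain one row per song, but only the second keeps the highest `Streams` value for each one. Whether that matches the intended result depends on which copy of a duplicated song should survive, which the thread does not settle.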
