31

I have a DataFrame like this (the first column is the index (786, ...), the second is the day (25, ...), and the rainfall amount is empty):

     Day  Rainfall amount (millimetres)
786   25
787   26
788   27
789   28
790   29
791    1
792    2
793    3
794    4
795    5

and I want to delete row 790. I tried many things with df.drop, but nothing happened.

I hope you can help me.

Nickil Maveli
Madddin
  • try this: `df = df[df.index != 790]` – MaxU - stand with Ukraine Jul 20 '16 at 12:33
  • Post the code that you claim didn't work. Firstly, did you assign back the result of `drop`, i.e. `df = df.drop(790)`? Are you sure the first column is the index? What does `df.index` show? – EdChum Jul 20 '16 at 12:33
  • If that is not the index, then `df[df['Day'] != 790]` will work, as `'Day'` is a column and not the index. Show the output from `df.info()` – EdChum Jul 20 '16 at 12:39
  • Yes, it is the index. I know my question is not so clear, but it worked with the code from MaxU. – Madddin Jul 20 '16 at 12:40
  • Your question is unclear because you didn't include your complete code. Stating that a function didn't work does not give a complete picture. – EdChum Jul 20 '16 at 12:41
  • @Madddin, if it's an index, then EdChum's code, `df = df.drop(790)`, should work as well. – MaxU - stand with Ukraine Jul 20 '16 at 12:41
  • @Madddin, please consider [accepting](http://meta.stackexchange.com/a/5235) frist's answer - this will also indicate that your question has been answered ;) – MaxU - stand with Ukraine Jul 20 '16 at 12:45
  • He was right, the code doesn't work. I ran into the same problem. As one of the answers says, it needs `inplace=True`. The docs don't say that: the example shows 4 columns a, b, c, d, uses .drop, and 2 are gone. – Wayne Filkins Jul 18 '19 at 06:17
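
The checks suggested in the comments above can be sketched roughly as follows. The DataFrame here is made up to resemble the one in the question (only three rows, with illustrative `Day` values), so treat it as an assumption-laden illustration rather than the asker's actual data:

```python
import pandas as pd

# Hypothetical data shaped like the question's DataFrame.
df = pd.DataFrame({"Day": [28, 29, 1]}, index=[789, 790, 791])

# Is 790 an index label, or a value in the 'Day' column?
is_index_label = 790 in df.index
is_day_value = (df["Day"] == 790).any()

# Boolean-mask filtering keeps every row whose index label is not 790.
filtered = df[df.index != 790]

print(df.index.tolist(), is_index_label, is_day_value)
print(filtered.index.tolist())
```

If `790` turns out to be a column value rather than an index label, the mask would need to be built from that column instead.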

2 Answers

81

`drop` returns a new DataFrame and leaves the original unchanged. If you want to apply the change to the current DataFrame, you have to either assign the result back or specify the inplace parameter.

Option 1
Assigning back to df -

df = df.drop(790)

Option 2
Inplace argument -

df.drop(790, inplace=True)
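
A minimal sketch of the difference, assuming a made-up DataFrame whose index runs from 786 to 795 as in the question (the variable names `unchanged`, `df1`, `df2` and `result` are only for illustration):

```python
import pandas as pd

# Hypothetical data mirroring the question: index labels 786..795, a 'Day' column.
df = pd.DataFrame(
    {"Day": [25, 26, 27, 28, 29, 1, 2, 3, 4, 5]},
    index=range(786, 796),
)

# Calling drop without using the result leaves df unchanged.
df.drop(790)
unchanged = 790 in df.index

# Option 1: keep the DataFrame that drop returns.
df1 = df.drop(790)

# Option 2: modify a copy in place; drop then returns None.
df2 = df.copy()
result = df2.drop(790, inplace=True)

print(unchanged, 790 in df1.index, 790 in df2.index, result)
```

Because the in-place call returns `None`, writing `df = df.drop(790, inplace=True)` would replace the DataFrame with `None`, so the two options should not be combined.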
jpp
frist
  • Curious why it was decided that an object's method returns another object, while it's more reasonably expected for it to mutate the object itself... – Renato Byrro Sep 05 '19 at 15:11
  • What you expect is not what other people expect... An argument on the other side is that mutation can require more reasoning. With immutable code, the only place where a piece of data can be modified is an explicit assignment; with mutable operations you have to look at *every* call to a method. – Att Righ Nov 07 '21 at 19:26
  • None of these answers are working and I'm perplexed why. I've merged three CSV files and they mistakenly have the headers copied. After using df.drop() I see that the length of my dataframe correctly decreases by 2 (I have two bad rows of extra headers deep inside). However, attempting to histogram a column throws errors, indicating there's still a phantom row. Can anyone explain? – Ryan Dorrill Jun 21 '22 at 18:03
1

As others may be in my shoes, I'll add a bit here. I've merged three CSV files of data and they mistakenly have the headers copied into the dataframe. Now, naturally, I assumed pandas would have an easy method to remove these obviously bad rows. However, it's not working and I'm still a bit perplexed with this. After using df.drop() I see that the length of my dataframe correctly decreases by 2 (I have two bad rows of headers). But the values are still there and attempts to make a histogram will throw errors due to empty values. Here's the code:

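import pandas as pd

df1 = pd.read_csv('./summedDF_combined.csv', index_col=[0])
print(len(df1['x']))
# Positions (not index labels) of rows whose 'y' value cannot be parsed as a number
badRows = pd.isnull(pd.to_numeric(df1['y'], errors='coerce')).to_numpy().nonzero()[0]
print("Bad rows:", badRows)
df1.drop(badRows, inplace=True)
print(len(df1['x']))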

I've tried other functions in tandem with no luck. This shows an empty list for badRows, but the plot still fails because the bad rows are still in the df, just deindexed:

print(len(df1['x']))
df1 = df1.dropna().reset_index(drop=True)
df1 = df1.dropna(axis=0).reset_index(drop=True)
badRows = pd.isnull(pd.to_numeric(df1['x'], errors='coerce')).to_numpy().nonzero()[0]
print("Bad rows:", badRows)

I'm stumped, but have one solution that works for the subset of folks who merged CSV files and got stuck. Go back to your original files and merge again, but take care to exclude the headers like so:

head -n 1 anyOneFile.csv > summedDFs.csv && tail -n +2 -q summedBlipDF2*.csv >> summedDFs.csv

Apologies, I know this isn't the pythonic or pandas way to fix it and I hope the mods don't feel the need to remove it as it works for the small subset of people with my problem.
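
For readers who would rather stay in pandas, here is a hedged sketch of one way to remove repeated header rows. It rests on an assumption that is not confirmed above: that the stray rows hold the column names as strings, which leaves the affected columns with `object` dtype even after the rows are gone, so plotting can still fail until the columns are converted to numbers. It also selects rows with a boolean mask instead of passing positions to `drop`, which expects index labels. The column names `x` and `y` follow the code above; the inline CSV text is made up.

```python
import io

import pandas as pd

# Made-up stand-in for a merged CSV in which the header line was repeated.
csv_text = "x,y\n1,10\n2,20\nx,y\n3,30\nx,y\n4,40\n"
df = pd.read_csv(io.StringIO(csv_text))

# Values that fail numeric parsing become NaN in this helper Series.
y_numeric = pd.to_numeric(df["y"], errors="coerce")

# Keep only the parseable rows via a boolean mask (no label/position mix-up),
# then convert every column from strings to numbers.
clean = df[y_numeric.notna()].apply(pd.to_numeric).reset_index(drop=True)

print(len(df), len(clean))
print(clean.dtypes)
```

Whether this matches the situation described above depends on what the bad rows actually contain, so checking `df.dtypes` before and after is a reasonable first step.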

Ryan Dorrill