0

Original file has multiple columns but there are lots of blanks and I want to rearrange so that there is one nice column with info. Starting with 910 rows, 51 cols (newFile df) -> Want 910+x rows, 3 cols (final df) final df has 910 rows.

newFile sample

for i in range (0,len(newFile)):
    for j in range (0,48):
        if (pd.notnull(newFile.iloc[i,3+j])):
            final=final.append(newFile.iloc[[i],[0,1,3+j]], ignore_index=True)

I have this piece of code to go through newFile and if 3+j column is not null, to copy columns 0,1,3+j to a new row. I tried append() but it adds not only rows but a bunch of columns with NaNs again (like the original file).

Any suggestions?!

Sumin Ahn
  • 1
  • 1

1 Answers1

0

Your problem is that you are using a DataFrame and keeping column names, so adding a new columns with a value will fill the new column with NaN for the rest of the dataframe.
Plus your code is really inefficient given the double for loop. Here is my solution using melt()

#creating example df
df = pd.DataFrame(numpy.random.randint(0,100,size=(100, 51)), columns=list('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXY'))
#reconstructing df as long version, keeping columns from index 0 to index 3
df = df.melt(id_vars=df.columns[0:2])
#dropping the values that are null
df.dropna(subset=['value'],inplace=True)
#here if you want to keep the information about which column the value is coming from you stop here, otherwise you do 
df.drop(inplace=True,['variable'],axis=1)
print(df)
Mayeul sgc
  • 1,964
  • 3
  • 20
  • 35