1

I have a dataframe called 'data' with 55 columns and I want to create a new csv file with the first 52 columns. The last three column names I do not want to include are 'Class', 'part_id' and 'image_file'. I have been searching and the solution is something like this:

import pandas as pd
useful_columns = [col1,col2,...] #list the columns I need
data[useful_columns].to_csv('new.csv', index=False) #prevent creating extra column

#reference: https://stackoverflow.com/questions/46546388/how-to-skip-columns-of-csv-file

I get an error that says 'col1, col2 not defined'. I do have 52 columns that I want to export to a new csv file, but it takes too long to write out each column name (Particle ID, Area(ABD), Aspect Ratio, etc.). Is there a fast way to say "just take the first 52 columns from the existing dataframe and put them into a new csv file"?

Thanks so much in advance!

Olga

2 Answers

3

There are two ways I can think of, depending on which matters more to you: naming the few columns you want to leave out, or dropping the 'last 3' purely by position.

If you can write out the relatively few column names, that will always be more reliable:

 deselectlist = ['Class', 'part_id', 'image_file']
 selectlist = [x for x in data.columns if x not in deselectlist]
 datatowrite = data[selectlist]

 datatowrite.to_csv('new.csv')

Alternatively, if you don't want to write out the names of the deselected columns, you can try

 columnlist = [x for x in data.columns]
 datatowrite = data[columnlist[:-3]]

That keeps everything except the last three columns. I would of course recommend checking that the column order is what you expect. It worked when I tried it, but I think the first approach is more reliable.
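
As a quick sanity check on the ordering concern, here is a minimal sketch that runs both approaches on a small made-up frame and compares the resulting column lists. The six-column `data` below is only a hypothetical stand-in for the real 55-column dataframe, and the values in it are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the real 55-column 'data' frame (values are made up).
data = pd.DataFrame({
    "Particle ID": [1, 2],
    "Area(ABD)": [10.5, 12.1],
    "Aspect Ratio": [0.8, 0.9],
    "Class": ["a", "b"],
    "part_id": [101, 102],
    "image_file": ["x.png", "y.png"],
})

deselectlist = ["Class", "part_id", "image_file"]

# Approach 1: exclude the unwanted columns by name.
by_name = data[[c for c in data.columns if c not in deselectlist]]

# Approach 2: exclude the last three columns by position.
by_position = data[list(data.columns[:-3])]

# The two selections agree only if the unwanted columns really are the last three.
same_columns = list(by_name.columns) == list(by_position.columns)
print(same_columns)
```

If the comparison comes out False on the real data, the unwanted columns are not the last three, and the name-based version is the one to trust.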

Vipluv
0
useful_columns = ['title column1','title column2']
data.loc[:,useful_columns].to_csv('new.csv')

It should work if you can provide the titles of the columns. Otherwise, select them by position:

useful_columns = list(range(52))  # positions 0..51, i.e. the first 52 columns
data.iloc[:,useful_columns].to_csv('new.csv')
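
For what it's worth, a plain slice also works with `iloc`, and `DataFrame.drop` can remove the three columns by name. The sketch below assumes a small made-up frame in place of the real one (so the slice is `:3` here rather than `:52`), and it builds the CSV as a string instead of writing `new.csv`, just so the header is easy to inspect. Passing `index=False` is what avoids the extra index column mentioned in the question.

```python
import pandas as pd

# Hypothetical stand-in for the real 55-column 'data' frame (values are made up).
data = pd.DataFrame({
    "Particle ID": [1, 2],
    "Area(ABD)": [10.5, 12.1],
    "Aspect Ratio": [0.8, 0.9],
    "Class": ["a", "b"],
    "part_id": [101, 102],
    "image_file": ["x.png", "y.png"],
})

# Positional slice: first 3 columns here; on the real frame this would be :52.
first_cols = data.iloc[:, :3]

# Name-based alternative: drop the three unwanted columns.
dropped = data.drop(columns=["Class", "part_id", "image_file"])

# With no path given, to_csv returns the CSV text; index=False omits the row index.
csv_text = first_cols.to_csv(index=False)
header = csv_text.splitlines()[0]
print(header)
```

On the real data, passing a filename such as 'new.csv' as the first argument to `to_csv` writes the file instead of returning the text.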
p.deman