24

I have a CSV file with 50 columns of data. I am using the pandas read_csv function to read in a subset of these columns, with the usecols parameter selecting the ones I want:

cols_to_use = [0,1,5,16,8]
df_ret = pd.read_csv(filepath, index_col=False, usecols=cols_to_use)

The trouble is that df_ret contains the correct columns, but not in the order I specified. They come back in ascending order, i.e. [0,1,5,8,16]. (The column numbers can change from run to run; this is just an example.) This is a problem because the rest of the code has arrays which are in the "correct" order, and I would rather not have to reorder all of them.

Is there any clever pandas way of pulling in the columns in the order specified? Any help would be much appreciated!

AButkov

2 Answers

22

You can reuse the same cols_to_use list to select the columns in the desired order (this works when cols_to_use holds column names rather than integer positions):

df_ret = pd.read_csv(filepath, index_col=False, usecols=cols_to_use)[cols_to_use]
MaxU - stand with Ukraine
  • Thanks for this! The method makes sense but I don't think I can use cols_to_use to reorder it, because the dataframe only has 5 columns and so columns 5, 8 and 16 are out of bounds. – AButkov Oct 13 '16 at 16:42
  • So I made col_reorder = [0,1,2,4,3] which I use at the end, i.e. df_ret = pd.read_csv(filepath, index_col=False, usecols=cols_to_use)[col_reorder]. This puts them in the desired order. – AButkov Oct 13 '16 at 16:44
  • 1
    @AButkov, my answer would work properly if you would specify column names instead of their indexes in the `cols_to_use` list – MaxU - stand with Ukraine Oct 13 '16 at 19:39
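For the integer-position case discussed in the comments above, the reorder list does not have to be written by hand. Here is a minimal sketch, assuming cols_to_use contains no duplicate positions; the in-memory CSV and its column names c0..c19 are made up for illustration and stand in for the real file:

```python
import io

import pandas as pd

# Hypothetical sample data: 20 columns named c0..c19 (not from the original question).
header = ",".join(f"c{i}" for i in range(20))
row1 = ",".join(str(i) for i in range(20))
row2 = ",".join(str(i * 10) for i in range(20))
csv_text = "\n".join([header, row1, row2])

cols_to_use = [0, 1, 5, 16, 8]

# read_csv keeps the selected columns in file order, regardless of the order in usecols.
df = pd.read_csv(io.StringIO(csv_text), index_col=False, usecols=cols_to_use)

# For each requested position, find where it landed among the sorted positions.
sorted_cols = sorted(cols_to_use)
col_reorder = [sorted_cols.index(c) for c in cols_to_use]

# Reorder by position within the 5-column result.
df_ret = df.iloc[:, col_reorder]
```

Because col_reorder is derived from cols_to_use, it should keep working when the column numbers change from run to run.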
2

Just piggybacking off this question here (hi from 2018).

I ran into the same problem with pandas read_csv and wanted a way to do the [col_reorder] step using column header strings instead of integer positions. It's as simple as defining a list of strings to index with:

pd.read_csv(filepath, index_col=False, usecols=cols_to_use)[index_strings]
PeptideWitch
  • What are you trying to do? It's not very clear... Are you after sorting column names in some specific order? Can you provide a small reproducible example (data set with 2-3 rows)? – MaxU - stand with Ukraine Jun 19 '18 at 12:02
  • Hey, just to clarify - I don't have a question, just a modified answer. I found the same problem as OP and submitted a modified version of your answer without having to specify the integer value of the header, in case our pandas dataframes have string header values. – PeptideWitch Jun 19 '18 at 12:11
  • 2
    Why can't you simply do `pd.read_csv(filepath, index_col=False, usecols=cols_to_use)[cols_to_use]`, where `cols_to_use` is a list of labels (column names), for example `cols_to_use = ['b','c','a']`? If you just need to sort the column names in lexicographical order, you can do: `pd.read_csv(filepath, index_col=False, usecols=cols_to_use).sort_index(axis=1)` – MaxU - stand with Ukraine Jun 19 '18 at 12:16
  • Great question...yep, I should have spotted that. I'll edit my answer. – PeptideWitch Jun 19 '18 at 12:25
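
To make the label-based variants from the comments concrete, here is a small sketch. The four-column sample data is invented for illustration; the two read_csv calls are the ones quoted above, reading from an in-memory buffer instead of filepath:

```python
import io

import pandas as pd

# Hypothetical sample data with columns a, b, c, d.
csv_text = "a,b,c,d\n1,2,3,4\n5,6,7,8\n"

cols_to_use = ["b", "c", "a"]

# Select by label after reading, so the result follows the order of cols_to_use.
df_ret = pd.read_csv(io.StringIO(csv_text), index_col=False, usecols=cols_to_use)[cols_to_use]

# Alternatively, sort the column labels lexicographically.
df_sorted = pd.read_csv(io.StringIO(csv_text), index_col=False, usecols=cols_to_use).sort_index(axis=1)
```

With labels, the same list serves both as the usecols filter and as the column selector, so no separate reorder list is needed.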