41

For example, my csv has columns as below:

ID, ID2, Date, Job No, Code

I need to write the columns back in the same order. The dict jumbles the order immediately, so I believe it's more of a problem with the reader.

martineau
  • 119,623
  • 25
  • 170
  • 301
Alex
  • 1,360
  • 2
  • 12
  • 18

6 Answers6

65

Python's dicts do NOT maintain order prior to 3.6 (but, regardless, in that version the csv.DictReader class was modified to return OrderedDicts).

However, the instance of csv.DictReader that you're using (after you've read the first row!-) does have a .fieldnames list of strings, which IS in order.

So,

for rowdict in myReader:
  print ['%s:%s' % (f, rowdict[f]) for f in myReader.fieldnames]

will show you that the order is indeed maintained (in .fieldnames of course, NEVER in the dict -- that's intrinsically impossible in Python!-).

So, suppose you want to read a.csv and write b.csv with the same column order. Using plain reader and writer is too easy, so you want to use the Dict varieties instead;-). Well, one way is...:

import csv

a = open('a.csv', 'r')
b = open('b.csv', 'w')
ra = csv.DictReader(a)
wb = csv.DictWriter(b, None)

for d in ra:

  if wb.fieldnames is None:
    # initialize and write b's headers
    dh = dict((h, h) for h in ra.fieldnames)
    wb.fieldnames = ra.fieldnames
    wb.writerow(dh)

  wb.writerow(d)

b.close()
a.close()

assuming you have headers in a.csv (otherewise you can't use a DictReader on it) and want just the same headers in b.csv.

martineau
  • 119,623
  • 25
  • 170
  • 301
Alex Martelli
  • 854,459
  • 170
  • 1,222
  • 1,395
  • Thanks Alex Martelli! You've gone over and above the call of duty here :) and... it is appreciated! I didn't realise there was fieldnames, but re-reading the API docs I can see it now. Thanks for the alternative, but since my DictReader is working well now I'll stick with it. – Alex Dec 11 '09 at 04:03
  • @RaffiKhatchadourian a is a.csv, b is b.csv ra presumably stands for read from file a, wb presumably for write to file b, d for dictionary, dh for dictionary headers.... Alex Thanks for the awesome explanation! – Deep Dec 31 '14 at 05:22
  • 2
    any reason why you went through the whole `dh = dict(...` `...wb.writerow(dh)` when you could have done `wb.fieldnames = ra.fieldnames; wb.writeheader()`? – Baldrickk Dec 23 '16 at 11:44
  • Looks good, but how you can add quotes to all fileds in all lines except the header? – EliranA Jun 15 '17 at 07:56
10

Make an OrderedDict from each row dict sorted by DictReader.fieldnames.

import csv
from collections import OrderedDict

reader = csv.DictReader(open("file.csv"))
for row in reader:
    sorted_row = OrderedDict(sorted(row.items(),
          key=lambda item: reader.fieldnames.index(item[0])))
Community
  • 1
  • 1
xvan
  • 4,554
  • 1
  • 22
  • 37
  • would you be able to explain how the lambda function here is working? I see similar examples listed in the official [docs](https://docs.python.org/3/howto/sorting.html), however its not clear to me how this is producing the correct order (e.g. what the actual output of the lambda function looks like) – user5359531 Apr 25 '18 at 17:47
  • 1
    The last line serializes `dict row` into a (key,value) tuple array. Then resorts the array by the correct key order and repackages it on an OrderedDict. The lambda takes a `tuple item`, retrives it's key `item[0]` and looks for its index on `reader.fieldnames`. That index is used by `sorted()` to sort the tuples. – xvan Apr 25 '18 at 18:36
  • This is my favourite solution, also arrived at something like this after reading the [OrderedDict](https://docs.python.org/3.5/library/collections.html#collections.OrderedDict) docs. It has one obvious inefficiency which is that a sort is done for each row again even though the order will be the same for each row. I think however that is as good as it gets for Python 3.5, the price some of us who are forced to live in the past must pay. Probably only notice it if your csv was millions of rows long. – cardamom Aug 06 '19 at 11:48
9
from csv import DictReader, DictWriter

with open("input.csv", 'r') as input_file:
    reader = DictReader(f=input_file)
    with open("output.csv", 'w') as output_file:
        writer = DictWriter(f=output_file, fieldnames=reader.fieldnames)
        for row in reader:
            writer.writerow(row)
Raffi Khatchadourian
  • 3,042
  • 3
  • 31
  • 37
2

I know this question is old...but if you use DictReader, you can pass it an ordered list with the fieldnames to the fieldnames param

Pablo K
  • 125
  • 1
  • 10
2

Edit: as of python 3.6 dicts are ordered by insertion order, essentially making all dicts in python OrderedDicts by default. That being said the docs say dont rely on this behaviour because it may change. I will challenge that, lets see if it ever changes back :)


Unfortunatley the default DictReader does not allow for overriding the dict class, a custom DictReader would do the trick though

import csv

class DictReader(csv.DictReader):
    def __init__(self, *args, **kwargs):
        self.dict_class = kwargs.pop(dict_class, dict)
        super(DictReader, self).__init__(*args, **kwargs)

    def __next__(self):
        ''' copied from python source '''
        if self.line_num == 0:
            # Used only for its side effect.
            self.fieldnames
        row = next(self.reader)
        self.line_num = self.reader.line_num

        # unlike the basic reader, we prefer not to return blanks,
        # because we will typically wind up with a dict full of None
        # values
        while row == []:
            row = next(self.reader)
        # using the customized dict_class
        d = self.dict_class(zip(self.fieldnames, row))
        lf = len(self.fieldnames)
        lr = len(row)
        if lf < lr:
            d[self.restkey] = row[lf:]
        elif lf > lr:
            for key in self.fieldnames[lr:]:
                d[key] = self.restval
        return d

use it like so

import collections

csv_reader = DictReader(f, dict_class=collections.OrderedDict)
# ...
Pykler
  • 14,565
  • 9
  • 41
  • 50
  • 2
    Note that from Python 3.6 onwards, the rows returned from [`csv.DictReader`](https://docs.python.org/3/library/csv.html#csv.DictReader)s are now `OrderedDict`s, not regular ones as they were previously (so doing something like this is no longer necessary). The good news, I guess, is that using this approach won't hurt, it'll only be redundant. – martineau Apr 15 '18 at 16:40
  • @martineau in Python 3.8, the rows returned from csv.DictReaders are changed to dict. – lucemia Oct 17 '21 at 16:07
  • 1
    @lucemia: I know, that because `dict`s now maintain order, but still what I said before applies (i.e. that while using the approach in this answer will produce the correct results, it's superfluous). – martineau Oct 17 '21 at 16:24
0

I wrote a little tool to sort the order of CSV columns: I don't claim that it's great I know little of Python, but it does the job:

import csv
import sys

with open(sys.argv[1], 'r') as infile:
    csvReader = csv.DictReader(infile)
    sorted_fieldnames = sorted(csvReader.fieldnames)
    writer = csv.DictWriter(sys.stdout, fieldnames=sorted_fieldnames)
    # reorder the header first
    writer.writeheader()
    for row in csvReader:
        # writes the reordered rows to the new file
        writer.writerow(row)
renoX
  • 11
  • 1