1

I know that when using python to write dicts to csv files, the headers will be put in alphabetic order. So is there a way that I can write the header with the order I want?

The code and outputs of tsv file is below.

I have a dict:

my_data = {"name": name[:], "city": city[:], "state": state[:],
           "stars": stars[:], "review_count": review_count[:],
           "main_category": new_cat[:]}

And I used following code to write them in csv file:

with open('test.tsv','w') as file:
    writer = csv.writer(file, delimiter='\t')
    writer.writerow(my_data.keys())
    for row in zip(*my_data.values()):
        writer.writerow(list(row))

And the first several rows of the output of the tsv file is below:

city    review_count    name    main_category   state   stars
Los Angeles 2   Southern California Medical Group   Medical Centers CA  3.5
Cambridge   4   Harvard Square Shiatsu  Massage MA  4.0
Kitchener   2   Faith & Glory Collective    Tattoo  ON  4.0

You can see that they are put in an alphabetic order, but what I really want is that they can be put in the order of keys in my_data like this:

name    city    state    stars    review_count    main_category
martineau
  • 119,623
  • 25
  • 170
  • 301
Parker
  • 193
  • 2
  • 3
  • 14
  • Does one or more of the answers to this question do what you want?: http://stackoverflow.com/questions/1885324/is-it-possible-to-keep-the-column-order-using-the-python-csv-dictreader – Bill Bell Nov 11 '16 at 18:47
  • But Dict does not have 'fieldnames' attributes.. Is there another way? – Parker Nov 11 '16 at 19:15
  • The keys in `my_data` don't have a defined order because it's a dictionary, not a sequence. – martineau Nov 11 '16 at 19:39
  • @martineau I recognized that. So if I want to have an output file in that order, what should I do? – Parker Nov 11 '16 at 19:44
  • @Parker use a `csv.DictWriter` and specify the `fieldnames` in the order you want... – Jon Clements Nov 11 '16 at 19:57
  • @JonClements Thanks, but can you show me how to do this because when I do this, I get an attribute error – Parker Nov 11 '16 at 20:09

3 Answers3

1

Just for one record.

import csv

output=open('temp.csv', 'w')
outputCSV = csv.DictWriter(output, delimiter = '\t', \
    fieldnames = [ 'name', 'city', 'state', 'stars', 'review_count', 'main_category' ] )

outputCSV.writerow( {
    'name': 'Southern Cal Med Group',
    'city': 'Los Angeles',
    'state': 'CA',
    'review_count': '2',
    'main_category': 'medical',
    'stars': '3.5',
    } )

output.close()
Bill Bell
  • 21,021
  • 5
  • 43
  • 58
  • I have a very large dict. And when I do this: with open('test1.tsv','w') as file: writer = csv.DictWriter(file, delimiter='\t', fieldnames = ["name","city","state","stars","review_count","main_category"]) for row in zip(*my_data): writer.writerow(list(row)) I got a valueerror "dict contains fields not in fieldnames" – Parker Nov 11 '16 at 20:31
  • So the thing with using fieldnames is that when I continue to write writerow, I will get valueerror since starting from the 2nd row of this large dict, those values are not fieldnames – Parker Nov 11 '16 at 20:34
  • **That** sounds to me like a new question. I'm a little surprised that writerow hasn't complained at received a list because I believe it's expecting a dict. I would say, just pare down the dictionary that you input to writerow. – Bill Bell Nov 11 '16 at 20:48
  • 2
    Note per csv docs `open` should use `'wb'` if Python 2 and `'w', newline=''` for Python 3. It matters for Windows. – Mark Tolonen Nov 13 '16 at 03:16
0

The following shows a couple of ways to output the data to the tsv file in the order you want:

with open('test.tsv', 'wb') as file:
    FIELDNAMES = 'name city state stars review_count main_category'.split()
    writer = csv.writer(file, delimiter='\t')
    writer.writerow(FIELDNAMES)
    writer.writerows(row for row in zip(*(my_data[key] for key in FIELDNAMES)))

As others have suggested, this could also be done by using a csv.DictWriter:

with open('test.tsv', 'wb') as file:
    FIELDNAMES = 'name city state stars review_count main_category'.split()
    writer = csv.DictWriter(file, delimiter='\t', fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(dict(zip(FIELDNAMES, row))
                         for row in zip(*(my_data[key] for key in FIELDNAMES)))

As you can see there both about the same amount of code, although the first, non-DictWriter version is a little more efficient since producing the data for each row requires less effort, so accordingly it's probably a little faster.

Regardless of which type of writer is used, the contents of the test.tsv file created will be exactly the same, as shown below (where represents the tab separators):

name→city→state→stars→review_count→main_category
Southern California Medical Group→Los Angeles→CA→3.5→2→Medical Centers
Harvard Square Shiatsu→Cambridge→MA→4.0→4→Massage
Faith & Glory Collective→Kitchener→ON→4.0→2→Tattoo

Note: If you're using Python 2, you open the output file for writing with mode 'wb'. For Python 2, you should use mode 'w' and also add a newline='' keyword argument.

martineau
  • 119,623
  • 25
  • 170
  • 301
  • Thanks so much! Works perfectly except that the first row of the data is missing. I think I should start from range(0,..) – Parker Nov 12 '16 at 01:56
  • You're welcome and are correct about the call to `range()` needing to be changed. It got the way it was because I incorrectly reconstructed the contents of `my_data` (because what's shown in your question wasn't sufficient for testing purposes). I suggest you read [_How to create a Minimal, Complete, and Verifiable example_](https://stackoverflow.com/help/mcve) before posting any further questions. – martineau Nov 12 '16 at 16:24
  • Updated answer so it no longer needs to use `range()`. Also showed how to do it with `csv.DictWriter` as well as `csv.writer` objects. – martineau Nov 13 '16 at 22:16
0

If you have a dictionaries for the fields of each row, then use a DictWriter. It has options to fill in defaults for missing fields, or to ignore extra fields that you don't want in the output.

Example:

import csv

# Sample data converted to a list of lines.
datalines = '''\
Los Angeles,2,Southern California Medical Group,Medical Centers,CA,3.5
Cambridge,4,Harvard Square Shiatsu,Massage,MA,4.0
Kitchener,2,Faith & Glory Collective,Tattoo,ON,4.0
'''.splitlines()

# Specify the fields you want in the output.
# If your dictionaries have fields you don't want in the output, use extrasaction='ignore'
fields = 'name city state stars review_count main_category'.split()

# Python 3, use newline=''.  Python 2 use 'wb' instead.
with open('test.tsv','w',newline='') as file:
    writer = csv.DictWriter(file, fields, delimiter='\t', extrasaction='ignore')

    writer.writeheader() # Writes headers specified by "fields"

    for line in datalines:
        # Converts input line into a dict in OP's format.
        my_data = dict(zip('city review_count name main_category state stars'.split(),line.split(',')))
        print(my_data)
        writer.writerow(my_data)

Output to verify dictionaries are in OP's format:

{'stars': '3.5', 'review_count': '2', 'name': 'Southern California Medical Group', 'city': 'Los Angeles', 'state': 'CA', 'main_category': 'Medical Centers'}
{'stars': '4.0', 'review_count': '4', 'name': 'Harvard Square Shiatsu', 'city': 'Cambridge', 'state': 'MA', 'main_category': 'Massage'}
{'stars': '4.0', 'review_count': '2', 'name': 'Faith & Glory Collective', 'city': 'Kitchener', 'state': 'ON', 'main_category': 'Tattoo'}

Output to test.tsv (→ for tab):

name→city→state→stars→review_count→main_category
Southern California Medical Group→Los Angeles→CA→3.5→2→Medical Centers
Harvard Square Shiatsu→Cambridge→MA→4.0→4→Massage
Faith & Glory Collective→Kitchener→ON→4.0→2→Tattoo
Mark Tolonen
  • 166,664
  • 26
  • 169
  • 251