
I read a CSV file and use the usaddress library to parse an address field. How do I write the resulting OrderedDicts to another CSV file?

import usaddress
import csv

with open('output.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        addr=row['Case Parties Address']
        data = usaddress.tag(addr)
        print(data)
(OrderedDict([('AddressNumber', u'4167'), ('StreetNamePreType', u'Highway'), ('StreetName', u'319'), ('StreetNamePostDirectional', u'E'), ('PlaceName', u'Conway'), ('StateName', u'SC'), ('ZipCode', u'29526-5446')]), 'Street Address')
davidism
  • Give this a read: http://www.gadzmo.com/python/reading-and-writing-csv-files-with-python-dictreader-and-dictwriter/ – gcarvelli Jul 24 '15 at 14:41
  • Do you have a new header for each line of the `for` loop or a single header once? I have posted a solution below assuming a single header but I'm not sure what you are looking for... – isosceleswheel Jul 24 '15 at 14:43

3 Answers


See this GitHub issue for a solution:

import csvkit
import usaddress

# expected format in input.csv: first column 'id', second column 'address'
with open('input.csv', 'rU') as f:
    reader = csvkit.DictReader(f)

    all_rows = []
    for row in reader:
        try:
            parsed_addr = usaddress.tag(row['address'])
            row_dict = parsed_addr[0]
        except usaddress.RepeatedLabelError:  # raised when tag() cannot merge repeated labels
            row_dict = {'error':'True'}

        row_dict['id'] = row['id']
        all_rows.append(row_dict)

field_list = ['id','AddressNumber', 'AddressNumberPrefix', 'AddressNumberSuffix', 'BuildingName', 
              'CornerOf','IntersectionSeparator','LandmarkName','NotAddress','OccupancyType',
              'OccupancyIdentifier','PlaceName','Recipient','StateName','StreetName',
              'StreetNamePreDirectional','StreetNamePreModifier','StreetNamePreType',
              'StreetNamePostDirectional','StreetNamePostModifier','StreetNamePostType',
              'SubaddressIdentifier','SubaddressType','USPSBoxGroupID','USPSBoxGroupType',
              'USPSBoxID','USPSBoxType','ZipCode', 'error']

with open('output.csv', 'wb') as outfile:
    writer = csvkit.DictWriter(outfile, field_list)
    writer.writeheader()
    writer.writerows(all_rows)

Some notes:

  • Because each tagged address can have a different set of keys, define the output columns using every possible key. This is not a problem, because all possible usaddress labels are known in advance.
  • The usaddress tag method raises an error when it cannot concatenate address tokens in an intuitive way. These errors should be captured in the output, as the 'error' column does above.
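To illustrate the first note, here is a minimal sketch using the standard library's csv.DictWriter rather than csvkit. The sample rows are hypothetical stand-ins for the first element of usaddress.tag() output, and the column list is shortened for readability. The restval argument fills in any column a row does not have.

```python
import csv
import io
from collections import OrderedDict

# Hypothetical tagged rows, shaped like the first element of usaddress.tag() output.
tagged_rows = [
    OrderedDict([('id', '1'), ('AddressNumber', '4167'),
                 ('StreetName', '319'), ('ZipCode', '29526-5446')]),
    OrderedDict([('id', '2'), ('USPSBoxType', 'PO Box'),
                 ('USPSBoxID', '12'), ('ZipCode', '29526')]),
    {'id': '3', 'error': 'True'},  # a row whose address could not be tagged
]

# Shortened column list for illustration; the real one would hold every label.
field_list = ['id', 'AddressNumber', 'StreetName',
              'USPSBoxType', 'USPSBoxID', 'ZipCode', 'error']

# Write to an in-memory buffer so the sketch runs without touching disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=field_list,
                        restval='', lineterminator='\n')
writer.writeheader()
writer.writerows(tagged_rows)  # missing keys become empty cells

output_text = buffer.getvalue()
print(output_text)
```

Every row ends up with the same number of columns, so the file stays rectangular even though each address was tagged with different labels.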
jonathan
Cathy D.
  • it is better to include the actual solution in the post, as well as the link. This way the solution is provided and survives the inevitable loss of links over time. – jonathan Apr 09 '18 at 19:59

I don't know the usaddress module, but judging by the printed output, usaddress.tag returns a tuple whose first element is a dict (an OrderedDict) and whose second element is the address type. Printing a dict shows each key: value pair. I am guessing you want to use the keys as the header and the values for each line of data, as in my solution below.

Here is a suggestion based on the code fragment you posted, with some edits. In this case, you get a new header and a new line of data for each iteration of the for loop, which, without further info, seems to be what you are going for:

import csv
import usaddress

# open the input and the new output file together so both stay open inside the loop
with open('output.csv') as csvfile, open('myoutputfile', 'w') as o:
    reader = csv.DictReader(csvfile)
    for row in reader:
        addr = row['Case Parties Address']
        tagged, address_type = usaddress.tag(addr)  # tag() returns (OrderedDict, address type)
        header = ','.join(tagged.keys()) + '\n'  # header fields separated by commas, with a newline at the end
        data_string = ','.join(tagged.values()) + '\n'  # values separated by commas, with a newline at the end
        o.write(header + data_string)  # write the header, then the data on a new line

Hope this helps. If you are trying to write a single header and then rows of data for each iteration of the for loop, it would look a little different...
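One caveat when joining fields with commas by hand: a value that itself contains a comma will split into extra columns when the file is read back. The sketch below uses a made-up list of values (not real usaddress output) to compare a hand-joined line with one produced by csv.writer, which quotes such values.

```python
import csv
import io

# Hypothetical field values; the first one contains a comma.
values = ['Suite 4, Building B', 'Conway', 'SC']

# Hand-joined line: the embedded comma is indistinguishable from a separator.
naive_line = ','.join(values)

# csv.writer quotes any field that contains the delimiter.
buffer = io.StringIO()
csv.writer(buffer, lineterminator='\n').writerow(values)
quoted_line = buffer.getvalue().rstrip('\n')

# Read both lines back to see how many fields each one yields.
parsed_naive = next(csv.reader([naive_line]))
parsed_quoted = next(csv.reader([quoted_line]))

print(len(parsed_naive), len(parsed_quoted))
```

The hand-joined line comes back with one field too many, while the csv.writer line round-trips to the original three values.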

isosceleswheel

The following should work. It assumes each address entry contains the same fields. The first entry is used to automatically create the headers.

import usaddress
import csv

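# Python 3: open the output in text mode with newline=''; on Python 2 use mode 'wb' instead
with open('output.csv', 'r') as f_input, open('case_parties.csv', 'w', newline='') as f_output: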
    csv_input = csv.DictReader(f_input)
    csv_output = csv.writer(f_output)
    write_headers = True

    for row in csv_input:
        addr=row['Case Parties Address']
        data = usaddress.tag(addr)

        if write_headers:
            csv_output.writerow(data[0].keys())
            write_headers = False

        csv_output.writerow(data[0].values())
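If the entries do not all share the same fields, one possible variation is to collect the rows first and build the header from the union of their keys, in first-seen order. This is only a sketch: the rows are hypothetical examples shaped like the first element of usaddress.tag() output, and union_of_keys is a helper name introduced here for illustration.

```python
import csv
import io
from collections import OrderedDict


def union_of_keys(rows):
    """Return every key that appears in any row, in first-seen order."""
    seen = OrderedDict()
    for row in rows:
        for key in row:
            seen.setdefault(key, None)
    return list(seen)


# Hypothetical tagged rows with different sets of labels.
rows = [
    OrderedDict([('AddressNumber', '4167'), ('StreetName', '319'),
                 ('StateName', 'SC')]),
    OrderedDict([('USPSBoxType', 'PO Box'), ('USPSBoxID', '12'),
                 ('StateName', 'SC')]),
]

fieldnames = union_of_keys(rows)

# Write to an in-memory buffer; restval fills columns a row lacks.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                        restval='', lineterminator='\n')
writer.writeheader()
writer.writerows(rows)

result = buffer.getvalue()
print(result)
```

The trade-off is that all rows must be held in memory before the header can be written, which is fine for small files but may not suit very large inputs.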
Martin Evans