0

I'm trying to write a Python script that will take any CSV file, run it through a geocoder, and then write the resulting geocoding attributes (+ all the data from the original file) to a new csv file.

My code so far is the following, and I should note everything is working as expected except for combining the geocoding attributes with the data in the raw csv file. Currently what's happening is that all the original csv file's field values for a particular row are presented as just one value in the csv file (although the geocoding attributes appear correctly). The problem with the script is located towards the end. I left out the code for the different classes for brevity.

I should also note I'm using hasattr* because although I don't know what all fields are in the original in_file, I do know that somewhere in the input csv these fields will be present and these are the fields needed for the geocoding.

Originally I tried changing "new_file.writerow([])" to "new_file.writerow()", at this point the row input -r- did write correctly the csv file, but the geocoding attributes could no longer be written to the csv as they were looked at as additional arguments.

def locate(file=None):
""" locate by geocoding func"""
start_time = time.time()
count = 0

if file != None:

    with open (file) as in_file:
        f_csv = csv.reader(in_file)

        # regex headers and lowercase to standarize for hasattr func.
        headers = [ re.sub('["\s+]', '_', h).lower() for h in next(f_csv)]

        # Used namedtuple for headers
        Row = namedtuple('Row', headers)

        # for row in file
        for r in f_csv:
            count += 1
            # set row values to named tuple values
            row = Row(*r)

            # Try hasattr to find fields names address, city, state, zipcode
            if hasattr(row, 'address'):
                address = row.address
            elif hasattr(row, 'address1'):
                address = row.address1
            if hasattr(row, 'city'):
                city = row.city
            if hasattr(row, 'state'):
                state = row.state
            elif hasattr(row, 'st'):
                state = row.st
            if hasattr(row, 'zipcode'):
                zipCode = row.zipcode
            elif hasattr(row, 'zip'):
                zipCode = row.zipcode

            # Create new address object
            addressObject = Address(address, city, state, zipCode)

            # Get response from api
            data = requests.get(addressObject.__str__()).json()

            try:
                data['geocodeStatusCode'] = int(data['geocodeStatusCode'])
            except:
                data['geocodeStatusCode'] =  None

            if data['geocodeStatusCode'] == 'SomeNumber':

                # geocoded address ideally uses parent class attributes
                geocodedAddressObject =  GeocodedAddress(addressObject.address, addressObject.city, addressObject.state, addressObject.zipCode, data['addressGeo']['latitude'], data['addressGeo']['longitude'], data['addressGeo']['score'])              


            else:

                geocodedAddressObject =  GeocodedAddress(addressObject.address, addressObject.city, addressObject.state, addressObject.zipCode)

            # Problem Area
            geocoded_file = file.replace('.csv', '_geocoded2') + '.csv'
            with open(geocoded_file, 'a', newline='') as geocoded:

                # Problem area -- the r -row- attribute writes all within the same cell even though they are comma separated. The geocoding attributes do write correctly to the csv file 
                new_file = csv.writer(geocoded)
                new_file.writerow([r, geocodedAddressObject.latitude, geocodedAddressObject.longitude, geocodedAddressObject.geocodeScore])

print('The time to geocode {} records: {}'.format(count, (time.time() - start_time)))

CSV Input Data Example:

"UID", "Occupant", "Address", "City", "State", "ZipCode"
"100001", "Playstation Theater", "New York", "NY", "10036"
"100002", "Ed Sullivan Theater", "New York, "NY", "10019"

CSV Output Example (the additional fields are parsed during geocoding)

"UID", "Occupant", "Address", "City", "State", "ZipCode", "GeoCodingLatitude", "GeoCodingLongitude", "GeoCodingScore"
"100001", "Playstation Theater", "New York", "NY", "10036", "45.1234", "-110.4567", "100"
"100002", "Ed Sullivan Theater", "New York, "NY", "10019", "44.1234", "-111.4567", "100"
Matt BS
  • 35
  • 8
  • Sounds like you should use `DictReader`. Showing expected vs. actual output and sample input would help. – Mark Tolonen May 24 '18 at 18:07
  • @MarkTolonen if you think there is anything else I should add to the question, style or content wise, to make it easier to answer - please let me know. Thank you! – Matt BS May 25 '18 at 14:27

1 Answers1

0

I figured out a solution, although it is likely not the most elegant. I transformed the namedtuple into a dictionary, using namedtuple._asdict() and then looped through the values of the row adding them to a new list. At this point I add in the geocoded variables, and then write the entire list to the row. Here is a sample of the code I changed! If you can think of a better solution, please let me know.

                    if data['geocodeStatusCode'] == 'SomeNumber':

                        # geocoded address ideally should use parent class address values and not have to be restated 
                        geocodedAddressObject =  GeocodedAddress(addressObject.address, addressObject.city, addressObject.state, addressObject.zipCode,
                                                                data['addressGeo']['latitude'], data['addressGeo']['longitude'], data['addressGeo']['score'])              


                    else:

                        geocodedAddressObject =  GeocodedAddress(addressObject.address, addressObject.city, addressObject.state, addressObject.zipCode)              


                    # This is where I made the change - set new list
                    list_values = []    

                    # Use _asdict for the named tuple
                    row_content = row._asdict()

                    # Loop through and strip white space
                    for key, value in row_content.items():
                        # print(key, value.strip())
                        list_values.append(value.strip())

                    # Extend list rather then append due to multiple values
                    list_values.extend((geocodedAddressObject.latitude, geocodedAddressObject.longitude, geocodedAddressObject.geocodeScore))

                    # Finally write the new list to the csv file - which includes both the row and the geocoded objects 
                    #- and is agnostic as to what data it's passed as long as its utf-8 complaint
                    new_file.writerow(list_values)
Matt BS
  • 35
  • 8