I'm trying to write a Python script that will take any CSV file, run it through a geocoder, and then write the resulting geocoding attributes (+ all the data from the original file) to a new csv file.
My code so far is the following, and I should note everything is working as expected except for combining the geocoding attributes with the data in the raw csv file. Currently what's happening is that all the original csv file's field values for a particular row are presented as just one value in the csv file (although the geocoding attributes appear correctly). The problem with the script is located towards the end. I left out the code for the different classes for brevity.
I should also note I'm using hasattr* because although I don't know what all fields are in the original in_file, I do know that somewhere in the input csv these fields will be present and these are the fields needed for the geocoding.
Originally I tried changing "new_file.writerow([])" to "new_file.writerow()", at this point the row input -r- did write correctly the csv file, but the geocoding attributes could no longer be written to the csv as they were looked at as additional arguments.
def locate(file=None):
""" locate by geocoding func"""
start_time = time.time()
count = 0
if file != None:
with open (file) as in_file:
f_csv = csv.reader(in_file)
# regex headers and lowercase to standarize for hasattr func.
headers = [ re.sub('["\s+]', '_', h).lower() for h in next(f_csv)]
# Used namedtuple for headers
Row = namedtuple('Row', headers)
# for row in file
for r in f_csv:
count += 1
# set row values to named tuple values
row = Row(*r)
# Try hasattr to find fields names address, city, state, zipcode
if hasattr(row, 'address'):
address = row.address
elif hasattr(row, 'address1'):
address = row.address1
if hasattr(row, 'city'):
city = row.city
if hasattr(row, 'state'):
state = row.state
elif hasattr(row, 'st'):
state = row.st
if hasattr(row, 'zipcode'):
zipCode = row.zipcode
elif hasattr(row, 'zip'):
zipCode = row.zipcode
# Create new address object
addressObject = Address(address, city, state, zipCode)
# Get response from api
data = requests.get(addressObject.__str__()).json()
try:
data['geocodeStatusCode'] = int(data['geocodeStatusCode'])
except:
data['geocodeStatusCode'] = None
if data['geocodeStatusCode'] == 'SomeNumber':
# geocoded address ideally uses parent class attributes
geocodedAddressObject = GeocodedAddress(addressObject.address, addressObject.city, addressObject.state, addressObject.zipCode, data['addressGeo']['latitude'], data['addressGeo']['longitude'], data['addressGeo']['score'])
else:
geocodedAddressObject = GeocodedAddress(addressObject.address, addressObject.city, addressObject.state, addressObject.zipCode)
# Problem Area
geocoded_file = file.replace('.csv', '_geocoded2') + '.csv'
with open(geocoded_file, 'a', newline='') as geocoded:
# Problem area -- the r -row- attribute writes all within the same cell even though they are comma separated. The geocoding attributes do write correctly to the csv file
new_file = csv.writer(geocoded)
new_file.writerow([r, geocodedAddressObject.latitude, geocodedAddressObject.longitude, geocodedAddressObject.geocodeScore])
print('The time to geocode {} records: {}'.format(count, (time.time() - start_time)))
CSV Input Data Example:
"UID", "Occupant", "Address", "City", "State", "ZipCode"
"100001", "Playstation Theater", "New York", "NY", "10036"
"100002", "Ed Sullivan Theater", "New York, "NY", "10019"
CSV Output Example (the additional fields are parsed during geocoding)
"UID", "Occupant", "Address", "City", "State", "ZipCode", "GeoCodingLatitude", "GeoCodingLongitude", "GeoCodingScore"
"100001", "Playstation Theater", "New York", "NY", "10036", "45.1234", "-110.4567", "100"
"100002", "Ed Sullivan Theater", "New York, "NY", "10019", "44.1234", "-111.4567", "100"