I'm new to Python and I'm working with the pygeocodio library:

API_KEY = "myapikey"

from geocodio import GeocodioClient

client = GeocodioClient(API_KEY)


addresses = client.geocode("21236 Birchwood Loop, 99567, AK")
addresses.best_match.get("accuracy")
Out[61]: 1

addresses.best_match.get("accuracy_type")
Out[62]: 'rooftop'

However, when I try to iterate through a DataFrame (example.csv):

import pandas as pd
customers = pd.read_csv("example.csv")

for row in customers.iterrows():
    addresses = client.geocode(row)
    addresses.best_match.get("accuracy")

I receive an error:

  File "C:\Users\jtharian\AppData\Local\Continuum\anaconda3\lib\site-packages\geocodio\client.py", line 58, in error_response
    raise exceptions.GeocodioDataError(response.json()["error"])

GeocodioDataError: Could not geocode address. Postal code or city required.

reprex of example.csv:

21236 Birchwood Loop, 99567, AK
1731 Bragaw St, 99508, AK
300 E Fireweed Ln, 99503, AK
4360 Snider Dr, 99654, AK
1921 W Dimond Blvd 108, 99515, AK
2702 Peger Rd, 99709, AK
1651 College Rd, 99709, AK
898 Ballaine Rd, 99709, AK
23819 Immelman Circle, 99567, AK
9750 W Parks Hwy, 99652, AK
7205 Shorewood Dr, 99645, AK

Why do I receive this error?

  • Does your dataframe have only one column? – Buckeye14Guy Oct 12 '19 at 16:28
  • Yes only one column @Buckeye14Guy –  Oct 12 '19 at 16:29
  • `for index, row in df.iterrows(): client.geocode(row.values[0])` Also look into the `apply` method of dataframes. `iterrows` yields tuples of (index, row contents), so you are not passing the expected argument to geocode. – Buckeye14Guy Oct 12 '19 at 16:32
  • @Buckeye14Guy im new to python. that line ran but how do i use it to run addresses.best_match.get("accuracy") for every row? –  Oct 12 '19 at 16:36
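
To illustrate the comment about `iterrows`, here is a minimal offline sketch of what each iteration yields. The two-row dataframe is a hypothetical stand-in for example.csv, and no geocoding call is made:

```python
import pandas as pd

# Hypothetical one-column stand-in for example.csv (column label 0, as with header=None).
customers = pd.DataFrame({0: ["21236 Birchwood Loop, 99567, AK",
                              "1731 Bragaw St, 99508, AK"]})

# iterrows() yields (index, row) tuples; the row itself is a pandas Series.
first = next(customers.iterrows())
index, row = first

# The address string that geocode expects is the first value inside the row.
address = row.values[0]
print(type(first).__name__, type(row).__name__, address)
```

Passing `first` (the whole tuple) to `client.geocode` is what the loop in the question does; `address` is the plain string it needs.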

2 Answers

Looking at the API docs, `geocode` expects a single string representing the full address, built from your columns of individual address components, like the following:

location = client.geocode("1109 N Highland St, Arlington VA")

So to get a column like that in your df, you could convert each column to strings and concatenate them into a single string per row, stored as a new Series in your df:

import pandas as pd

customers = pd.read_csv("example.csv", header=None)
customers['address_string'] = customers[0].map(str) + ' ' + customers[1].map(str) + customers[2].map(str)

Producing:

# >>> customers['address_string']
# 0       21236 Birchwood Loop 99567 AK
# 1             1731 Bragaw St 99508 AK
# 2          300 E Fireweed Ln 99503 AK
# 3             4360 Snider Dr 99654 AK
# 4     1921 W Dimond Blvd 108 99515 AK
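
As a self-contained variant of the step above, the following sketch reads the same kind of data from an in-memory string instead of a file. The `CSV_TEXT` name and the use of `dtype=str` and `skipinitialspace=True` are my own choices for illustration, not part of the code above; they make every field a stripped string so the join does not rely on leftover leading spaces:

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for example.csv (first three rows of the question's data).
CSV_TEXT = """21236 Birchwood Loop, 99567, AK
1731 Bragaw St, 99508, AK
300 E Fireweed Ln, 99503, AK
"""

# Read every field as text and drop the space that follows each comma.
customers = pd.read_csv(io.StringIO(CSV_TEXT), header=None, dtype=str,
                        skipinitialspace=True)

# Check how many columns were parsed before indexing customers[1] or customers[2].
n_columns = customers.shape[1]

# Join however many columns there are into one address string per row.
customers["address_string"] = customers.apply(lambda row: " ".join(row), axis=1)
print(n_columns)
print(customers["address_string"].tolist())
```

Because the join runs over whatever columns exist, the same line also works when the file holds each full address in a single column.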

Then you can iterate over the values of the Series of address strings and store the accuracy in a list that can be inserted into your df:

geocoded_accuracy = []
geocoded_accuracy_type = []

# One geocode call per address; both fields are read from the same result
for address in customers['address_string'].values:
    geocoded_address = client.geocode(address)
    accuracy = geocoded_address.best_match.get("accuracy")
    accuracy_type = geocoded_address.best_match.get("accuracy_type")

    geocoded_accuracy.append(accuracy)
    geocoded_accuracy_type.append(accuracy_type)

# The lists are in row order, so they can be assigned directly as new columns
customers['accuracy'] = geocoded_accuracy
customers['accuracy_type'] = geocoded_accuracy_type

results = customers[['address_string', 'accuracy', 'accuracy_type']]

The results df would then look like the following:

# >>> results
#                      address_string  accuracy        accuracy_type
# 0     21236 Birchwood Loop 99567 AK      1.00              rooftop
# 1           1731 Bragaw St 99508 AK      1.00              rooftop
# 2        300 E Fireweed Ln 99503 AK      1.00              rooftop
# 3           4360 Snider Dr 99654 AK      1.00  range_interpolation
# 4   1921 W Dimond Blvd 108 99515 AK      1.00              rooftop
# 5            2702 Peger Rd 99709 AK      1.00              rooftop
# 6          1651 College Rd 99709 AK      1.00              rooftop
# 7          898 Ballaine Rd 99709 AK      1.00              rooftop
# 8    23819 Immelman Circle 99567 AK      1.00              rooftop
# 9         9750 W Parks Hwy 99652 AK      0.33                place
# 10       7205 Shorewood Dr 99645 AK      1.00  range_interpolation

Then to write the results df to a .csv:

results.to_csv('results.csv')

Putting all of this together yields the following code:

import pandas as pd
from geocodio import GeocodioClient

API_KEY = 'insert_your_key_here'

client = GeocodioClient(API_KEY)

customers = pd.read_csv("example.csv", header=None)
customers['address_string'] = customers[0].map(str) + ' ' + customers[1].map(str) + customers[2].map(str)

geocoded_accuracy = []
geocoded_accuracy_type = []

# One geocode call per address; both fields are read from the same result
for address in customers['address_string'].values:
    geocoded_address = client.geocode(address)
    accuracy = geocoded_address.best_match.get("accuracy")
    accuracy_type = geocoded_address.best_match.get("accuracy_type")

    geocoded_accuracy.append(accuracy)
    geocoded_accuracy_type.append(accuracy_type)

# The lists are in row order, so they can be assigned directly as new columns
customers['accuracy'] = geocoded_accuracy
customers['accuracy_type'] = geocoded_accuracy_type

results = customers[['address_string', 'accuracy', 'accuracy_type']]

results.to_csv('results.csv')
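
The loop above can also be wrapped in a small function, which makes it easy to confirm that each address triggers exactly one geocoding call. This is only a sketch: `collect_accuracy` and `fake_geocode` are hypothetical names, and the fake stands in for `client.geocode` so the example runs offline, assuming the real result exposes `best_match` with a `.get` method as used above.

```python
from types import SimpleNamespace


def collect_accuracy(addresses, geocode):
    """Call geocode once per address and read both fields from that one result."""
    rows = []
    for address in addresses:
        best = geocode(address).best_match  # the only geocoding call for this address
        rows.append({
            "address_string": address,
            "accuracy": best.get("accuracy"),
            "accuracy_type": best.get("accuracy_type"),
        })
    return rows


# Hypothetical offline stand-in for client.geocode that records every call it receives.
calls = []


def fake_geocode(address):
    calls.append(address)
    return SimpleNamespace(best_match={"accuracy": 1, "accuracy_type": "rooftop"})


rows = collect_accuracy(
    ["21236 Birchwood Loop 99567 AK", "1731 Bragaw St 99508 AK"],
    fake_geocode,
)
print(len(calls), rows[0])
```

With a real client you would pass `client.geocode` as the second argument, and `pd.DataFrame(rows)` would turn the list of dicts into the three-column frame.
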
Dodge
  • How do I convert the final output to a dataframe so I can write to a new csv? also, can u please explain the customers['address_string'] syntax –  Oct 12 '19 at 17:30
  • @JoelTharian When you say final output do you want the address and the accuracy? When I look at the docs you'll get a json response for each address and parsing that to `df` then to `csv` could be an entirely separate question. I added info on the `address_string` syntax – Dodge Oct 12 '19 at 17:40
  • But to answer your original question "Why do I receive this error?" I suspect that is the result of providing a pandas Series object to a function that takes a string. – Dodge Oct 12 '19 at 17:45
  • I only want accuracy and accuracy type along with their respective address to be displayed in 3 columns in my output file @Dodge –  Oct 12 '19 at 18:07
  • the result is exactly what i want but when I run customers['address_string'] I get an error. KeyError = 1 –  Oct 12 '19 at 18:51
  • @JoelTharian Did you use `header=None` when you did `read_csv` on your `example.csv` file? If not, the first line will be read as the header. I'm assuming you are having an issue with the line that creates the 'address_string' vector. I've just rerun the code to verify that it works and it does indeed work fine. – Dodge Oct 12 '19 at 18:59
  • Yes i did header = None. Used the same code u provided above but got error saying File "pandas/_libs/hashtable_class_helper.pxi", line 993, in pandas._libs.hashtable.Int64HashTable.get_item KeyError: 1 I get this error when I run line 6 i.e. the customers['address_string'] line –  Oct 12 '19 at 19:58
  • @JoelTharian What is the output of `customers.columns` ? It should look like: `Int64Index([0, 1, 2], dtype='int64')` assuming the `example.csv` contains data exactly as you've provided – Dodge Oct 12 '19 at 20:10
  • customers.columns Out[5]: Int64Index([0], dtype='int64') –  Oct 12 '19 at 20:13
  • That means your `df` has only one column, named `0`. Either your data is different from what you have provided or you have different code. Take the example data that you have posted and place it into a file named `example.csv`. This is what I have done to produce the output I have shown – Dodge Oct 12 '19 at 20:16
  • So my original customers.csv file contains exactly the same data I provided but it is in 1 column. I hope it didnt seem like street name, state and zip are three different columns. It is just one column where each row has an entire address like : 21236 Birchwood Loop, 99567, AK –  Oct 12 '19 at 20:17
  • Nevermind I troubleshooted it. I just needed customers['address_string'] = customers[0].map(str) instead of adding the 2 more [1],[2] strings. Thanks –  Oct 12 '19 at 20:20
  • Just one question: the code ran and I received my desired output, but it makes 2 calls to the API per address. Since there is a 2500/day limit, it gets consumed fast. Is there a way to get accuracy and accuracy type within a single call? According to the documentation, these just show up within the output dictionary of the original forward geocoding function itself. My objective is to make only 1 call per address –  Oct 16 '19 at 14:09
  • @JoelTharian That is a good question. At this point I would create a [MCVE](https://stackoverflow.com/help/minimal-reproducible-example) and ask another question formally. If you would like to discuss more please come to python [chat](https://chat.stackoverflow.com/rooms/6/python) – Dodge Oct 16 '19 at 14:55
  • https://stackoverflow.com/questions/58416935/is-there-a-way-to-call-an-api-once-instead-of-twice-per-output-generated I have posted this as a new question here @Dodge –  Oct 16 '19 at 15:33
I would use `apply` and catch specific exceptions, but while you are new, just focus on what works and on understanding the errors. Once you are more familiar with pandas and Python, definitely look into these topics.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html https://geek-university.com/python/catch-specific-exceptions/

errors, address_list, accuracy_list, accuracy_type_list = [], [], [], []
for index, row in customers.iterrows():
    try:
        addresses = client.geocode(row.values[0])
        accuracy = addresses.best_match.get("accuracy")
        accuracy_type = addresses.best_match.get("accuracy_type")

        address_list.append(addresses)
        accuracy_list.append(accuracy)
        accuracy_type_list.append(accuracy_type)
    except Exception as e:
        address_list.append(None)
        accuracy_list.append(None)
        accuracy_type_list.append(None)
        errors.append(f"failure {e.args[0]} at index {index}")

What am I doing? `iterrows` yields tuples of (index, row), so I geocode the first value of each row. If that works, I append the result to address_list, and likewise the accuracy and accuracy type to their lists. When it fails, I append a message to the errors list recording where in the dataframe the error occurred, i.e. the index. I still need a placeholder in each list so that they stay aligned with the rows, so I append None. Now I can do:

customers['addresses'] = address_list
customers['accuracy'] = accuracy_list
customers['accuracy_type'] = accuracy_type_list

And save my dataframe if needed. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
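
For later, here is a rough sketch of the `apply` plus specific-exception approach mentioned at the top of this answer. It is not tested against the real API: `fake_geocode` and `FakeGeocodeError` are hypothetical stand-ins so the example runs offline. With the real client you would pass `client.geocode` and catch the library's own exception instead (the traceback in the question shows `GeocodioDataError`).

```python
from types import SimpleNamespace

import pandas as pd


class FakeGeocodeError(Exception):
    """Hypothetical stand-in for the geocoding library's data error."""


def fake_geocode(address):
    """Hypothetical offline replacement for client.geocode."""
    if "," not in address:
        raise FakeGeocodeError("Postal code or city required.")
    return SimpleNamespace(best_match={"accuracy": 1, "accuracy_type": "rooftop"})


def accuracy_fields(address, geocode=fake_geocode):
    """Geocode one address; return both fields, or None placeholders on failure."""
    try:
        match = geocode(address).best_match
    except FakeGeocodeError:
        # Only the expected error is caught; anything else still surfaces.
        return pd.Series({"accuracy": None, "accuracy_type": None})
    return pd.Series({"accuracy": match.get("accuracy"),
                      "accuracy_type": match.get("accuracy_type")})


customers = pd.DataFrame({0: ["21236 Birchwood Loop, 99567, AK", "not an address"]})

# Series.apply with a function that returns a Series produces a DataFrame of fields.
fields = customers[0].apply(accuracy_fields)
customers = customers.join(fields)
print(customers)
```

A bad row ends up with empty values instead of stopping the whole run, which is the same idea as the `try`/`except` loop above.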

Buckeye14Guy
  • errors Out[73]: [ ] Errors is an empty dataframe. I am looking for all the accuracy of each address to be appended to the dataframe –  Oct 12 '19 at 16:41
  • if you are saying that the code ran but the errors variable is an empty list good. let me edit the answer – Buckeye14Guy Oct 12 '19 at 16:43
  • I think i did not explain it properly. that my bad. @buckeye. So basically what I am looking for is. for every row(address) there is an accuracy and a accuracy type that shows up once you hit the api. As i have given in my original post, for 21236 Birchwood Loop, 99567, AK I received an address accuracy = 1 and type = "rooftop". All I am looking to do is automate this for all rows in my dataframe. I do not want to catch errors. I just want an output file with three columns address,accuracy,accuracy type for each row of data –  Oct 12 '19 at 16:50
  • You can always ignore the errors list thing but when it comes to geocoding I highly suggest some sort of `try:...except:...` around your geocoding thing. But the answer I provided should be enough to go from there. If you need accuracy type just do the same thing for address_list and accuracy_list. If you don't catch exceptions and you ran into actual geocoding issues, it will stop the whole process. Say one row had a bad address, we don't want that to stop us from working on the rest if they are all good. just delete `errors.append(f"failure {e.args[0]} at index {index}")` – Buckeye14Guy Oct 12 '19 at 17:01