
As an exercise, I'm pulling data from an API and inserting it into a PostgreSQL database. I initially used the default limit of 1000 entries per pull, but decided I wanted to try to get all of the data, which is approximately 40K rows. After a bit of experimentation, I can pull 4800, but then I get the following:

Traceback (most recent call last):
  File "data_pull.py", line 19, in <module>
    postgres_db.Bike_Count.insert_many(data).execute()
  File "/usr/local/lib/python3.5/dist-packages/peewee.py", line 3516, in execute
    cursor = self._execute()
  File "/usr/local/lib/python3.5/dist-packages/peewee.py", line 2901, in _execute
    sql, params = self.sql()
  File "/usr/local/lib/python3.5/dist-packages/peewee.py", line 3484, in sql
    return self.compiler().generate_insert(self)
  File "/usr/local/lib/python3.5/dist-packages/peewee.py", line 2084, in generate_insert
    value = row_dict[field]
KeyError: <peewee.IntegerField object at 0x7f5b32c2c7f0>

data_pull.py

import json, requests, peewee
import postgres_db


endpoint = 'https://data.seattle.gov/resource/4xy5-26gy.json?$limit=4800'

response = requests.get(endpoint, headers={'X-App-Token': '(REMOVED)'})
if response.status_code == 200:
    data = json.loads(response.text)


postgres_db.Bike_Count.create_table(True)
postgres_db.Bike_Count.insert_many(data).execute()

postgres_db.py

import peewee


psql_db = peewee.PostgresqlDatabase('database', user='my_username')

class Bike_Count(peewee.Model):
    date = peewee.DateTimeField()
    fremont_bridge_sb = peewee.IntegerField()
    fremont_bridge_nb = peewee.IntegerField()

    class Meta:
        database = psql_db

I've looked at the data set online, thinking there was an issue with a particular entry, but I can't find anything obvious. Thanks for the help.

Kurt Maurer
  • How are you inserting the response data string directly into the table without formatting it per necessity? Is the response guaranteed to return a list with a datetime value and two integers? – lightsong May 07 '17 at 15:25
  • What do you mean by formatting it per necessity? My understanding is that json.loads() is returning an object, which, in this case, is a list of dictionaries. The API docs say the fields are a date/time field and two Number fields. It's just weird that I can get a table with up to ~4800 rows and then it hits an error. – Kurt Maurer May 07 '17 at 17:44
  • Here's the link to info on the data set: https://data.seattle.gov/Transportation/Fremont-Bridge-Hourly-Bicycle-Counts-by-Month-Octo/65db-xm6k – Kurt Maurer May 07 '17 at 17:45

1 Answer


I tried your code locally (removing the app token and the 4800 limit) and it worked as expected:

  id  |        date         | fremont_bridge_sb | fremont_bridge_nb 
------+---------------------+-------------------+-------------------
    1 | 2017-01-09 06:00:00 |                28 |                55
    2 | 2017-01-04 20:00:00 |                19 |                10
    3 | 2017-01-18 13:00:00 |                18 |                18
    4 | 2017-01-06 11:00:00 |                22 |                15
    5 | 2017-01-27 11:00:00 |                39 |                38
    6 | 2017-01-08 14:00:00 |                 6 |                10
    7 | 2017-01-06 23:00:00 |                 8 |                 3
    8 | 2017-01-27 13:00:00 |                45 |                35
...

What I noticed when I ran it with the $limit attached is that one of the rows returned by the API contains only a date key (it is missing the fremont_bridge_nb and fremont_bridge_sb fields).

For bulk inserts, peewee requires that every row have the same keys, so the error occurs because peewee expects to find all three keys in each row dictionary.
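
A minimal sketch of one way to work around that, assuming you want the missing counts stored as NULL (the two IntegerFields would need null=True for PostgreSQL to accept those rows):

# Hypothetical pre-processing step: give every row the same three keys
# before calling insert_many(), filling missing counts with None (NULL).
FIELDS = ('date', 'fremont_bridge_sb', 'fremont_bridge_nb')

normalized = [{key: row.get(key) for key in FIELDS} for row in data]

postgres_db.Bike_Count.insert_many(normalized).execute()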

coleifer
  • Thanks a ton for fishing through the data like that. I noticed that as well yesterday; in fact, there are at least two rows that are missing values. I attempted to set `null=True` to say that it was okay if they're null, but after some troubleshooting, I finally settled on setting their defaults to 1. – Kurt Maurer May 08 '17 at 19:52
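
For reference, a sketch of the model change described in the comment above. Whether field defaults are applied to rows that omit those keys is an assumption based on the commenter's report, not something confirmed in the answer:

import peewee

psql_db = peewee.PostgresqlDatabase('database', user='my_username')

class Bike_Count(peewee.Model):
    date = peewee.DateTimeField()
    # Defaults used for rows where the API omits a count, as described
    # in the comment above; null=True would allow NULLs instead.
    fremont_bridge_sb = peewee.IntegerField(default=1)
    fremont_bridge_nb = peewee.IntegerField(default=1)

    class Meta:
        database = psql_db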