1

I'm trying to insert data from a JSON string to MySQL using MySQLdb. The total number of columns is fixed. Each row of data from the JSON string does not always have values for each column.

Here is my sample code:

vacant_building = 'http://data.cityofchicago.org/resource/7nii-7srd.json?%24where=date_service_request_was_received=%272014-06-02T00:00:00%27'
obj = urllib2.urlopen(vacant_building)
data = json.load(obj)

def insert_mysql(columns, placeholders, data):
    sql = "INSERT INTO vacant_buildings (%s) VALUES (%s)" % (columns, placeholders)
    db = MySQLdb.connect(host="localhost", user="xxxx", passwd="xxxx", db="chicago_data")
    cur = db.cursor()
    cur.execute(sql, data)

for row in data:
    placeholders = ', '.join(['%s'] * len(row))
    columns = ', '.join(c[:64] for c in row.keys())
    row_data = ', '.join(str(value) for value in row.values())
    insert_mysql(columns, placeholders, row_data)

I get the following error:

    query = query % tuple([db.literal(item) for item in args])
TypeError: not all arguments converted during string formatting

I'm pretty sure the error has to do with the way I'm inserting the values. I've tried to change this to:

sql = "INSERT INTO vacant_buildings (%s) VALUES (%s) (%s)" % (columns, placeholders, data) 

but I get a 1064 error. It's because the values are not enclosed by quotes (').

Thoughts to fix?

Air
  • 8,274
  • 2
  • 53
  • 88
dfriestedt
  • 483
  • 1
  • 3
  • 18

1 Answers1

1

In order to parameterize your query using MySQLdb's cursor.execute method, the second argument to execute has to be a sequence of values; in your for loop, you're joining the values together into one string with the following line:

row_data = ', '.join(str(value) for value in row.values())

Since you generated a number of placeholders for your values equal to len(row), you need to supply that many values to cursor.execute. If you gave it only a single string, it will put that entire string into the first placeholder, leaving the others without any arguments. This will throw a TypeError - the message in this case would read, "not enough arguments for format string," but I'm going to assume you simply mixed up when copy/pasting because the opposite case (supplying too many arguments/too few placeholders) reads as you indicate, "not all arguments converted during string formatting."


In order to run an INSERT statement through MySQLdb with a variable set of columns, you could do just as you've done for the columns and placeholders, but I prefer to use mapping types with the extended formatting syntax supported by MySQLdb (e.g., %(name)s instead of %s) to make sure that I've constructed my query correctly and not put the values into any wrong order. I also like using advanced string formatting where possible in my own code.

You could prepare your inputs like this:

max_key_length = 64
columns = ','.join(k[:max_key_length] for k in row.keys())
placeholders = ','.join('%({})s'.format(k[:max_key_length]) for k in row.keys())
row_data = [str(v) for v in row.values()]

Noting that the order of the dict comprehensions is guaranteed, so long as you don't alter the dict in the meanwhile.

Generally speaking, this should work okay with the sort of code in your insert_mysql function. However, looking at the JSON data you're actually pulling from that URL, you should be aware that you may run into nesting issues; for example:

>>> pprint.pprint(data[0])
{u'address_street_direction': u'W',
 u'address_street_name': u'61ST',
 u'address_street_number': u'424',
 u'address_street_suffix': u'ST',
 u'any_people_using_property_homeless_childen_gangs_': True,
 u'community_area': u'68',
 u'date_service_request_was_received': u'2014-06-02T00:00:00',
 u'if_the_building_is_open_where_is_the_entry_point_': u'FRONT',
 u'is_building_open_or_boarded_': u'Open',
 u'is_the_building_currently_vacant_or_occupied_': u'Vacant',
 u'is_the_building_vacant_due_to_fire_': False,
 u'latitude': u'41.78353874626324',
 u'location': {u'latitude': u'41.78353874626324',
               u'longitude': u'-87.63573355602661',
               u'needs_recoding': False},
 u'location_of_building_on_the_lot_if_garage_change_type_code_to_bgd_': u'Front',
 u'longitude': u'-87.63573355602661',
 u'police_district': u'7',
 u'service_request_number': u'14-00827306',
 u'service_request_type': u'Vacant/Abandoned Building',
 u'ward': u'20',
 u'x_coordinate': u'1174508.30988836',
 u'y_coordinate': u'1864483.93566661',
 u'zip_code': u'60621'}

The string representation of the u'location' column is:

"{u'latitude': u'41.78353874626324', u'needs_recoding': False, u'longitude': u'-87.63573355602661'}"

You may not want to put that into a database field, especially considering that there are atomic lat/lon fields already in the JSON object.

Community
  • 1
  • 1
Air
  • 8,274
  • 2
  • 53
  • 88