I have a python project that is using sqlite3 to track image files. Each image can have an arbitrary number of keywords associated with it. Keywords may be re-used among multiple images. So I figure I need two tables: images and keywords (where keywords holds each keyword only once regardless of how many images use that keyword). As far as I understand it (which is only just a little) this is a many to many relationship and requires a third table to map this relationship. The trouble I am having is when it comes to populating this third relationship table. I do not know how to get the id data that needs to go into this table when populating the tables from python.
Here are my tables
images:
CREATE TABLE images (image_id INTEGER PRIMARY KEY, location TEXT NOT NULL);
keywords:
CREATE TABLE keywords (keyword_ID INTEGER PRIMARY KEY, keyword TEXT NOT NULL, UNIQUE(keyword));
images_keywords:
CREATE TABLE images_keywords (image_id INTEGER, keyword_id INTEGER, PRIMARY KEY(image_id, keyword_id));
I have a function that tries to populate a singe image to these tables. This is a heavily simplified bit of code but it demonstrates where I am currently:
# Add a new record for the current image with its path
cursor.execute("INSERT INTO images VALUES (NULL, ?)", (image_path,))
# Add the keywords for this image. Note: these keywords may already exist in the
# keywords table so I use the "OR IGNORE" feature to not add them a second time
# (which would fail anyway because of the UNIQUE constraint added to this field)
for keyword in keywords: # <- keywords variable is a list of strings
cursor.execute("INSERT OR IGNORE INTO keywords VALUES (NULL, ?);", (keyword,))
# HERE IS WHERE I GET STUCK: HOW DO I KNOW "image_id" OR "keyword_id" IN ORDER TO
# POPULATE THIS RELATIONSHIP TABLE?
cursor.execute("INSERT INTO images_keywords VALUES (?, ?);", (image_id, keyword_id,));
Like the comment for the final line indicates, I do not know how to access the id's of the recently added rows in order to add these relationships to the relationship table. In fact, because the keywords might already exist, the second insert statement might not even do anything at all.
Do I need to run a query after each keyword insert statement to find the id of that keyword? I.e. make the code something like this?
for keyword in keywords:
cursor.execute("INSERT OR IGNORE INTO keywords VALUES (NULL, ?);", (keyword,))
row = cursor.execute("SELECT keyword_id FROM keywords WHERE keyword = ?;", (keyword,)),fetchone()
cursor.execute("INSERT INTO images_keywords VALUES (?, ?);", (image_id, row[0],));
But I still do not know the new image_id in this case.
I have a sneaking suspicion that the answer lies somewhere in this post:
Populate many-to-many person-color table using three tables in MySQL
but I am afraid that I am struggling to understand it. As far as I can tell, they are shifting things from one table to another while I think it is the interface between python and sqlite that is confusing me.
Thanks!