Inserting arrays into databases

Question

I am trying to write a general function that will insert a line of data into a table in a database but I am trying to write an array of unknown length. I am aiming to just be able to call this function in any programand write a line of data of any length to the table (assuming the table and the array are the same length.

I have tried adding the array like it is a singular peice of data.

import sqlite3
def add2Db(dbName, tableName, data):
    connection = sqlite3.connect(dbName)
    cur = connection.cursor()
    cur.execute("INSERT INTO "+ tableName +" VALUES (?)", (data))
    connection.commit()
    connection.close()

add2Db("items.db", "allItems", (1, "chair", 5, 4))

This just crashes and gives me an error saying it has 4 columns but only one value was supplied.

Rather than write your own, this is better solved with an existing SQL builder library or a full ORM such as [SQLAlchemy](https://docs.sqlalchemy.org/en/latest/core/tutorial.html#insert-expressions). — Schwern, Jan 25 '19 at 20:48
There's also a security concern; what you write is likely to be vulnerable to a [SQL injection attack](https://www.wikiwand.com/en/SQL_injection). — Schwern, Jan 25 '19 at 20:49

score 0 · Accepted Answer · answered Jan 25 '19 at 21:15

0

SQLite does not support arrays - you have to convert to a TEXT using ','.join() to join your array items into a single string and pass that.

Source: SQLite website https://www.sqlite.org/datatype3.html

answered Jan 25 '19 at 21:15

jhomr

477
3
16

I think they want to map the array to multiple columns. – Schwern Jan 25 '19 at 22:24

score 0 · Answer 2 · answered Jan 25 '19 at 23:18

I'm not a Python programmer, but I've been doing SQL a long time. I even wrote my own ORM. My advice is do not write your own SQL query builder. There's a myriad of subtle issues and especially security issues. I elaborate on a few of them below.

Instead, use a well-established SQL Query Builder or ORM. They've already dealt with these issues. Here's an example using SQLAlchemy.

from datetime import date
from sqlalchemy import create_engine, MetaData

# Connect to the database with debugging on.
engine = create_engine('sqlite:///test.sqlite', echo=True)
conn = engine.connect()

# Read the schemas from the database
meta = MetaData()
meta.reflect(bind=engine)

# INSERT INTO users (name, birthday, state, country) VALUES (?, ?, ?, ?)
users = meta.tables['users']
conn.execute(
    users.insert().values(name="Yarrow Hock", birthday=date(1977, 1, 23), state="NY", country="US")
)

SQLAlchemy can do the entire range of SQL operations and will work with different SQL variants. You also get type safety.

conn.execute(
    users.insert().values(name="Yarrow Hock", birthday="in the past", state="NY", country="US")
)

sqlalchemy.exc.StatementError: (exceptions.TypeError) SQLite Date type only accepts Python date objects as input. [SQL: u'INSERT INTO users (name, birthday, state, country) VALUES (?, ?, ?, ?)']

`insert into table values (...)` relies on column definition order

This relies on the order columns were defined in the schema. This leaves two problems. First is a readability problem.

add2Db(db, 'some_table', (1, 39, 99, 45, 'papa foxtrot', 0, 42, 0, 6)

What does any of that mean? A reader can't tell. They have to go digging into the schema and count columns to figure out what each value means.

Second is a maintenance problem. If, for any reason, the schema is altered and the column order is not exactly the same, this can lead to some extremely difficult to find bugs. For example...

create table users ( name text, birthday date, state text, country text );
vs
create table users ( name text, birthday date, country text, state text );

add2Db(db, 'users', ('Yarrow Hock', date(1977, 1, 23), 'NY', 'US'));

That insert will silently "work" with either column order.

You can fix this by passing in a dictionary and using the keys for column names.

add2Db(db, 'users', (name="Yarrow Hock", birthday=date(1977, 1, 23), state="NY", country="US"));

Then we'd produce a query like:

insert into users
    (name, birthday, state, country)
values (?, ?, ?, ?)

This leads to the next and much bigger problem.

SQL Injection Attack

Now this opens up a new problem. If we simply stick the table and column names into the query that leaves us open to one of the most common security holes, a SQL Injection Attack. That's where someone can craft a value which when naively used in a SQL statement causes the query to do something else. Like Little Bobby Tables.

While the ? protects against SQL Injection for values, it's still possible to inject via the column names. There's no guarantee the column names can be trusted. Maybe they came from the parameters of a web form?

Protecting table and column names is complicated and easy to get wrong.

The more SQL you write the more likely you're vulnerable to an injection attack.

You have to write code for everything else.

Ok, you've done insert. Now update? select? Don't forget about subqueries, group by, unions, joins...

If you want to write a SQL query builder, cool! If, instead, you have a job to do using SQL, writing yet another SQL query builder is not your job.

It's harder for anyone else to understand.

There's a good chance that any given Python programmer knows how SQLAlchemy works, and there's plenty of tutorials and documentation if they don't. There's no chance they know about your home-rolled SQL functions, and you have to write all the tutorials and docs.

score 0 · Answer 3 · answered Jan 25 '19 at 23:34

You shouldn't try to write your own ORMs without an argumented need. You will have a lot of problems, for example here's quick 25 reasons not to.

Instead use any popular orm that is proven. I recommend using SQLAlchemy as a go to outside of Django. Using it you can map a dict of values to insert it into a model just like insert(schema_name).values(**dict_name) (here's an example of insert/update).

score -1 · Answer 4 · answered Jan 25 '19 at 20:48

-1

Change your function to this:

def add2Db(dbName, tableName, data):
    num_qs = len(data)
    qm = ','.join(list('?' * num_qs))

    query = """
        INSERT INTO {table}
        VALUES ({qms})
             """.format(table=tableName,
                        qms=qm)
    connection = sqlite3.connect(dbName)
    cur = connection.cursor()
    cur.execute(query, data)
    connection.commit()
    connection.close()

answered Jan 25 '19 at 20:48

gold_cy

13,648
3
23
45

Is this vulnerable to a SQL injection attack? – Schwern Jan 25 '19 at 20:49
could be depending on use case, but thats applicable if he is accepting input from outside sources, he hasn't indicated that – gold_cy Jan 25 '19 at 20:49
`tableName` needs to be quoted and escaped. And verify that `num_qs > 0`. Otherwise I think it's ok. – Schwern Jan 25 '19 at 20:51
if you want to inject table name safely I would use parameter substitution and `psycopg2.extensions.AsIs` but once again, we don't know the context, so for the sake of answering the question here I think this will suffice – gold_cy Jan 25 '19 at 20:52
If you don't know the context code is to be used in, it's best to assume the data is dirty. And just because a function is being passed clean parameters today doesn't mean it will tomorrow. At minimum, lay out your security presumptions in the answer. – Schwern Jan 25 '19 at 20:55
I am not going to assume anything, the question was asked and it was answered, you told him the security concerns on the original post in a comment, there is no need for me to repeat that – gold_cy Jan 25 '19 at 20:56

Inserting arrays into databases

4 Answers4

insert into table values (...) relies on column definition order

SQL Injection Attack

You have to write code for everything else.

It's harder for anyone else to understand.

`insert into table values (...)` relies on column definition order