What is the best practice for lookup values in SQLAlchemy?

Question

I am writing a pretty basic Flask application using Flask-SQLAlchemy for tracking inventory and distribution. I could use some guidance on how the best way to handle a lookup table for common values. My database back end will be MySQL and ElasticSearch for searches.

If I have a common mapping structure where all data going into a specific table, say Vehicle, have a common list of values to look up against for the Vehicle.make column, what would the best way to achieve this be?

My thought for approaching this is one of two ways:

Lookup Table

I could set something up similar to this where I have a relationship, and store the make in VehicleMake. However, if my expected list of makes is low (say 10), this seems unnecessary.

class VehicleMake(Model):
    id = Column(Integer, primary_key=True)
    name = Column(String(16))
    cars = relationship('Vehicle', backref='make', lazy='dynamic')

class Vehicle(Model):
    id = Column(Integer, primary_key=True)
    name = Column(String(32))

Store as a String

I could just store this as a string on the Vehicle model. But would it be a waste of space to store a common value as a string?

class Vehicle(Model):
    id = Column(Integer, primary_key=True)
    name = Column(String(32))
    make = Column(String(16))

My original idea was just to have a dict containing a mapping like this and reference it as needed within the model. I am just not clear how to tie this in when returning the vehicle model.

MAKE_LIST = {
    1: 'Ford',
    2: 'Dodge',
    3: 'Chevrolet'
}

Any feedback is welcome - and if there is documentation that covers this specific scenario I'm happy to read that and answer this question myself. My expected volume is going to be low (40-80 records per week) so it doesn't need to be ridiculously fast, I just want to follow best practices.

Ruben Helsloot · Accepted Answer · 2022-07-24T14:06:11.917

The short answer is it depends.

The long answer is that it depends on what you store along with the make of said vehicles and how often you expect to add new types.

If you need to store more than just the name of each make, but also some additional metadata, like the size of the gas tank, the cargo space, or even a sortkey, go for an additional table. The overhead of such a small table is minimal, and if you communicate with the frontend using make ids instead of make names, there is no problem at all with this. Just remember to add an index to vehicle.make_id to make the lookups efficient.

class VehicleMake(Model):
    id = Column(Integer, primary_key=True)
    name = Column(String(16))
    cars = relationship('Vehicle', back_populates="make", lazy='dynamic')

class Vehicle(Model):
    id = Column(Integer, primary_key=True)
    name = Column(String(32))
    make_id = Column(Integer, ForeignKey('vehicle_make.id'), nullable=False)
    make = relationship("VehicleType", innerjoin=True)

Vehicle.query.get(1).make.name # == 'Ford', the make for vehicle 1
Vehicle.query.filter(Vehicle.make_id == 2).all() # all Vehicles with make id 2
Vehicle.query.join(VehicleMake)\
    .filter(VehicleMake.name == 'Ford').all() # all Vehicles with make name 'Ford'

If you don't need to store any of that metadata, then the need for a separate table disappears. However, the general problem with strings is that there is a high risk of spelling errors and capital/lowercase letters screwing up your data consistency. If you don't need to add new makes much, it's a lot better to just use Enums, there are even MySQL specific ones in SQLAlchemy.

import enum

class VehicleMake(enum.Enum):
    FORD = 1
    DODGE = 2
    CHEVROLET = 3

class Vehicle(Model):
    id = Column(Integer, primary_key=True)
    name = Column(String(32))
    make = Column(Enum(VehicleMake), nullable=False)

Vehicle.query.get(1).make.name # == 'FORD', the make for vehicle 1
Vehicle.query.filter(Vehicle.make == VehicleMake(2)).all() # all Vehicles with make id 2
Vehicle.query.filter(Vehicle.make == VehicleMake.FORD).all() # all Vehicles with make name 'Ford'

The main drawback of enums is that they might be hard to extend with new values, although at least for Postgres the dialect specific version was a lot better at this than the general SQLAlchemy one, have a look at sqlalchemy.dialects.mysql.ENUM instead. If you want to extend your existing enum, you can always just execute raw SQL in your Flask-Migrate/Alembic migrations.

Finally, the benefits of using strings is that you can always programmatically enforce your data consistency. But, this comes at the cost that you have to programmatically enforce your data consistency. If the vehicle make can be changed or inserted by external users, even colleagues, this will get you in trouble unless you're very strict about what enters your database. For example, it might be nice to uppercase all values for easy grouping, since it effectively reduces how much can go wrong. You can do this during writing, or you can add an index on sqlalchemy.func.upper(Vehicle.make) and use hybrid properties to always query the uppercase value.

class Vehicle(Model):
    id = Column(Integer, primary_key=True)
    name = Column(String(32))
    _make = Column('make', String(16))

    @hybrid_property
    def make(self):
        return self._make.upper()

    @make.expression
    def make(cls):
        return func.upper(cls._make)

Vehicle.query.get(1).make.upper() # == 'FORD', the make for vehicle 1
Vehicle.query.filter(Vehicle.make == 'FORD').all() # all Vehicles with make name 'FORD'

Before you make your choice, also think about how you want to present this to your user. If they should be able to add new options themselves, use strings or the separate table. If you want to show a dropdown of possibilities, use the enum or the table. If you have an empty database, it's going to be difficult to collect all string values to display in the frontend without needing to store this as a list somewhere in your Flask environment as well.

What is the best practice for lookup values in SQLAlchemy?

1 Answers1