I use tables that are not declared as dependencies to restrict populate through a custom key_source. For example, my pipeline stages are all parameterized and use a mixin key_source to specify which parameters apply to each stage for a given dataset:

class ParamsMixin:
    @property
    def key_source(self):
        return super().key_source & (models.Processing * models.Specification)

I have other examples that use a custom query to do a restriction:

    @property
    def key_source(self):
        # Only normalize combinatorial rounds.
        return (super().key_source * models.AcquisitionRound) & {
            "acquisition_round_kind": "combinatorial"
        }

This works well for data processing, but the dependencies are not explicit in the table definition. As a result, I lose the ability to use cascades to propagate deletions from the non-dependent tables in the custom key_source, which undermines data integrity in some respects. Also, when using dj.create_virtual_module, underlying functions like _jobs_to_do will not be correct. Is there an alternative design that would let me keep this functionality?

HoosierDaddy

1 Answer

That is correct.

Foreign keys dictate how deletes are cascaded. By default, they also define the key_source, i.e. the query that generates the primary key values for automated computations. The default key_source is the join of the tables referenced by the foreign keys in the computed table's primary key. Overriding key_source to restrict the computation to a specific subset does not affect the foreign key constraints. So if you wish to change how deletes are propagated, perhaps the foreign key constraints themselves need to be revised.
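To make the distinction concrete, here is a toy model in plain Python. It is not DataJoint code, and the `Table` class and the table names `dataset`, `params`, `restricted` and `declared` are all hypothetical. It only mimics the two roles described above: the parent links stand in for foreign keys, and `populate` accepts an optional restricted key source.

```python
class Table:
    """Toy stand-in for a table: a set of primary-key values plus foreign-key links."""

    def __init__(self, name, parents=(), rows=()):
        self.name = name
        self.parents = list(parents)  # foreign keys: the only links a cascade follows
        self.children = []
        self.rows = set(rows)
        for parent in self.parents:
            parent.children.append(self)

    def default_key_source(self):
        # Stand-in for the join of all tables referenced by foreign keys.
        return set.intersection(*(p.rows for p in self.parents))

    def populate(self, key_source=None):
        # A custom key_source changes what gets computed, nothing else.
        keys = self.default_key_source() if key_source is None else key_source
        self.rows |= keys

    def delete(self, key):
        # Deletes cascade only along declared foreign keys.
        self.rows.discard(key)
        for child in self.children:
            child.delete(key)


dataset = Table("dataset", rows={1, 2, 3})
params = Table("params", rows={1, 2})

# Case 1: params only restricts the key source; there is no foreign key to it.
restricted = Table("restricted", parents=[dataset])
restricted.populate(key_source=restricted.default_key_source() & params.rows)

# Case 2: params is declared as a foreign key, so the default key source suffices.
declared = Table("declared", parents=[dataset, params])
declared.populate()

# Deleting from params cascades into `declared` but leaves `restricted` untouched.
params.delete(1)
print(sorted(restricted.rows), sorted(declared.rows))
```

In this sketch both tables compute the same keys, but only the one that declares `params` as a parent loses its row when the parameter entry is deleted. Whether the same move fits your schema depends on whether the restricting tables can sensibly become part of the computed table's primary key.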

I know this does not necessarily answer your question directly; I am just highlighting the distinct roles of foreign keys and key_source.