Let's say you have a table Fit that uses an external blob stored in the store called "octo-store", declared as
@schema
class Fit(dj.Computed):
    definition = """
    -> Recording
    -> Model
    ---
    fit : blob@octo-store
    """
The store octo-store can be configured as an S3 bucket and folder, for example.
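As a rough sketch of what such a configuration could look like: the key names below (protocol, endpoint, bucket, location, access_key, secret_key) follow the store settings I know from DataJoint 0.12 and later, but the endpoint, bucket name, folder, and environment variable names are made-up placeholders, so check them against your own setup and DataJoint version.

```python
import os

# Store settings for "octo-store". All values are hypothetical placeholders.
stores = {
    "octo-store": {
        "protocol": "s3",
        "endpoint": "s3.amazonaws.com",        # assumed S3 endpoint
        "bucket": "my-lab-bucket",             # hypothetical bucket name
        "location": "octo/fits",               # hypothetical folder in the bucket
        "access_key": os.environ.get("OCTO_S3_ACCESS_KEY", ""),
        "secret_key": os.environ.get("OCTO_S3_SECRET_KEY", ""),
    }
}


def apply_store_config():
    """Hand the settings to DataJoint (requires datajoint to be installed)."""
    import datajoint as dj

    dj.config["stores"] = stores
```

Call apply_store_config() before declaring Fit, so that the store name in blob@octo-store can be resolved.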
DataJoint will create a hidden table for tracking externally stored blobs. You can access it as schema.external['octo-store'].
When you insert a record into Fit, its blob is tracked in this external table using the hash of its contents. Fit makes a foreign key reference into the external table, so you cannot delete from the external table any entries that are actually in use.
The following command
Fit.delete()
will remove the records from Fit, and with them the references, but it will not remove anything from the external tracking table or the remote storage. This gives you high performance and data integrity at the cost of leaving the unused external data around, at least temporarily.
This means that every once in a while, you need to remove the unused entries in the external table and in the external storage. Since race conditions are not handled as precisely here as in a pure database transaction, it's best to do this in off times when the data are not actively manipulated.
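To make the lifecycle concrete, here is a toy model in plain Python. It is not how DataJoint implements any of this: the class ToyExternalStore, the reference counters, and the use of SHA-256 are all stand-ins chosen for illustration. It only mimics the behavior described above, where deleting rows leaves tracked entries and files behind until a separate cleanup step removes them.

```python
import hashlib


class ToyExternalStore:
    """Toy stand-in for the external tracking table plus the remote storage."""

    def __init__(self):
        self.tracking = {}  # content hash -> number of referencing rows
        self.files = {}     # content hash -> stored bytes ("remote storage")

    def put(self, blob: bytes) -> str:
        """Store a blob under the hash of its contents and count the reference."""
        key = hashlib.sha256(blob).hexdigest()
        self.files.setdefault(key, blob)
        self.tracking[key] = self.tracking.get(key, 0) + 1
        return key

    def release(self, key: str) -> None:
        """A referencing row was deleted: drop the reference, keep the entry."""
        self.tracking[key] -= 1

    def purge_unused(self, delete_external_files: bool) -> list:
        """Remove tracking entries with no references, optionally with files."""
        unused = [k for k, count in self.tracking.items() if count == 0]
        for k in unused:
            del self.tracking[k]
            if delete_external_files:
                del self.files[k]
        return unused


store = ToyExternalStore()
fit_rows = {}  # primary key -> content hash, standing in for the Fit table

fit_rows[("rec1", "modelA")] = store.put(b"weights-1")
fit_rows[("rec2", "modelA")] = store.put(b"weights-2")

# Analogue of deleting all rows: references go away, entries and files stay.
for key in fit_rows.values():
    store.release(key)
fit_rows.clear()
files_after_delete = len(store.files)

# Analogue of the periodic cleanup of unused entries.
removed = store.purge_unused(delete_external_files=True)
files_after_purge = len(store.files)
```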
The command
schema.external['octo-store'].delete(delete_external_files=True)
will remove the unused entries in the external table and the corresponding files in the external storage. This is the recommended way of clearing the data if you know that the store is only used by this database (which should be the case).
DataJoint also gives you the option of not deleting the external files:
schema.external['octo-store'].delete(delete_external_files=False)
This will leave files in the remote storage that are not tracked by the database. It will become your responsibility to remove them when you choose.
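If you have several stores to clean during off times, a small helper can keep the calls in one place. This is only a sketch: clean_external_stores is a hypothetical name, and it assumes nothing beyond the schema.external[...].delete(delete_external_files=...) call shown above.

```python
def clean_external_stores(schema, store_names, delete_external_files=True):
    """Purge unused entries from each named store of a schema object.

    `schema` is expected to expose `schema.external[name].delete(...)`,
    as a DataJoint schema does; nothing else about it is assumed.
    """
    cleaned = []
    for name in store_names:
        # Removes unused tracking entries; files too if the flag is True.
        schema.external[name].delete(delete_external_files=delete_external_files)
        cleaned.append(name)
    return cleaned
```

For example, clean_external_stores(schema, ['octo-store']) does the same as the single command above, and passing delete_external_files=False leaves the files for you to remove yourself.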