Datajoint scientific pipeline: how to efficiently store data and design pipeline when attributes can change and data come from arrays

Question

We need to load the output from our neuroscience animal behavior training sessions into our DataJoint database pipeline. We run experiments with a platform called Bpod, which implements a finite state machine on an Arduino to control hardware and record events. We want to be able to analyze, for example, the response time on each trial, which is the time gap between two of the states in the finite state machine.

The state and event timings are saved in a MATLAB structure for each trial, with one field per state containing an array of the start and end times for that state (a state can occur more than once during a trial, so its array within a trial can have size 3×2, for example).

My question is: what is the most efficient way to store these data in the DataJoint database? Right now, we have an imported table Trials with an entry for each trial. It seems most efficient to store the start time for each state as a float column in some sort of part table, but I'm not sure how to do that programmatically given that (1) states can occur multiple times within a trial and (2) we don't want to hard-code a part table for each state (the list of possible states can also vary as we adjust the behavior paradigm over time, and it would be nice not to have to rebuild the entire database to add a new state). Alternatively, we had originally imagined having a column for each state within the Trials table, but then the data would have to be stored as blobs, which I would guess is much less efficient. Any advice appreciated!

score 1 · Answer 1 · answered May 26 '21 at 18:01

You seem to be on the right track. Let's describe this as a concrete design (one of several possible). Then you can comment and we'll modify as necessary.

Some questions:

Are "events" synonymous with state transitions?
Are response times always between specific types of events?

First, you have trials. Let's say that they are part of a session, so the primary key will be a reference to the session and a trial id.

Let's assume that there is a lookup table State enumerating all possible states.
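
Because states live in rows of a lookup table rather than in column names, a new state only needs a new row, not a schema change. As a rough sketch of how the rows could be derived from the data, assuming each trial has already been converted to a Python dict mapping state name to a list of [start, end] pairs (the dict layout, the state names, and the attribute name `state` are all hypothetical):

```python
def collect_state_rows(trials):
    """Return one dict per distinct state name, sorted by name.

    `trials` is assumed to be a list of dicts mapping
    state name -> list of [start, end] pairs (hypothetical layout).
    """
    names = set()
    for states in trials:
        names.update(states)  # iterating a dict yields its keys
    return [{"state": name} for name in sorted(names)]


# Hypothetical example: the second trial introduces a state not seen before.
trials = [
    {"WaitForPoke": [[0.0, 1.2]], "Reward": [[1.2, 1.5]]},
    {"WaitForPoke": [[0.0, 0.8]], "Timeout": [[0.8, 2.8]]},
]
state_rows = collect_state_rows(trials)
print(state_rows)
```

Rows like these could then be passed to something like `State.insert(state_rows, skip_duplicates=True)`, so that states introduced by a later version of the paradigm are appended without touching existing entries.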

Within trials, you have events. Let's assume that events are synonymous with transitions between two states. Then we will define this as the part table Trial.Event. We can use the event time within the trial to distinguish events within a trial. Note that we avoid using floats in primary keys because equality conditions on floats are unreliable, so the event time is stored as a decimal.

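import datajoint as dj

schema = dj.schema("behavior")  # schema name is a placeholder

# Session and State are assumed to be declared elsewhere in this schema.
@schema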
class Trial(dj.Imported):
    definition = """
    -> Session
    trial_id : smallint unsigned
    ---
    trial_start_time : float  # (s) from session start
    """

    class Event(dj.Part):
        definition = """
        # Trial.Event marks the time of transition into a state 
        -> master
        event_time : decimal(6,3)  # (s) from trial start
        ---
        -> State
        """

    def make(self, key):
        ...
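
The reason for declaring event_time as decimal(6,3) rather than float in the definition above can be seen in a quick standard-library check. This is only an illustration of the comparison problem, not DataJoint code; the helper name `to_ms_decimal` is made up for the example:

```python
from decimal import Decimal

# Floats: two routes to "the same" time do not compare equal.
float_equal = (0.1 + 0.2) == 0.3


def to_ms_decimal(seconds):
    """Round a time in seconds to a fixed-point value with 3 decimal places."""
    return Decimal(str(seconds)).quantize(Decimal("0.001"))


# Fixed-point values with millisecond resolution compare exactly.
decimal_equal = to_ms_decimal(0.1 + 0.2) == to_ms_decimal(0.3)

print(float_equal, decimal_equal)
```

With a float in the primary key, a restriction by event time could silently match nothing; with a fixed-point value, the same restriction is an exact comparison.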

This is a start. This can ingest the data from the .mat files. You can then define a downstream computed table to calculate the response times.
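
To make the ingestion and the response-time step more concrete, here is a minimal sketch in plain Python. It assumes the per-trial MATLAB structure has already been loaded into a dict mapping state name to a list of [start, end] pairs with finite times; the state names, the function names, and the particular definition of response time (first entry into one state to the next entry into another) are assumptions for illustration only:

```python
from decimal import Decimal


def states_to_events(states):
    """Flatten {state: [[start, end], ...]} into time-ordered event rows.

    Each occurrence of a state becomes one row, keyed by its start time.
    """
    rows = []
    for state, intervals in states.items():
        for start, _end in intervals:
            rows.append({
                "event_time": Decimal(str(start)).quantize(Decimal("0.001")),
                "state": state,
            })
    return sorted(rows, key=lambda row: row["event_time"])


def response_time(events, from_state, to_state):
    """Time from the first entry into from_state to the next entry into to_state.

    Returns None if either state does not occur in that order.
    """
    start = next(
        (e["event_time"] for e in events if e["state"] == from_state), None
    )
    if start is None:
        return None
    return next(
        (e["event_time"] - start for e in events
         if e["state"] == to_state and e["event_time"] > start),
        None,
    )


# Hypothetical trial in which one state occurs twice.
trial_states = {
    "WaitForPoke": [[0.0, 1.25], [2.0, 2.4]],
    "Reward": [[2.4, 2.9]],
    "Timeout": [[1.25, 2.0]],
}
events = states_to_events(trial_states)
rt = response_time(events, "WaitForPoke", "Reward")
print(len(events), rt)
```

Inside make, rows like `events` could be combined with the trial's key and inserted into Trial.Event, and the logic of `response_time` could live in the make method of the downstream computed table. Note that two states starting at exactly the same millisecond within a trial would collide on the primary key, so that case would need a different key.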

Let me know if this is helpful and if you would like to further expand or modify this design.
