I'm trying to figure out the data model and access patterns for an app keeping track of animal movements between different fields (pastures). There are movement records that look like this:
PK FROM TO DATE
------------------------------------------------------
ANIMAL#001 FIELD#A FIELD#B January 3
ANIMAL#001 FIELD#Q FIELD#R September 19
ANIMAL#002 FIELD#A FIELD#B January 3
ANIMAL#003 FIELD#C FIELD#D March 15
ANIMAL#005 FIELD#F FIELD#A April 22
For a specific field, e.g. FIELD#A
, I'd like to know all the movements into and out of that field, the date of the movement, and the number of animals. The results should look like:
DATE FROM TO NUMBER_ANIMALS
--------------------------------------------------
January 3 FIELD#A FIELD#B 2
April 22 FIELD#F FIELD#A 1
Possible solutions and attempts:
A GSI with
PK=FROM, SK=TO
. If I query the GSI withPK=FIELD#A
, this only gives one half of the picture, that is, movements fromFIELD#A
. I can't obtain movements toFIELD#A
.A composite attribute like
FIELD#A#FIELD#B
used as thePK
in a GSI. Runs into the same problem as attempt 1.Two GSI. GSI1 has
PK=FROM
and GSI2 hasPK=TO
. I can query GSI1 withPK=FIELD#A
and do some post-processing (groupby, count
) to get part of the result. I can then query GSI2 withPK=FIELD#A
and post-process, getting the rest of the result. This looks like it will work but requires two GSI and two queries. I can't overload one GSI since both columns in use are from the same item.Some combination of scanning the entire table and filtering the results which I'd rather avoid since there might be 50,000+ items in the entire table.
I can see how to do it with two GSIs, but what's the most efficient way?