I am trying to determine the best way to represent data lineage for image processing. I have images stored in S3 that I want to process and then write back to S3. I would then like to be able to run a query that shows, for any given image, every image and process that comes before and after it in the chain. For example:
Image1 -ProcessA-> Image2 -ProcessB-> Image3
I would expect a search for the "lineage" of Image2 to return the whole chain above.
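To make the expected result concrete, here is a minimal in-memory Python sketch of the query I have in mind. The `Step` record, the `STEPS` list, and the `lineage` and `format_chain` helpers are names made up purely for illustration, and the sketch assumes a simple, non-branching, cycle-free chain. It only shows the behavior I want, not how the data would actually be stored.

```python
from collections import namedtuple

# Hypothetical edge record: one processing step that turned src into dst.
Step = namedtuple("Step", ["src", "process", "dst"])

# Illustrative data matching the example chain above.
STEPS = [
    Step("Image1", "ProcessA", "Image2"),
    Step("Image2", "ProcessB", "Image3"),
]


def lineage(image, steps):
    """Return, in order, every step in the chain that passes through `image`."""
    # Assumption: each image has at most one parent and one child.
    by_dst = {s.dst: s for s in steps}
    by_src = {s.src: s for s in steps}

    # Walk backwards from `image` to the original source image.
    upstream = []
    current = image
    while current in by_dst:
        step = by_dst[current]
        upstream.append(step)
        current = step.src
    upstream.reverse()

    # Walk forwards from `image` to the final derived image.
    downstream = []
    current = image
    while current in by_src:
        step = by_src[current]
        downstream.append(step)
        current = step.dst

    return upstream + downstream


def format_chain(image, chain):
    """Render a chain of steps in the arrow notation used above."""
    if not chain:
        return image
    parts = [chain[0].src]
    for s in chain:
        parts.append(f"-{s.process}-> {s.dst}")
    return " ".join(parts)


if __name__ == "__main__":
    print(format_chain("Image2", lineage("Image2", STEPS)))
```

In a real system an image could have several derived children, so the lookups would need to return lists rather than single steps.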
I know this looks like a cookie-cutter case for a graph database, but I am not very familiar with them, especially in a production workflow. I have been struggling to implement this model in a relational database, and it feels like I am trying to force a square peg into a round hole.
- Is a graph DB the only option? Which flavor would you suggest?
- Is there a way to make this work in a relational model that I have not considered?