
I'm trying to determine what might be the best suited (mainly in terms of speed) database service for querying metadata of media content such as images/videos/audio located on AWS S3. Currently I'm looking at DynamoDB and Redshift, but there may be better alternatives I haven't considered.

Example use case:

I have millions of images (and cropped sections of images) that are run through a web of machine learning models: full-image classification, bounding-box object detection, and pixel segmentation (RLE pixel-labeled). These models predict nested labels and assign attributes/scores. The nested structure is continually evolving. For example, an image may be given the tag "outside" by a full-image classifier, then sent to an object detector that finds the bounding boxes of multiple "person" tags with x/y/width/height coordinates. These crops may then be sent to a further full-image classifier (operating on the small crops) that classifies each predicted person crop as "sitting" or "standing". I'd like to be able to quickly query the nested metadata to get the image IDs of all images with particular combinations of labels.
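
To make the nesting concrete, here is a minimal sketch of what one image's metadata record might look like. The class names, field names, bucket name, and scores are hypothetical placeholders for illustration only, not an existing schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Prediction:
    model: str                      # hypothetical model type, e.g. "classifier" or "detector"
    label: str
    score: float
    # (x, y, width, height); only set for bounding-box detections
    box: Optional[Tuple[int, int, int, int]] = None
    # predictions made on this prediction's crop (the nested level)
    children: List["Prediction"] = field(default_factory=list)


@dataclass
class ImageRecord:
    image_id: str
    s3_uri: str
    predictions: List[Prediction] = field(default_factory=list)


# Made-up example values: one "outside" image with two detected people.
record = ImageRecord(
    image_id="img-0001",
    s3_uri="s3://example-bucket/images/img-0001.jpg",
    predictions=[
        Prediction("classifier", "outside", 0.97),
        Prediction("detector", "person", 0.91, box=(10, 20, 50, 120),
                   children=[Prediction("classifier", "sitting", 0.88)]),
        Prediction("detector", "person", 0.85, box=(200, 30, 45, 110),
                   children=[Prediction("classifier", "standing", 0.93)]),
    ],
)


def count_label(preds: List[Prediction], label: str) -> int:
    """Count predictions with the given label at this nesting level only."""
    return sum(1 for p in preds if p.label == label)


print(count_label(record.predictions, "person"))
```

Each prediction can carry its own child predictions, which is the part of the structure that keeps evolving as new models are added.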

Specific query example:

What are the S3 locations of all images tagged with the whole-image classification label "outside", with at least two instances of the object detection label "person", and where at least one person object has been further classified as "sitting"?
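
For illustration, this is a rough sketch of how that query could be written against a relational layout, using Python's built-in `sqlite3` as a stand-in for a hosted SQL service. The table names, column names, and sample rows are assumptions made up for this example. Nesting is modeled with a self-referencing `parent_id` column, which is one possible choice among several.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE images (image_id TEXT PRIMARY KEY, s3_uri TEXT NOT NULL);
CREATE TABLE predictions (
    prediction_id INTEGER PRIMARY KEY,
    image_id      TEXT NOT NULL REFERENCES images(image_id),
    parent_id     INTEGER REFERENCES predictions(prediction_id),  -- NULL = made on the whole image
    model         TEXT NOT NULL,
    label         TEXT NOT NULL
);
CREATE INDEX idx_predictions_label ON predictions(label, image_id);
""")

# Hypothetical sample data: only image "a" meets all three conditions.
conn.executemany("INSERT INTO images VALUES (?, ?)", [
    (i, f"s3://example-bucket/{i}.jpg") for i in ("a", "b", "c", "d")
])
conn.executemany("INSERT INTO predictions VALUES (?, ?, ?, ?, ?)", [
    (1, "a", None, "classifier", "outside"),
    (2, "a", None, "detector", "person"),
    (3, "a", None, "detector", "person"),
    (4, "a", 2, "classifier", "sitting"),
    (5, "a", 3, "classifier", "standing"),
    (6, "b", None, "classifier", "outside"),   # only one person
    (7, "b", None, "detector", "person"),
    (8, "b", 7, "classifier", "sitting"),
    (9, "c", None, "classifier", "inside"),    # not "outside"
    (10, "c", None, "detector", "person"),
    (11, "c", None, "detector", "person"),
    (12, "c", 10, "classifier", "sitting"),
    (13, "d", None, "classifier", "outside"),  # nobody sitting
    (14, "d", None, "detector", "person"),
    (15, "d", None, "detector", "person"),
    (16, "d", 14, "classifier", "standing"),
])

QUERY = """
SELECT i.s3_uri
FROM images AS i
WHERE EXISTS (                      -- whole-image label "outside"
        SELECT 1 FROM predictions AS p
        WHERE p.image_id = i.image_id AND p.parent_id IS NULL
          AND p.model = 'classifier' AND p.label = 'outside')
  AND (                             -- at least two detected persons
        SELECT COUNT(*) FROM predictions AS p
        WHERE p.image_id = i.image_id
          AND p.model = 'detector' AND p.label = 'person') >= 2
  AND EXISTS (                      -- some person crop classified as "sitting"
        SELECT 1 FROM predictions AS c
        JOIN predictions AS p ON c.parent_id = p.prediction_id
        WHERE p.image_id = i.image_id
          AND p.label = 'person' AND c.label = 'sitting')
ORDER BY i.s3_uri
"""

matches = [row[0] for row in conn.execute(QUERY)]
print(matches)
```

In this made-up data set, only image "a" satisfies all three conditions. Whether a layout like this stays fast at millions of images would depend on indexing and on the database engine, which this sketch does not measure.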

I've been browsing this AWS DB offering page and am not sure what is best suited to this task. Of course, if there's a far superior non-AWS/S3 solution, I'd certainly like to know that. Any suggestions are greatly appreciated!

Edit: Updated the example slightly to describe the nesting structure more clearly.

Austin
  • The complexity of the query probably suggests an SQL database. You could start with Amazon RDS and, if it isn't fast enough, move to Amazon Redshift without having to change the querying process itself. (But you'd need to change the data-loading process.) – John Rotenstein Jul 02 '19 at 01:31
  • Thanks for your response. I'm very curious why the complexity suggests a SQL database. As a complete novice to DB technology, I had assumed that the evolving nested structure of my metadata might preclude the use of a standard SQL DB. If that's not the case then that's great news! – Austin Jul 02 '19 at 01:40
  • Actually, an alternative would be to use **Elasticsearch**. Fast and powerful, if it suits your use-case. – John Rotenstein Jul 02 '19 at 01:48
  • Thanks, going to do some research on Elasticsearch now. – Austin Jul 02 '19 at 01:53
