I'm working on a new project, and it was recommended that I look at Cosmos DB as its data store, because it's going to be a global SaaS application serving potentially hundreds of thousands of users.

As this is my first foray into NoSQL databases, I've inevitably run into a lot of things I have to learn, but I feel like I've been slowly getting it and have actually become quite excited about it.

Now here comes the kicker: I have a relatively simple data structure that I just can't work out how to implement, and I've pretty much resigned myself to the fact that it might just be a workload that doesn't work well in non-relational databases. But as I said, I'm completely green, so I'm here for a sanity check in case there's something obvious that I'm missing.

I have a data structure consisting of a "user" object and a "post" object, and I need to relate them based on unbounded arrays of tags to create customized feeds. They look like this:

User:

{
    id: 1,
    name: "username",
    interests: [
        "fishing",
        "photography",
        "stargazing",
    ]
}

Post:

{
    id: 1,
    title: "My post title",
    body: "My post content",
    tags: [
        "tennis",
        "sport",
        "photography",
    ]
}

I want to return a list of all posts that have one or more of a given user's interests in their tags, so basically a query like this:

SELECT DISTINCT VALUE Posts FROM Posts
JOIN tag IN Posts.tags
WHERE tag IN <<user.interests>>
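
To make the intent of that query concrete, here is a minimal in-memory sketch in Python of the matching rule: a post belongs in the feed if its tags and the user's interests share at least one value. It says nothing about how Cosmos DB would execute or price the query. The function name `posts_for_user` and the second post are made up for illustration.

```python
from typing import Iterable


def posts_for_user(user: dict, posts: Iterable[dict]) -> list[dict]:
    """Return the posts that share at least one tag with the user's interests."""
    interests = set(user["interests"])
    # isdisjoint() is False as soon as one tag overlaps; each post is tested
    # once, so no duplicates appear (the role DISTINCT plays in the query).
    return [post for post in posts if not interests.isdisjoint(post["tags"])]


user = {"id": 1, "name": "username",
        "interests": ["fishing", "photography", "stargazing"]}
posts = [
    {"id": 1, "title": "My post title",
     "tags": ["tennis", "sport", "photography"]},
    # Hypothetical second post with no overlapping tag.
    {"id": 2, "title": "Another post", "tags": ["cooking"]},
]

feed = posts_for_user(user, posts)
print([post["id"] for post in feed])  # prints [1]
```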

In SQL I'd create a table of users and a table of posts and join them based on shared tags/interests, but I can't for the life of me figure out how to denormalize this (if that's even possible). Have I really run into one of the impossible flows of NoSQL on my first attempt, with only 2 objects? Or am I just being a total noob?

bnm12

1 Answer

You're on the right track! There are two really good articles that describe how to model data for NoSQL. Definitely worth a read if you're new to this type of data store.

Data modeling in Azure Cosmos DB, in particular the section on many-to-many relationships.

and

How to model and partition data on Azure Cosmos DB using a real-world example

Hope you find this helpful.

Mark Brown
  • I've read both of those, but they don't really say anything about "how much relation is too much relation in Cosmos". With the model I currently have, I have a hard time seeing how I'd be able to denormalize at all if I want to preserve performance. At this stage it would basically look like someone took a SQL database and put it into Cosmos, and I feel like that's very bad? – bnm12 Feb 20 '20 at 14:46
  • The approach is really simple here: you need to test and measure. If it is cheaper to denormalize by using Change Feed and pumping data into a second container than it is to run cross-partition queries, then you do it. If it is not cheaper, then do not denormalize. I typically use this technique only for higher-concurrency queries, because there is a cost to read from Change Feed and write each insert/update into another container. By the way, one thing to keep in mind is that you need to use "soft deletes" for your data, because Change Feed does not capture deletes. Hope that helps. – Mark Brown Feb 21 '20 at 00:54
  • @bnm12, were you able to solve your problem? I'm currently facing the same issue, and I can't find anything helpful in the linked articles either. – Robert Mrobo Jul 18 '21 at 22:11
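
For readers wondering what the Change Feed approach from the comments might look like, here is a rough sketch in Python. It is not the answerer's implementation and makes no Cosmos DB SDK calls: plain dictionaries stand in for a second container keyed by tag, and `apply_change` stands in for whatever handler receives changed post documents. The names `posts_by_tag`, `indexed_tags`, `apply_change`, `feed_for`, the `deleted` flag used for soft deletes, and the second post are all assumptions made for illustration.

```python
from collections import defaultdict

# tag -> {post_id: summary}; stands in for a second container partitioned by tag.
posts_by_tag: dict[str, dict[int, dict]] = defaultdict(dict)
# post_id -> tags written last time, so retagged or deleted posts can be cleaned up.
indexed_tags: dict[int, set[str]] = {}


def apply_change(post: dict) -> None:
    """Apply one changed post document to the tag-keyed view."""
    post_id = post["id"]
    # A soft-deleted post (hypothetical "deleted" flag) keeps no tags in the view.
    new_tags = set() if post.get("deleted") else set(post["tags"])
    for tag in indexed_tags.get(post_id, set()) - new_tags:
        posts_by_tag[tag].pop(post_id, None)
    for tag in new_tags:
        posts_by_tag[tag][post_id] = {"id": post_id, "title": post["title"]}
    indexed_tags[post_id] = new_tags


def feed_for(interests: list[str]) -> list[int]:
    """One lookup per interest; ids are merged in a set to drop duplicates."""
    ids: set[int] = set()
    for tag in interests:
        ids.update(posts_by_tag.get(tag, {}))
    return sorted(ids)


interests = ["fishing", "photography", "stargazing"]
apply_change({"id": 1, "title": "My post title",
              "tags": ["tennis", "sport", "photography"]})
apply_change({"id": 2, "title": "Night sky",
              "tags": ["stargazing", "photography"]})
before = feed_for(interests)
# Soft delete: the document is updated with a flag instead of being removed.
apply_change({"id": 2, "title": "Night sky",
              "tags": ["stargazing", "photography"], "deleted": True})
after = feed_for(interests)
print(before, after)  # prints [1, 2] [1]
```

Whether this pays off is exactly the "test and measure" question from the comment: every insert or update is written again once per tag, so the view only helps if the per-interest lookups are cheaper overall than the cross-partition query they replace.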