
The Firestore docs don't have an in-depth discussion of the trade-offs involved in using sub-collections vs top-level collections, but do point out that sub-collections are less flexible and less 'scalable'. Given that you sacrifice flexibility by setting up your data in sub-collections, there must be some definite upsides besides a mentally satisfying structure.

For example, how does the time for a Firestore query on a single key across a large collection compare with getting all items from a much smaller collection?

Say we want to query a large collection 'People' for all people in a family unit. Alternatively, partition the data by family in the first place into family units.

People -> person: {family: 'Smith'}

versus

Families -> family: {name:'Smith'} -> People -> person

I would expect the latter to be more efficient, but is this correct? Are there any big-O estimates for each? Are there any other advantages of sub-collections (e.g. for transactions)?

pwray
  • 1
    So far from what I have seen, there is no benefit of using sub-collections. It gives you much less flexibility when compared to flat top-level collections at the moment. However, I am also super interested in what the planned benefits of sub-collections will be. That might save us a lot of time from hard migration in future. – Ivan Wang Nov 23 '17 at 08:13
  • 1
    Does anyone have any thoughts on this? I am trying to decide between storing data in sub-collections or at the top level. It seems that if you have a collection ref, you can query in the same way regardless of where it lies – Felipe Dec 23 '17 at 05:22

4 Answers


I’ve got some key points about subcollections that you need to be aware of when modeling your database.

1 – Subcollections give you a more structured database.

2 – Queries are indexed by default: query performance is proportional to the size of your result set, not your data set. So the size of your collection does not matter; performance depends on the size of your result set.

3 – Each document has a max size of 1MB. For instance, if you have an array of orders in your customer document, it might be a good idea to create a subcollection of orders to each customer because you cannot foresee how many orders a customer will have. By doing this you don’t need to worry about the max size of your document.

4 – Pricing: Firestore charges you for document reads, writes and deletes. Therefore, when you create many subcollections instead of using arrays in the documents, you will need to perform more reads, writes and deletes, thus increasing your bill.
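To put rough numbers on points 3 and 4, here is a minimal sketch of how the billed read count differs between the two layouts. It is only a counting model under stated assumptions: the function names are made up for illustration, it counts one read per document returned, and it ignores any minimum per-query charge and everything else on a real bill.

```python
from typing import Optional


def reads_for_embedded_array() -> int:
    """Orders kept as an array inside the customer document.

    Fetching the customer returns every order in a single document read,
    but the whole array has to fit in the 1MB document limit.
    """
    return 1


def reads_for_subcollection(order_count: int, page_size: Optional[int] = None) -> int:
    """Orders kept as one document each in a subcollection of the customer.

    Assumption: one billed read per document returned by the query.
    With pagination, a single page returns at most `page_size` documents.
    """
    if page_size is None:
        return order_count
    return min(order_count, page_size)


all_orders = reads_for_subcollection(500)                 # read every order
first_page = reads_for_subcollection(500, page_size=20)   # read one page only
```

Under these assumptions the subcollection costs more reads when you fetch everything, but a paginated query keeps the count bounded while the document size limit stops being a concern.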

  • The chat example in the Firestore documentation uses a subcollection structure with a document for each message. I guess that would be pricey. Maybe it would be better to use a nested array? – b-fg Apr 25 '18 at 17:56
  • It can be pricey if you do not use pagination. A nested array would work for a while, but it's not scalable given that you have a max size of 1MB per document. – Mateus Forgiarini da Silva Apr 28 '18 at 20:48
  • True that. Thanks for the tip. – b-fg Apr 28 '18 at 21:22
  • 13
    The OP asked about top-level vs sub collections. Point 1 is subjective, 2 points out that sub-collections have no benefit, and 3 & 4 are about arrays vs sub-collections. I am honestly wondering about the same thing, but your answer does not point out any clear advantages. 1 might be perceived as an advantage, but sub-collections also have downsides not discussed here. – Thijs Koerselman Aug 16 '19 at 14:14

To answer the original question about efficiency:

Querying all people with the family 'Smith' from the People top-level collection really is not any slower than asking for all the people in the 'Smith' family sub-collection.
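One way to see why this can hold is a toy sorted index. The sketch below is not Firestore's implementation; `FieldIndex` and the sample data are made up for illustration. It only shows the general idea that an equality lookup on an indexed field finds the start of the matching range with a binary search and then walks just the matching entries, so the work tracks the number of results rather than the size of the collection.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List


class FieldIndex:
    """Toy sorted index over a single field (illustrative only)."""

    def __init__(self, docs: Dict[str, dict], field: str) -> None:
        # Keep (field value, document id) pairs in sorted order.
        self.entries = sorted((doc[field], doc_id) for doc_id, doc in docs.items())
        self.keys = [value for value, _ in self.entries]

    def equal_to(self, value: str) -> List[str]:
        # Binary search for both ends of the matching range, then
        # return only the document ids inside that range.
        lo = bisect_left(self.keys, value)
        hi = bisect_right(self.keys, value)
        return [doc_id for _, doc_id in self.entries[lo:hi]]


# 10,000 people in one top-level collection, 10 of them in the Smith family.
people = {
    f"p{i}": {"family": "Smith" if i % 1000 == 0 else f"Family{i}"}
    for i in range(10000)
}
index = FieldIndex(people, "family")
smiths = index.equal_to("Smith")
```

In this model the lookup costs a logarithmic search plus one step per matching document, which is the same amount of result-handling work as listing a ten-document sub-collection.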

This is explained in the How to Structure Your Data episode of the Get to Know Cloud Firestore video series.

There are some trade-offs between top-level collections and sub-collections to be aware of. Depending on the specific queries you intend to use, you may need to create composite indexes to query top-level collections or collection group indexes to query sub-collections. Both of these index types count towards the 200 index exemptions limit.

These trade-offs are discussed in detail near the bottom of the Understanding Collection Group Queries blog post and in Maps, Arrays and Subcollections, Oh My! episode of the Get to Know Cloud Firestore video series.

I've linked to the relevant parts of both videos.

matthew
  • 3
    The blog post you link to is exactly what OP (and I) was looking for. Like you said, near the bottom it literally discusses the trade-offs, one paragraph even starts with `So when should you store things in a separate top level collection vs. using subcollections?` Thank you!! – steve Apr 26 '20 at 20:42

I was wondering about the same thing. The documentation mainly talks about arrays vs sub-collections. My conclusion is that there are no clear advantages of using a sub-collection over a top-level collection. Sub collections had some clear technical limitations before, but I think those are removed with the recent introduction of collection group queries.

Here are some advantages of both approaches:

Sub collection:

  • Your database "feels" more structured as you will have fewer top-level collections listed.
  • No need to store a reference/foreign key/id of the parent document, as it is implied by the database structure. You can get to the parent via the sub collection document ref.
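As a rough illustration of the second point, the parent is already encoded in the document path, which alternates collection and document segments. The helper below is a hypothetical string-based sketch (the function name and sample paths are mine); in a real client you would go through the document ref as described above.

```python
def parent_document_path(doc_path: str) -> str:
    """Return the path of the document that owns a sub-collection document.

    Paths alternate collection/document segments, for example
    'Families/smith/People/alice' -> 'Families/smith'.
    """
    segments = doc_path.strip("/").split("/")
    # A document path has an even number of segments; a sub-collection
    # document needs at least four (collection/doc/collection/doc).
    if len(segments) < 4 or len(segments) % 2 != 0:
        raise ValueError(f"not a sub-collection document path: {doc_path!r}")
    return "/".join(segments[:-2])


parent = parent_document_path("Families/smith/People/alice")
```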

Top-level collection:

  • Documents are easier to delete. Using sub collections, you need to make sure to first delete all sub collection documents before you delete the parent document. There is no API for this, so you might need to roll your own helper functions.
  • ~~Having the parent id directly in each (sub) document might make it easier to process query results, depending on the application.~~ In my specific case, I was using abstractions to return documents with their id and data as typed properties, but no handle for ref. Without the ref, I had no way to get to the parent. Later I changed the abstraction to include the ref, so that point is not really valid I guess.
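The kind of helper mentioned in the first point might look roughly like this. It is a sketch against an in-memory dict keyed by document path, not against the Firestore SDK; `delete_with_subcollections` and the sample store are hypothetical names chosen for illustration.

```python
from typing import Dict


def delete_with_subcollections(store: Dict[str, dict], doc_path: str) -> int:
    """Delete every document nested under doc_path, then doc_path itself.

    `store` stands in for the database: it maps full document paths to data.
    Returns the number of documents removed.
    """
    prefix = doc_path + "/"
    # Collect nested documents first so the dict is not mutated while iterating.
    nested = [path for path in store if path.startswith(prefix)]
    for path in nested:
        del store[path]
    removed = len(nested)
    if doc_path in store:
        del store[doc_path]
        removed += 1
    return removed


store = {
    "Families/smith": {"name": "Smith"},
    "Families/smith/People/alice": {"age": 41},
    "Families/smith/People/bob": {"age": 39},
    "Families/jones/People/carol": {"age": 27},
}
removed = delete_with_subcollections(store, "Families/smith")
```

With a flat top-level People collection there is nothing nested to clean up, which is the convenience this bullet is describing.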

--- edit --- There is also a disadvantage to subcollections I think:

All subcollections end up on one pile based on the name if you do a group query. In this regard the namespace is global, so it is important not to use the same name for different types of sub-collections, or you can get mixed results in a group query.
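Here is a small sketch of that effect, again using an in-memory dict keyed by document path rather than the real SDK. The `collection_group` function and the sample paths are made up for illustration; the only rule modeled is that a group query matches on the collection name alone, wherever it sits in the hierarchy.

```python
from typing import Dict, List


def collection_group(store: Dict[str, dict], collection_id: str) -> List[str]:
    """Return paths of all documents whose immediate collection has this name."""
    matches = []
    for path in sorted(store):
        segments = path.split("/")
        # The second-to-last segment is the collection the document lives in.
        if segments[-2] == collection_id:
            matches.append(path)
    return matches


store = {
    "Families/smith/Members/alice": {"role": "parent"},
    "Clubs/chess/Members/bob": {"rating": 1500},
    "Families/smith/Pets/rex": {"species": "dog"},
}
members = collection_group(store, "Members")
```

Both the family member and the club member come back from the same query, even though they are different kinds of documents, so a more specific name such as FamilyMembers would avoid the mix.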

Thijs Koerselman

Todd answered this in a Firebase YouTube video:


1) There's a limit to how many documents you can create per minute in a single collection if the documents have an always-increasing value (like a timestamp)

2) Very large collections don't do as well from a performance standpoint when you're offline. But they are generally good options to consider.

Acid Coder