
I am building a collection that will contain over a million documents. Each document will contain one token and a history table. When a process retrieves a token, it stores its process id in the history table inside the document so the token can never be used again by the same process. The tokens are reusable by different processes. I want each process to pull a document/token and never be able to pull that same document/token again.

My approach is to store a history table in each document listing the processes that have used the token. That is why I need to query for what is not in the array.

Firestore does not have a condition that lets you search for what is not in an array. How would I perform a query like the one below, where array-does-not-contain is a placeholder for matching documents whose history array does not include 'process-001'?

db.collection('tokens').where('history', 'array-does-not-contain', 'process-001').limit(1).get();

Below is how I'm planning to structure my collection,

My actual problem: I have multiple processes running, and I only want each process to pull documents from Firebase that it has never seen before. The Firebase collection will be over a million documents and growing.

C O
  • I'm not clear on what the problem is here. You'll need to figure out what the actual queries are going to be, not using a fictional placeholder query. If Firestore allows the query, then it will be efficient. – Doug Stevenson Nov 17 '19 at 18:11
  • Sheesh. Someone is quick to downvote, it seemed I was pretty clear. Since firebase does not let you filter by what is not in an array, what is the best approach to structure and query to allow you to do something like that? – C O Nov 17 '19 at 18:32
  • Firestore can't efficiently index non-existent items, so `not in`, `!=`, and `array-does-not-contain` style queries are all [not supported](https://firebase.google.com/docs/firestore/query-data/queries#query_limitations). Thus, what you are trying to do will be a challenge to do efficiently in firestore. Moreover, design-style questions tend to do poorly on SO, as they can be opinion based or lead to a large amount of discussion to explore the constraint space -- which is really the only option for your problem. – robsiemb Nov 17 '19 at 19:20
  • Agreed with Doug here. You're asking about your perceived solution (a "does not contain" query) of which you already know it doesn't exist. If there was an efficient way to do that, Firestore would've already implemented it. So we're dealing with an [XY problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). Instead of describing your perceived solution, describe the actual use-case that you're trying to achieve. – Frank van Puffelen Nov 17 '19 at 19:38
  • @FrankvanPuffelen I've edited my original question to be more clear about my problem. Thanks again for pointing it out. I proposed a solution first because it doesn't sit well with other members if you don't show you've made an attempt at a solution. – C O Nov 17 '19 at 20:20

1 Answer

Firestore is not very well suited for queries that need to look for things that don't exist. The problem is that the indexes it uses are only meant to tell you if things exist. The universe of strings that don't exist would be impossible to efficiently quantify for indexing.

The only way to make this happen is to know the names of all the processes ahead of time, and create values for them in the index. You would do this with a map type object, not an array:

- token: "1234"
- history: {
    "process-001": false,
    "process-002": false,
    "process-003": false
  }

This document can be queried to find out if "history.process-001" has a value of false, then updated to true when the process uses it. But again, without all the process names known ahead of time and populated in each document, the query is not possible.
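As a rough illustration of the query-then-update flow described above, here is a minimal sketch. It assumes `db` is an already initialized Firestore instance and that every token document carries a `history` map with an entry for the process; `claimToken` is a hypothetical helper name, not part of any SDK.

```javascript
// Sketch only: `db` is assumed to be an initialized Firestore instance
// (for example from firebase-admin). claimToken is a hypothetical helper.
async function claimToken(db, processId) {
  // Dotted path into the history map, e.g. "history.process-001".
  const field = `history.${processId}`;

  // Find one token this process has not used yet.
  const snapshot = await db.collection('tokens')
    .where(field, '==', false)
    .limit(1)
    .get();

  if (snapshot.empty) {
    return null; // Nothing left for this process.
  }

  const doc = snapshot.docs[0];

  // Mark the token as used by this process so the query skips it next time.
  await doc.ref.update({ [field]: true });

  return doc.data().token;
}
```

A call such as `await claimToken(db, 'process-001')` would return a token string, or `null` once that process has used every token. The read and the write are separate steps here; if several workers share one process id, wrapping both in a transaction would likely be safer.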

Doug Stevenson
  • This is one of the solutions I've come up with, but I cannot have process names ahead of time because I want to be able to create new processes as I move forward. Unless I can update each document to have the latest process list? It would mean iterating through a collection of over a million documents though. I would only have at most one or two new process names a month. – C O Nov 17 '19 at 20:41
  • Sure, you could do that. – Doug Stevenson Nov 17 '19 at 20:42
  • Would that be an okay practice? I'm worried about the amount of time it will take to index after creating a new field in the mapping object. – C O Nov 17 '19 at 21:08
  • Not really an issue. – Doug Stevenson Nov 17 '19 at 21:10
  • If you have a moment, please mark this answer as correct if it was helpful. – Doug Stevenson Nov 18 '19 at 15:31
  • For anyone approaching this solution and wondering about index time like I was, it took just under an hour to index 700k records. The history array is about 10 elements large. – C O Nov 23 '19 at 21:46
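
The backfill discussed in the comments above (adding a newly created process name to every existing document) could look roughly like the sketch below. It assumes `db` is an initialized Firestore instance and pages through the `tokens` collection with batched writes; `addProcessToAllTokens` and `pageSize` are hypothetical names, and the page size of 500 is an assumption rather than a tuned value.

```javascript
// Sketch only: `db` is assumed to be an initialized Firestore instance.
// addProcessToAllTokens and pageSize are hypothetical names.
async function addProcessToAllTokens(db, processId, pageSize = 500) {
  const field = `history.${processId}`;
  let lastDoc = null;
  let updated = 0;

  for (;;) {
    // Read the collection one page at a time, resuming after the last document seen.
    let query = db.collection('tokens').limit(pageSize);
    if (lastDoc) {
      query = query.startAfter(lastDoc);
    }
    const page = await query.get();
    if (page.empty) {
      break;
    }

    const batch = db.batch();
    let pending = 0;
    for (const doc of page.docs) {
      const history = doc.data().history || {};
      // Skip documents that already have an entry, so a re-run
      // does not reset tokens the process has already used.
      if (!(processId in history)) {
        batch.update(doc.ref, { [field]: false });
        pending += 1;
      }
    }
    if (pending > 0) {
      await batch.commit();
      updated += pending;
    }

    lastDoc = page.docs[page.docs.length - 1];
  }

  return updated; // Number of documents that received the new entry.
}
```

Running `await addProcessToAllTokens(db, 'process-004')` once per new process would make the `history.process-004 == false` query usable after indexing finishes. Tokens created while the backfill is running would still need the new entry added when they are written.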