I have a large collection containing several thousand documents. Each of these documents has subcollections with documents inside. I have now deleted many of the top-level documents.

Structure: MyCollection => MyDocument => MySubcollection => MySubdocument

I have now realized that the documents are deleted (they no longer show up in any query), but their subcollections and the documents inside them still exist. I am not sure how to delete those as well, since I don't know the IDs of my deleted documents.

If I try to find the IDs by querying my collection for all documents, the deleted ones are (by design) no longer included. So how can I figure out their IDs in order to delete their subcollections?

Thanks for any advice!

progNewbie

1 Answer

It all depends on your exact goal.

If you want to delete ALL the docs in the MyCollection collection, including ALL the documents in ALL the sub-collections, you can use the Firebase CLI with the following command:

firebase firestore:delete MyCollection -r

Run firebase firestore:delete --help for more options.

Of course, this can only be done by an Owner of your Firebase project.


If you want to allow other users to do the same thing from the front-end (i.e. delete ALL the docs, including ALL the sub-collections), you can use the technique detailed in the "Delete data with a Callable Cloud Function" section of the documentation.

As explained in this doc:

You can take advantage of the firestore:delete command in the Firebase Command Line Interface (CLI). You can import any function of the Firebase CLI into your Node.js application using the firebase-tools package.

The Firebase CLI uses the Cloud Firestore REST API to find all documents under the specified path and delete them individually. This implementation requires no knowledge of your app's specific data hierarchy and will even find and delete "orphaned" documents that no longer have a parent.
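
As a rough sketch of what that Node.js usage could look like, the wrapper below calls the firestore.delete function exposed by the firebase-tools package. The names makeRecursiveDelete and recursiveDelete are made up for this example, the module is passed in as a parameter only so the wrapper can be tried with a stub, and authentication (for example a CI token) is left out. Check the option names against the firebase-tools version you install.

```javascript
// Sketch only: makeRecursiveDelete is a hypothetical helper name.
// `firebaseTools` is expected to be require('firebase-tools'); it is injected
// so the wrapper can be exercised without the real package.
function makeRecursiveDelete(firebaseTools, projectId) {
  return async function recursiveDelete(path) {
    if (typeof path !== 'string' || path.length === 0) {
      throw new Error('A non-empty Firestore path is required');
    }
    // Same operation as "firebase firestore:delete <path> -r" on the command line
    await firebaseTools.firestore.delete(path, {
      project: projectId,
      recursive: true,
      force: true, // assumption: skips the interactive confirmation prompt
    });
    return { path };
  };
}
```

Inside a Callable Cloud Function you would pass require('firebase-tools') and your project ID, then call the returned function with the path sent by the client, after checking that the caller is allowed to delete it.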


If you want to delete ONLY a subset of the documents in the MyCollection collection, together with the documents in their sub-collections, you can use the same methods as above with the path to each document, e.g.:

firebase firestore:delete MyCollection/MyDocument -r

Finally, if your problem is that you have already deleted "parent" documents and you don't know how to delete the documents in the orphaned sub-collections (since you don't know the IDs of the parents), you could use a Collection Group query to fetch the documents of all the MySubcollection subcollections and check, for each one, whether its parent document still exists. The following JavaScript code would do the trick.

  const db = firebase.firestore();
  const parentDocReferences = []; // promises returned by get(), one per distinct parent
  const seenParentPaths = new Set();
  const deletedParentDocIds = [];

  db.collectionGroup('MySubcollection')
    .get()
    .then((querySnapshot) => {
      querySnapshot.forEach((doc) => {
        // doc.ref.parent is the subcollection; its parent is the (possibly deleted) parent document
        const parentDocRef = doc.ref.parent.parent;
        // Skip root-level collections (no parent document) and parents already queued
        if (parentDocRef && !seenParentPaths.has(parentDocRef.path)) {
          seenParentPaths.add(parentDocRef.path);
          parentDocReferences.push(parentDocRef.get());
        }
      });
      // Resolves once every parent document has been fetched
      return Promise.all(parentDocReferences);
    })
    .then((docSnapshots) => {
      docSnapshots.forEach((doc) => {
        // doc.exists is false when the parent document has been deleted
        if (!doc.exists && deletedParentDocIds.indexOf(doc.id) === -1) {
          deletedParentDocIds.push(doc.id);
        }
      });

      // Use the deletedParentDocIds array
      // For example, get references to all orphaned subcollections in order to delete their documents
      // (see https://firebase.google.com/docs/firestore/manage-data/delete-data#collections)
      deletedParentDocIds.forEach((docId) => {
        const orphanSubCollectionRef = db.collection(`MyCollection/${docId}/MySubcollection`);
        // ...
      });
    })
    .catch((error) => {
      console.error('Could not look up orphaned subcollections:', error);
    });
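
To fill in the "// ..." part, one possible approach is to delete the documents of each orphaned subcollection in small batches. This is only a sketch: deleteOrphanSubcollection and batchSize are names chosen for this example, db is assumed to be the same Firestore instance as above, and the batch size is kept well below the 500-writes-per-batch limit that Firestore has documented.

```javascript
// Sketch only: deleteOrphanSubcollection and batchSize are hypothetical names.
// `db` is a Firestore instance, for example firebase.firestore().
async function deleteOrphanSubcollection(db, docId, batchSize = 400) {
  const collectionRef = db.collection(`MyCollection/${docId}/MySubcollection`);
  let deletedCount = 0;

  // Read and delete one page at a time until the subcollection is empty
  while (true) {
    const snapshot = await collectionRef.limit(batchSize).get();
    if (snapshot.empty) {
      return deletedCount;
    }

    // Group the deletes of this page into a single batched write
    const batch = db.batch();
    snapshot.forEach((doc) => batch.delete(doc.ref));
    await batch.commit();
    deletedCount += snapshot.size;
  }
}
```

You could then call deleteOrphanSubcollection(db, docId) for each entry of deletedParentDocIds. For very large subcollections the CLI or Cloud Function approaches above are probably a better fit than deleting from a client.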
Renaud Tarnec
  • Thanks! The last option is what I need. I just don't really understand why you are storing the parentDocReferences. I don't really need them, do I? The `doc.exists` refers to the parental document? – progNewbie Dec 23 '20 at 16:21
  • If with your question about storing the parentDocReferences you mean "why do you do `parentDocReferences.push()`?" it is because we want to issue an undetermined number of calls to the asynchronous `get()` method, in order to get all the parent documents. We need to know when all these calls are finished, therefore we use `Promise.all()`. – Renaud Tarnec Dec 23 '20 at 16:47
  • `Promise.all()` "returns a single Promise that resolves to an array of the results of the input promises. This returned promise will resolve when all of the input's promises have resolved". See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/all – Renaud Tarnec Dec 23 '20 at 16:47
  • 1
    Thank you for the clarification. This is perfect. Now I just need to implement this in Java :) – progNewbie Dec 23 '20 at 17:05
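
The Promise.all() pattern discussed in the comments can be tried in isolation with the small sketch below. Nothing here touches Firestore: fakeGet is a made-up stand-in for the asynchronous get() call, and findMissing is a name chosen for this example.

```javascript
// Sketch only: fakeGet and findMissing are hypothetical names, not Firestore APIs.
function fakeGet(id) {
  // Stand-in for an asynchronous get(): ids starting with "deleted" do not exist
  return Promise.resolve({ id, exists: !id.startsWith('deleted') });
}

function findMissing(ids) {
  const pending = [];
  // Start an undetermined number of asynchronous calls and keep their promises
  ids.forEach((id) => pending.push(fakeGet(id)));
  // Promise.all() resolves with all results once every call has finished
  return Promise.all(pending).then((snapshots) =>
    snapshots.filter((snapshot) => !snapshot.exists).map((snapshot) => snapshot.id)
  );
}

findMissing(['a', 'deleted-b', 'c']).then((missing) => console.log(missing));
```

The array of promises plays the same role as parentDocReferences in the answer: it only exists so that Promise.all() can tell when every lookup is done.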