17

I'm using Microsoft's new database, DocumentDB. I want to delete a document by ID, but I cannot figure out how to do this. The delete operation in DocumentDB requires a self-link, which is different from my own IDs. Currently I query for the document first, which gives me the self-link, and then I delete the document using that self-link.

Now I want to delete all of the documents in my collection (around 50,000+ documents).

Do I have to fetch each document and delete it one by one, or is there a simpler way to delete them all at once?
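For context, the per-document approach I'm using today looks roughly like this (pure logic only; `queryFn` and `deleteFn` are hypothetical stand-ins for the SDK's query/delete calls, not real API names):

```javascript
// Sketch of the two-step pattern described above: look up a document's
// _self link by my own id, then delete through that self link.
// queryFn and deleteFn are stand-ins for the SDK's query/delete calls.
async function deleteDocumentById(queryFn, deleteFn, id) {
    // Query for the document by its own id to recover its self link.
    const docs = await queryFn('SELECT * FROM c WHERE c.id = "' + id + '"');
    if (docs.length === 0) return false; // nothing to delete
    // Delete using the _self link DocumentDB assigned to the document.
    await deleteFn(docs[0]._self);
    return true;
}
```

Doing this 50,000+ times means a query plus a delete request per document, which is what I'm hoping to avoid.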

Brian Tompsett - 汤莱恩
satish kumar V
  • (in Java) I would also like to delete all documents of a given collection. It doesn't seem possible using SQL syntax; for example, "DELETE FROM WHERE" is not available today. I voted up [this feedback](http://feedback.azure.com/forums/263030-documentdb/suggestions/6346033-set-based-operations-insert-update-delete), which is in the `under review` state. I don't want to delete and re-create the parent collection; for me that's not an acceptable workaround. I would be really happy to get a solution for this... – boly38 Jul 06 '15 at 08:45
  • A link to this question can be seen if you export the template of an Azure Cosmos DB instance! – Sunil Purushothaman Mar 10 '20 at 01:26

3 Answers

19

You're correct that deleting a document requires a reference to the document's _self link.

If you are looking to delete ALL documents in your collection, it may be simpler and faster to delete and re-create the collection itself. The only caveat is that server-side scripts (e.g. sprocs, UDFs, triggers) also belong to the collection and would need to be re-created as well.

Update: I wrote a quick stored procedure that performs a bulk-delete given a query. This allows you to perform bulk delete operations in fewer network requests.

/**
 * A DocumentDB stored procedure that bulk deletes documents for a given query.<br/>
 * Note: You may need to execute this sproc multiple times (depending on whether the sproc is able to delete every document within the execution timeout limit).
 *
 * @function
 * @param {string} query - A query that provides the documents to be deleted (e.g. "SELECT * FROM c WHERE c.founded_year = 2008")
 * @returns {Object.<number, boolean>} Returns an object with the two properties:<br/>
 *   deleted - contains a count of documents deleted<br/>
 *   continuation - a boolean whether you should execute the sproc again (true if there are more documents to delete; false otherwise).
 */
function bulkDeleteSproc(query) {
    var collection = getContext().getCollection();
    var collectionLink = collection.getSelfLink();
    var response = getContext().getResponse();
    var responseBody = {
        deleted: 0,
        continuation: true
    };

    // Validate input.
    if (!query) throw new Error("The query is undefined or null.");

    tryQueryAndDelete();

    // Recursively runs the query w/ support for continuation tokens.
    // Calls tryDelete(documents) as soon as the query returns documents.
    function tryQueryAndDelete(continuation) {
        var requestOptions = {continuation: continuation};

        var isAccepted = collection.queryDocuments(collectionLink, query, requestOptions, function (err, retrievedDocs, responseOptions) {
            if (err) throw err;

            if (retrievedDocs.length > 0) {
                // Begin deleting documents as soon as documents are returned from the query results.
                // tryDelete() resumes querying after deleting; no need to page through continuation tokens.
                //  - this is to prioritize writes over reads given timeout constraints.
                tryDelete(retrievedDocs);
            } else if (responseOptions.continuation) {
                // Else if the query came back empty, but with a continuation token; repeat the query w/ the token.
                tryQueryAndDelete(responseOptions.continuation);
            } else {
                // Else if there are no more documents and no continuation token - we are finished deleting documents.
                responseBody.continuation = false;
                response.setBody(responseBody);
            }
        });

        // If we hit execution bounds - return continuation: true.
        if (!isAccepted) {
            response.setBody(responseBody);
        }
    }

    // Recursively deletes documents passed in as an array argument.
    // Attempts to query for more on empty array.
    function tryDelete(documents) {
        if (documents.length > 0) {
            // Delete the first document in the array.
            var isAccepted = collection.deleteDocument(documents[0]._self, {}, function (err, responseOptions) {
                if (err) throw err;

                responseBody.deleted++;
                documents.shift();
                // Delete the next document in the array.
                tryDelete(documents);
            });

            // If we hit execution bounds - return continuation: true.
            if (!isAccepted) {
                response.setBody(responseBody);
            }
        } else {
            // If the document array is empty, query for more documents.
            tryQueryAndDelete();
        }
    }
}
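To run the sproc until it finishes, you can loop on the `continuation` flag client-side. Here is a rough sketch of just that retry logic; `executeSproc` is a stand-in for however you invoke the sproc with your SDK (e.g. a wrapper around `executeStoredProcedure`), so treat the parameter as an assumption:

```javascript
// Runs the bulk-delete sproc until it reports continuation: false,
// summing the per-call deleted counts. executeSproc is a stand-in for
// an SDK call that resolves to the sproc's { deleted, continuation } body.
async function runBulkDeleteToCompletion(executeSproc) {
    let totalDeleted = 0;
    let continuation = true;
    while (continuation) {
        const result = await executeSproc();
        totalDeleted += result.deleted;
        continuation = result.continuation;
    }
    return totalDeleted;
}
```

In a real client you would also want to back off and retry on throttling (429) responses between iterations.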
Andrew Liu
  • Yeah, I thought the same. If I delete the collection, I need to create the stored procedures and triggers again. But anyway, thanks for letting me know there is no other simple process to delete all records at once :) – satish kumar V Mar 20 '15 at 09:02
  • Voted down: there is no answer here for "a simple process to delete all records at once". – boly38 Jul 06 '15 at 08:46
  • FYI - voting should be based on whether you got an accurate answer to the question... not whether the software you're using supports a feature. – Andrew Liu Dec 11 '15 at 17:23
  • Hello @andrew-liu. Using the sproc lets me perform bulk delete ops with fewer network requests, but that does not reduce the charged request units. Am I right? Every execution of the sproc (except the first one) gets throttled in my case, the recommended retry interval is really high (10 seconds!), and the request unit costs are very high. The charged request units seem equal to or even higher than issuing a delete request for each document, and because of the throttling I cannot see any performance benefit from this sproc. Really annoying! – Tobias J. Jul 28 '16 at 10:14
  • @TobiasJ. I agree with your thoughts; maybe you could use the `TimeToLive` functionality. I imagine this would be a much cheaper operation: it's just an update to each document you want to delete, and DocumentDB will clean them up in the background. I believe DocumentDB doesn't charge for the delete, but obviously it will for your update. – SimonGates Oct 07 '16 at 16:55
  • Using a stored proc only works when your collection is single-partition; if it's partitioned, I don't know how to do it. – Wolf Jan 30 '18 at 03:26
3

Here's a solution for deleting documents using the C# SDK. The code below assumes a single database and a single collection: it iterates over all documents in the collection and deletes them one at a time. To target particular databases, collections, or documents, modify the appropriate "CreateQuery" call to include a filter. For example, to select a particular database:

db = client.CreateDatabaseQuery().Where(o => o.Id == "MyDocDb").ToList().First();

Sample Code For Deleting All Documents in a DocumentDB Instance with a Single Database and a Single Collection:

using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace Util
{
    class Program
    {
        private Uri _docDbUri = new Uri("https://<nameofyourdocdb>.documents.azure.com:443/");
        private string _docDbKey = "<your primary key>";

        private async Task DeleteDocsAsync()
        {
            using (var client = new DocumentClient(_docDbUri, _docDbKey))
            {
                try
                {
                    var db = client.CreateDatabaseQuery().ToList().First();
                    var coll = client.CreateDocumentCollectionQuery(db.CollectionsLink).ToList().First();
                    var docs = client.CreateDocumentQuery(coll.DocumentsLink);
                    foreach (var doc in docs)
                    {
                        await client.DeleteDocumentAsync(doc.SelfLink);
                    }
                }
                catch (Exception ex)
                {
                    Trace.WriteLine(ex);
                    throw;
                }
            }
        }



        static void Main(string[] args)
        {
            var p = new Program();
            p.DeleteDocsAsync().Wait();
        }
    }
}
Brett
  • This is a method of getting all records and deleting them one by one. Thanks for your answer. – satish kumar V Nov 13 '15 at 09:33
  • This method requires a change: the call to DeleteDocumentAsync needs the partition key as part of the RequestOptions object. Other than that, this really helped. Thanks. – Yatin Sep 26 '16 at 06:06
  • Saving all the tasks to a list and then awaiting them all will probably save some time, or even awaiting every batch of 100 if the collection is huge. – Ram Y Jul 19 '17 at 13:12
  • I don't believe this will work to delete all documents for a partitioned collection since stored procedures can't run across partitions. – sirdank Dec 17 '18 at 17:38
1

Here is a way to delete documents with AllowBulkExecution enabled. This method deletes documents in chunks of 100, as Cosmos DB currently supports a maximum of 100 operations at once. It takes the partition key property name and uses reflection to get its value.

public async Task BulkDeleteDocumentsAsync(IEnumerable<T> entities, string partitionKeyPropertyName)
{
    List<Task> tasks = new List<Task>(100);
    foreach (var entity in entities)
    {
        var partitionKey = entity.GetType().GetProperty(partitionKeyPropertyName)?.GetValue(entity, null)?.ToString();
        if (partitionKey != null)
        {
            tasks.Add(DeleteDocumentAsync(entity.Id!, partitionKey)
                .ContinueWith(itemResponse =>
                {
                    if (!itemResponse.IsCompletedSuccessfully)
                    {
                        AggregateException innerExceptions = itemResponse.Exception!.Flatten();

                        if (innerExceptions.InnerExceptions.FirstOrDefault(innerEx => innerEx is CosmosException) is CosmosException cosmosException)
                        {
                            _logger.LogError($"Cosmos exception deleting {entity.Id}: {cosmosException.StatusCode} ({cosmosException.Message}).");
                        }
                        else
                        {
                            _logger.LogError($"Exception deleting {entity.Id}: {innerExceptions.InnerExceptions.FirstOrDefault()}.");
                        }
                    }
                }));
            if (tasks.Count == 100)
            {
                await Task.WhenAll(tasks);
                tasks.Clear();
            }
        }
    }
    await Task.WhenAll(tasks);
}
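The batching in the loop above is the key constraint. As a standalone sketch of just that chunking step (plain arrays, no SDK assumptions; the C# version interleaves it with the delete calls):

```javascript
// Splits an array into groups of at most `size` items, mirroring the
// 100-operation batches awaited in the method above.
function chunk(items, size) {
    const chunks = [];
    for (let i = 0; i < items.length; i += size) {
        chunks.push(items.slice(i, i + size));
    }
    return chunks;
}
```

You would then issue the deletes for one group, await them all, and move on to the next group, which keeps at most 100 operations in flight at a time.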
Roman Svitukha