Document Deletion in Azure DocumentDB
Khaled Hikmat
Software Engineer
I saw many posts about deleting documents in Azure DocumentDB, but none of them worked quite well for me. So I spent a few hours on this and finally got it to work. Below is my solution. The following posts helped me tremendously (thank you):
- https://talkingaboutdata.wordpress.com/2015/08/24/deleting-multiple-documents-from-azure-documentdb/
- https://www.tutorialspoint.com/documentdb/documentdb_delete_document.htm
- http://stackoverflow.com/questions/29137708/how-to-delete-all-the-documents-in-documentdb-through-c-sharp-code
- https://azure.microsoft.com/en-us/blog/working-with-dates-in-azure-documentdb-4/
I wanted to delete aging documents (older than a given number of hours) from a collection. Below is an explanation of how my final routine works.
# Time
The first problem I encountered was how to select the aging documents. It turned out that the best way to do this is to compare numbers rather than dates. The post on working with dates listed above helped me understand the problem and how to get around it properly. I ended up using the built-in timestamp that is stored as metadata in every DocDB document, i.e. `_ts`. This may or may not work for every case: `_ts` holds the time (in epoch seconds) at which the document was last written, which is not necessarily the date stored in your own data. In my case, the document date (`eventDate`) is actually the real UTC time, so this was no problem. If that is not the case for you, you may need to store your own timestamp (in addition to the date) so that you can query for aging documents based on time.
So this query does exactly that:
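The query itself isn't reproduced here, so what follows is a hedged sketch rather than the original. It makes a few assumptions: the partition key property is `source`, and the parameter name `@cutoff` and the `link` alias for `c._self` are hypothetical names I chose for illustration. Under those assumptions, a parameterized query spec could be built in Python in the `{query, parameters}` shape that, as far as I know, DocumentDB's SQL API accepts. It compares `_ts` with a numeric cutoff instead of comparing dates:

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def build_aging_query(hours: int, now: Optional[datetime] = None) -> Dict[str, Any]:
    """Build a query spec selecting documents older than `hours` hours.

    Assumes `now` is timezone-aware (UTC). The "@cutoff" parameter name and the
    "link" alias are hypothetical choices for this sketch.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # _ts is stored as epoch seconds, so compare against a number, not a date.
    cutoff = int((now - timedelta(hours=hours)).timestamp())
    return {
        "query": 'SELECT VALUE {"link": c._self, "source": c.source} '
                 "FROM c WHERE c._ts < @cutoff",
        "parameters": [{"name": "@cutoff", "value": cutoff}],
    }


spec = build_aging_query(24, datetime(2017, 1, 2, tzinfo=timezone.utc))
print(spec["query"])
print(spec["parameters"])
```

Passing the cutoff as a parameter rather than formatting it into the string keeps the query text constant and avoids quoting mistakes.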
Notice that I am using epoch time for the aging timestamp. The `DateTime` extension is written this way:
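A Python analogue of such an epoch conversion might look like the sketch below. This is not the original extension. `to_epoch` is a hypothetical name, and treating naive datetimes as UTC is an assumption of this sketch. The function returns whole seconds since 1970-01-01T00:00:00Z, which is the unit `_ts` uses:

```python
from datetime import datetime, timedelta, timezone

# Unix epoch origin, the same reference point used by DocumentDB's _ts.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch(dt: datetime) -> int:
    """Convert a datetime to whole seconds since the Unix epoch.

    Naive datetimes are assumed to already be in UTC (an assumption of this
    sketch, not a rule of the SDK).
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    # timedelta // timedelta yields an exact integer (floored), avoiding floats.
    return (dt - EPOCH) // timedelta(seconds=1)
```

For example, `to_epoch(datetime(2017, 1, 1))` returns `1483228800`.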
# Partition Key
My collection was partitioned on a value in the document, i.e. `source`, but I wanted to trim all aging documents across all partitions, not just within a single partition. So I used these query options to force the query to span multiple partitions:
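The toy in-memory model below illustrates why the query scope matters. It is written in Python and is not the DocumentDB SDK; the class and method names are hypothetical. As I understand it, the service of that era rejected a query on a partitioned collection that did not target a single partition unless cross-partition querying was enabled. In the .NET SDK, I believe this was done by setting `FeedOptions.EnableCrossPartitionQuery` to `true`. The model mimics that behavior:

```python
from collections import defaultdict
from typing import Dict, List, Optional


class ToyPartitionedCollection:
    """Hypothetical stand-in for a collection partitioned on one property."""

    def __init__(self, partition_key: str) -> None:
        self.partition_key = partition_key
        self.partitions: Dict[str, List[dict]] = defaultdict(list)

    def upsert(self, doc: dict) -> None:
        # Route each document to the partition named by its key value.
        self.partitions[doc[self.partition_key]].append(doc)

    def query_older_than(self, cutoff: int, partition: Optional[str] = None,
                         cross_partition: bool = False) -> List[dict]:
        if partition is not None:
            # Scoped query: only one partition is searched.
            scopes = [self.partitions.get(partition, [])]
        elif cross_partition:
            # Cross-partition query: every partition is searched.
            scopes = list(self.partitions.values())
        else:
            # Mimics the service refusing an unscoped query when
            # cross-partition querying has not been enabled.
            raise ValueError("cross-partition query is required but disabled")
        return [d for docs in scopes for d in docs if d["_ts"] < cutoff]
```

A query scoped to one `source` value would only trim that partition. To reach every aging document, the query has to be allowed to fan out.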
# Deletion
Finally, I wanted to loop through all aging documents and delete them:
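The loop can be sketched in Python under a few assumptions:

- `fetch_page` and `delete` are hypothetical callables standing in for the SDK's paged query and delete calls.
- Each result row is the projected `{"link", "source"}` object.
- A `None` continuation token means there are no more pages.

The important parts are draining every page (not just the first one) and passing the partition key along with the link:

```python
from typing import Callable, Iterator, List, Optional, Tuple

# One page of query results plus an optional continuation token.
Page = Tuple[List[dict], Optional[str]]


def iter_results(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every row, following continuation tokens until none remain."""
    token: Optional[str] = None
    while True:
        rows, token = fetch_page(token)
        yield from rows
        if token is None:
            break


def delete_aging(fetch_page: Callable[[Optional[str]], Page],
                 delete: Callable[[str, str], None]) -> int:
    """Delete each projected row and return how many were deleted."""
    deleted = 0
    for row in iter_results(fetch_page):
        # Both the document link and its partition key value are passed.
        delete(row["link"], row["source"])
        deleted += 1
    return deleted
```

In the .NET SDK, I believe the equivalent paging pattern is an `IDocumentQuery` drained with `HasMoreResults` and `ExecuteNextAsync`.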
# Query
Please note that the query I used above uses a projection to get only the document link and the partition key. We really do not need the entire document.
Also please note that I am using the `VALUE` modifier in the query to force DocDB to return only the value. This returns a payload that looks like this:
If I don't include the `VALUE` modifier, I get this:
I chose the first one :-)
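The payloads themselves aren't reproduced here. The sketch below assumes the query projects an object literal such as `{"link": c._self, "source": c.source}`, where the `link` name is my own choice. My understanding of the two shapes is:

- With `VALUE`, each result is that object itself.
- Without `VALUE`, each result is wrapped in an object under a generated `$1` key.

The Python sketch uses made-up payloads (the links are not real `_self` values) and extracts the same `(link, partition key)` pairs from either shape:

```python
import json
from typing import List, Tuple

# Illustrative, made-up responses for the same projection.
WITH_VALUE = '[{"link": "dbs/db/colls/c/docs/1", "source": "A"}]'
WITHOUT_VALUE = '[{"$1": {"link": "dbs/db/colls/c/docs/1", "source": "A"}}]'


def to_pairs(payload: str) -> List[Tuple[str, str]]:
    """Extract (link, partition key) pairs from either payload shape."""
    pairs = []
    for row in json.loads(payload):
        # Without VALUE, the unaliased object projection sits under "$1".
        obj = row.get("$1", row)
        pairs.append((obj["link"], obj["source"]))
    return pairs


print(to_pairs(WITH_VALUE))
print(to_pairs(WITHOUT_VALUE))
```

With `VALUE`, no unwrapping step is needed, which is one reason to prefer the first shape.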
# Deletion
Finally, we pull the documents and delete them one at a time:
Initially, I only got the document link from the query, thinking that this was the only requirement. So I did something like this:
This did not work! I needed to pass the partition key, which is why I changed the query to a projection that also returns the partition key. In my case, the partition key is `source`. A comment in a related post gave me the clue that the request options must include the partition key.
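The failure can be modeled with a small hypothetical store in Python; again, this is not the real SDK. A delete that supplies only the link is rejected, while one that also supplies the partition key value succeeds. In the .NET SDK, I believe the real fix is passing `new RequestOptions { PartitionKey = new PartitionKey(source) }` to `DeleteDocumentAsync`:

```python
from typing import Dict, Optional, Tuple


class PartitionKeyRequired(Exception):
    """Raised when a delete on a partitioned collection omits the key."""


class ToyStore:
    """Hypothetical in-memory collection partitioned on "source"."""

    def __init__(self) -> None:
        # Documents are addressed by (partition key value, document link).
        self.docs: Dict[Tuple[str, str], dict] = {}

    def add(self, link: str, source: str) -> None:
        self.docs[(source, link)] = {"link": link, "source": source}

    def delete(self, link: str, partition_key: Optional[str] = None) -> None:
        if partition_key is None:
            # Link-only delete, like the first attempt: rejected.
            raise PartitionKeyRequired(f"partition key required for {link}")
        del self.docs[(partition_key, link)]


store = ToyStore()
store.add("dbs/db/colls/c/docs/1", "A")
try:
    store.delete("dbs/db/colls/c/docs/1")
except PartitionKeyRequired as err:
    print("rejected:", err)
store.delete("dbs/db/colls/c/docs/1", partition_key="A")
print(len(store.docs))
```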
Thank you for reading! I hope this helps someone.