I'm trying to store data in Cosmos DB where the IDs contain a slash (/). However, slash is an illegal character in Cosmos IDs. I initially tried to work around this by URL-encoding the slashes (%2F), since that's the form I generally receive them in through API requests. But although percent (%) is not listed as an illegal character for IDs, Cosmos still chokes on it: many documents with a percent in the ID can't be retrieved. It works for some, but it appears to fail when the % is followed by certain characters.

Is there an encoding suitable for Cosmos DB IDs that replaces illegal characters in the original ID text without introducing illegal or unhandled characters (like %) into the encoded ID? I'd prefer to stay away from things like Base64, which make the IDs hard for people to decipher. I'd also like to avoid simple character replacement (/ becomes -), in case an ID already uses the replacement character.

Rob Mosher
  • Does this answer your question? [Azure CosmosDB: illegal characters in Document Id](https://stackoverflow.com/questions/57987881/azure-cosmosdb-illegal-characters-in-document-id) – David Makogon Jul 24 '22 at 20:57
  • Please see the related (duplicate) question. Tl;dr no - you cannot use any of the illegal characters in an id. – David Makogon Jul 24 '22 at 21:00
  • I'm not trying to use the illegal characters as is. I'm trying to encode the text to avoid the use of illegal characters. I know which characters are illegal (plus % for some reason). I'm asking is there an encoding that will both replace illegal characters and not introduce illegal characters or unhandled characters like percent. – Rob Mosher Jul 25 '22 at 00:43
  • One option would be to use the base64Url encoding. All characters that are produced by it are allowed and most languages will have an implementation for it. – NotFound Jul 25 '22 at 07:27
  • Hi @RobMosher, I'm experiencing similar behaviour (% is for some reason not specified to be an invalid character, yet I'm seeing unexpected behaviour). Did you find a viable solution other than Base64, which has resolved your problem? – Mr. AJ Sep 03 '22 at 09:33
  • @Mr.AJ I added a solution which unfortunately isn't general. But it's not too painful when using ValueConverters. The code may not be exactly right since I'm on my phone. – Rob Mosher Sep 06 '22 at 10:09
  • FYI, Cosmos DB will let you add IDs with illegal characters, but won't let you easily access or delete them. https://learn.microsoft.com/en-us/answers/questions/600893/cannot-delete-cosmos-db-item-with-illegal-id?orderby=helpful – Rob Mosher Feb 09 '23 at 17:44

1 Answer

I ended up doing simple character replacement, swapping out slashes (/) with pipes (|).

The key thing to make this livable is adding a value converter with Entity Framework.

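// Needs: using System.Linq.Expressions;
// Writing: store "/" as "|" so the ID contains no slash in Cosmos.
Expression<Func<string?, string>> toDB = v => v!.Replace("/", "|");
// Reading: turn "|" back into "/" so the application sees the original ID.
Expression<Func<string, string?>> fromDB = v => v!.Replace("|", "/");
builder.Property(p => p.Id).HasConversion(toDB, fromDB);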

This allows the character replacement to happen automatically when reading from and writing to the database. The only times you need to worry about the difference are when you access the database directly, when other code accesses it without the converter, and possibly when doing custom searches. I manually do the translation for a filtering framework we use, and I suspect that other ID search solutions would need the same manual translation.

Ultimately I decided this was acceptable as we are unlikely to have other characters that need translation for our case, the translation is easy to do visually, and it's transparent in most cases with ValueConverters. But it isn't a general solution that would work for any possible string id.

Edit: On second thought, this solution is deficient. Cosmos does actually allow creating documents with illegal characters in the ID; it just doesn't allow accessing or deleting them easily. An ideal solution would prevent all illegal characters across the board, whether expected or not.
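
A reversible escape scheme is one way to get closer to that. The following is a minimal sketch, not something tested against Cosmos itself: the `CosmosIdCodec` class and the choice of `~` as the escape character are hypothetical, and it assumes `~` is handled cleanly in IDs. It escapes the four characters Cosmos documents as illegal in IDs (/, \, ? and #), plus % and the escape character itself, as `~` followed by two hex digits. Because the escape character is escaped too, an encoded ID can never collide with an ID that happens to contain `~` already.

```csharp
using System;
using System.Text;

public static class CosmosIdCodec
{
    // Hypothetical choice: assumed (not verified) to be safe in Cosmos DB IDs.
    private const char Escape = '~';

    // '/', '\', '?', '#' are the documented illegal ID characters; '%' is added
    // because of the retrieval problems described in the question, and the
    // escape character itself is included so decoding stays unambiguous.
    private const string Reserved = "/\\?#%~";

    public static string Encode(string id)
    {
        var sb = new StringBuilder(id.Length);
        foreach (char c in id)
        {
            if (Reserved.IndexOf(c) >= 0)
                sb.Append(Escape).Append(((int)c).ToString("X2")); // e.g. '/' becomes "~2F"
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    public static string Decode(string encoded)
    {
        var sb = new StringBuilder(encoded.Length);
        for (int i = 0; i < encoded.Length; i++)
        {
            if (encoded[i] != Escape)
            {
                sb.Append(encoded[i]);
                continue;
            }
            // An escape must be followed by exactly two hex digits.
            if (i + 2 >= encoded.Length)
                throw new FormatException("Truncated escape sequence in ID.");
            sb.Append((char)Convert.ToInt32(encoded.Substring(i + 1, 2), 16));
            i += 2; // skip the two hex digits just consumed
        }
        return sb.ToString();
    }
}
```

With this in place, the two `Replace` lambdas in the value converter above could presumably call `CosmosIdCodec.Encode` and `CosmosIdCodec.Decode` instead. IDs stay mostly readable (`a/b` is stored as `a~2Fb`), at the cost of the same caveat as before: anything that queries the database without the converter has to apply the encoding itself.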

Rob Mosher