I'm uploading files whose names contain various special characters to Azure Blob Storage, and some of them fail to upload. I found that Azure places restrictions on blob file names, so I need either the list of unsupported Unicode characters for blob names or a way to check whether a given character is supported in an Azure blob name.

I referred to the doc below, but it doesn't provide a specific list or a way to find one: https://learn.microsoft.com/en-us/rest/api/storageservices/naming-and-referencing-shares--directories--files--and-metadata

I need the exact file name validation that the upload blade in the Azure portal applies to blobs.

Dinesh Kumar
  • what do you mean, it describes possibilities pretty thoroughly? – 4c74356b41 Feb 21 '19 at 11:10
  • They mention "In addition, some ASCII or Unicode characters, like control characters (0x00 to 0x1F, \u0081, etc.)". The "etc." means there are more characters, right? I'm writing a tool that detects unsupported characters in file names, so I need the complete set of unsupported characters or the full set of supported characters. – Dinesh Kumar Feb 21 '19 at 11:14
  • Oh, I see what you mean. Pretty much anything with the exception of `[a-zA-Z0-9-]`; I don't have a comprehensive list though. – 4c74356b41 Feb 21 '19 at 11:15
  • We need to include the supported special characters as well. And there are symbols from other languages that aren't supported. – Dinesh Kumar Feb 21 '19 at 11:37

3 Answers

I don't think the Microsoft Docs are very precisely specified.

A blob name must conform to the following naming rules:

  • A blob name can contain any combination of characters.
  • A blob name must be at least one character long and cannot be more than 1,024 characters long, for blobs in Azure Storage.
  • Blob names are case-sensitive.
  • Reserved URL characters must be properly escaped.
  • The number of path segments comprising the blob name cannot exceed 254. A path segment is the string between consecutive delimiter characters (e.g., the forward slash '/') that corresponds to the name of a virtual directory.

In my tests I found that you cannot have these characters in an Azure blob name:

  • Control characters 0x00-0x1F
  • Delete 0x7F
  • Backslash '\' - Azure converts this to forward slash '/'
  • Names ending in full stop '.'

I used the Azure Blob Go SDK for these tests, so it is possible that some of these limitations come from the SDK rather than from the service.
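For a tool that flags suspicious names before upload, the findings above could be turned into a small check like this. It is only a sketch of my empirical list, not an official Azure validation rule, and the function name `find_blob_name_problems` is made up for illustration.

```python
from typing import List


def find_blob_name_problems(name: str) -> List[str]:
    """Report characters that failed in the tests described above.

    Assumption: this mirrors the empirical findings only (control
    characters, DEL, backslash, trailing full stop). It is not a
    complete or official list of what Azure rejects.
    """
    problems = []
    for index, ch in enumerate(name):
        code = ord(ch)
        if code <= 0x1F or code == 0x7F:
            # Control characters 0x00-0x1F and Delete 0x7F
            problems.append(f"control character U+{code:04X} at index {index}")
        elif ch == "\\":
            # Backslash was converted to a forward slash in the tests
            problems.append(f"backslash at index {index}")
    if name.endswith("."):
        problems.append("name ends with a full stop")
    return problems


if __name__ == "__main__":
    for candidate in ["report.txt", "bad\x7fname", "dir\\file.", "test.\r"]:
        print(repr(candidate), find_blob_name_problems(candidate))
```

An empty list means none of the tested problem cases were found; it does not guarantee that the upload will succeed.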

Nick Craig-Wood

Here is the correct document: https://learn.microsoft.com/en-us/rest/api/storageservices/naming-and-referencing-containers--blobs--and-metadata#blob-names

A blob name must conform to the following naming rules:

  • A blob name can contain any combination of characters.

  • A blob name must be at least one character long and cannot be more than 1,024 characters long, for blobs in Azure Storage.

    The Azure Storage emulator supports blob names up to 256 characters long. For more information, see Use the Azure storage emulator for development and testing.

  • Blob names are case-sensitive.

  • Reserved URL characters must be properly escaped.

  • The number of path segments comprising the blob name cannot exceed 254. A path segment is the string between consecutive delimiter characters (e.g., the forward slash '/') that corresponds to the name of a virtual directory.

Note: Avoid blob names that end with a dot (.), a forward slash (/), or a sequence or combination of the two.
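The documented limits above can be checked locally before an upload. The following is a rough sketch, not the service's own validation: the function name `check_documented_rules` is hypothetical, the length is counted in Python string characters, and path segments are counted as the non-empty pieces between '/' delimiters, both of which are assumptions about how the service counts.

```python
from typing import List

MAX_NAME_LENGTH = 1024    # documented maximum blob name length
MAX_PATH_SEGMENTS = 254   # documented maximum number of path segments


def check_documented_rules(name: str, delimiter: str = "/") -> List[str]:
    """Return a list of documented naming rules that `name` appears to break."""
    issues = []
    if not 1 <= len(name) <= MAX_NAME_LENGTH:
        issues.append(f"length {len(name)} is outside 1..{MAX_NAME_LENGTH}")
    # Assumption: empty pieces (leading or doubled delimiters) are not segments.
    segments = [piece for piece in name.split(delimiter) if piece]
    if len(segments) > MAX_PATH_SEGMENTS:
        issues.append(f"{len(segments)} path segments exceeds {MAX_PATH_SEGMENTS}")
    if name.endswith((".", "/")):
        issues.append("ends with '.' or '/' (the docs say to avoid this)")
    return issues


if __name__ == "__main__":
    print(check_documented_rules("/a/b.txt"))
    print(check_documented_rules("x" * 1025))
    print(check_documented_rules("logs/2019/"))
```

Escaping of reserved URL characters is not checked here, since SDKs normally percent-encode the name when they build the request URL.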

The Blob service is based on a flat storage scheme, not a hierarchical scheme. However, you may specify a character or string delimiter within a blob name to create a virtual hierarchy. For example, the following list shows valid and unique blob names. Notice that a string can be valid as both a blob name and as a virtual directory name in the same container:

  • /a

  • /a.txt

  • /a/b

  • /a/b.txt

You can take advantage of the delimiter character when enumerating blobs.
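To illustrate what enumerating with a delimiter means on a flat namespace, here is a small local simulation. It does not call the Blob service; `list_level` is a hypothetical helper that just groups a list of names the way a delimiter-based listing conceptually would.

```python
from typing import Iterable, List, Tuple


def list_level(names: Iterable[str], prefix: str = "",
               delimiter: str = "/") -> Tuple[List[str], List[str]]:
    """Split flat blob names under `prefix` into blobs and virtual directories."""
    blobs = []
    directories = set()
    for name in names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A delimiter remains after the prefix: report the virtual directory.
            directories.add(prefix + head + delimiter)
        else:
            blobs.append(name)
    return sorted(blobs), sorted(directories)


if __name__ == "__main__":
    names = ["/a", "/a.txt", "/a/b", "/a/b.txt"]
    print(list_level(names, prefix="/"))
    print(list_level(names, prefix="/a/"))
```

With the four names from the list above and the prefix "/", "/a" shows up as a blob and "/a/" as a virtual directory at the same time, which is the point the docs make.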

Note: the doc that was mentioned in your question is for Azure File Storage rather than Azure Blob Storage, so it's not the correct one.

Zhaoxing Lu
  • You are right about the document, but it still doesn't answer my question. Consider the file name 'test.␍簴ꊨ簷'. We are not able to upload this file through the Azure portal because of some unsupported character. So I need to find either all acceptable characters or all unsupported characters for blobs. My issue is the same as the one described in this link: https://social.msdn.microsoft.com/Forums/azure/en-US/56cdf864-5327-4944-a61a-3d237bbcf899/blob-service-doesnt-accept-special-characters?forum=windowsazuredata – Dinesh Kumar Feb 22 '19 at 05:29
  • @DineshKumar From that link it's clear that the restriction is based on the URI standard. RFC 1738 is further referenced. Any file name that would be illegal as part of a URI is presumably illegal as a blob name as well. – Svend Mar 01 '19 at 10:39
  • @Svend It doesn't seem to work that way. It doesn't accept characters like '\u007f', which is supported in NTFS, and some other special characters aren't supported either. – Dinesh Kumar Mar 01 '19 at 10:45
  • @DineshKumar What NTFS handles is irrelevant. `\u007f` is the delete control character. In RFC 1738, which is mentioned in your links, I find this: `URLs are written only with the graphic printable characters of the US-ASCII coded character set. The octets 80-FF hexadecimal are not used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent control characters; these must be encoded` – Svend Mar 01 '19 at 10:59
  • @Svend Can you please share the link to the document where RFC 1738 is referenced? It would help me analyse it. – Dinesh Kumar Mar 08 '19 at 06:08
  • This seems self-contradictory to me. It says "Reserved URL characters must be properly escaped" and then says that "/a/b.txt" is a valid blob name, when according to https://tools.ietf.org/html/rfc3986#section-2.2, "/" is a reserved character. – Andy May 13 '20 at 19:50
  • Another thing I've discovered is that you must encode a space as %20. If you encode it as + (which is what most URL encoders will do) or leave it as a space, you will get a + back when you call ListBlobs. – Andy May 14 '20 at 20:24
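The space-encoding point in the last comment can be shown with Python's standard library. This is only a sketch of the encoding difference, assuming you build the blob URL yourself; `encode_blob_name` is a hypothetical helper, and keeping '/' unescaped as the virtual directory delimiter is an assumption about what you want.

```python
from urllib.parse import quote, quote_plus


def encode_blob_name(name: str) -> str:
    """Percent-encode a blob name for use in a URL path.

    quote() encodes a space as %20 and, with safe="/", leaves the
    virtual directory delimiter untouched.
    """
    return quote(name, safe="/")


if __name__ == "__main__":
    name = "my folder/my file.txt"
    print(encode_blob_name(name))  # spaces become %20
    print(quote_plus(name))        # form-style encoding: spaces become +
```

Form-style encoders such as `quote_plus` are meant for query strings, which is why they produce the + that the comment warns about.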