I have a table that holds file data in a varbinary(max) column, and am in the middles of converting this to use a FILESTREAM column in order to save the data on the filesystem.
As part of this I have a SQL update script which adds the new FILESTREAM column:
ALTER TABLE [dbo].[File] ADD [Data] VARBINARY(MAX) FILESTREAM NULL
I then migrate the old file data to the new Data column. This saves the file data to disk with a size of, for example, 20GB.
I then alter the column so that it is not null
ALTER TABLE [dbo].[File] ALTER COLUMN [Data] VARBINARY(MAX) FILESTREAM NOT NULL
but this create a new copy of the filestream data on disk (in a new folder within the filegroup location), leaving me with another 20GB copy.
I'm curious as to why the act of altering the column to not null duplicates the data? It would make sense if it was a new column and thus the data was replicated, but with everything else remaining the same it makes little sense to just duplicate the existing data - not to mention the time required to do so for 20GB worth of files.
I can just change the column addition to have a default, which does not cause the duplication:
ALTER TABLE [dbo].[File] ADD [Data] VARBINARY(MAX) FILESTREAM NOT NULL DEFAULT(0x0)
then
ALTER TABLE [dbo].[File] ADD [Data] VARBINARY(MAX) FILESTREAM NOT NULL
but am curious as to why this happens.