13

I am looking at FILESTREAM attribute in SQL Server to store files in it. I understand it stores the files on hard drive and stores the file pointer/path information in DB. Also, maintains transactional consistency in the process.

There also seems to be a limitation "FILESTREAM data can be stored only on local disk volumes" for the FILESTREAM attribute.

If i anticipate my web app to store 200,000 images of 1-2mb each, i would require around 200gb of hard drive space to store the images. Since, the FILESTREAM requires all data to be stored only on local disk as per the limitation, it would be impossible to store millions of files on a single hard drive, as the storage requirements would be extremely large.

Is my understanding of the limitation correct or am i missing anything here?

If this limitation is correct, i would instead store it in db as plain blob and cluster my DB for increase in storage requirements, which doesn't seem to be possible with FILESTREAM.

Please share your thoughts!

UPDATED:
Few more questions regarding FILESTREAM:-

  1. How to handle data recovery in case of data container corruption?
  2. Can we just backup the DB without the file system data? [assuming data is in SAN, which need not be moved]
  3. I would like to back up or restore the DB and just remap the filegroup path information [that maps to SAN]. Is this possible?
pencilslate
  • 12,958
  • 18
  • 58
  • 73

4 Answers4

19

FILESTREAM does not actually require local storage, just not SMB network storage. An iSCSI or Fiber Channel SAN works fine to store FILESTREAM data. You can also have multiple filestream file groups per table, essentially partitioning your data. If you are strictly targeting sql server 2008 there is very little reason to not use filestream for large binary data. There is a Microsoft whitepaper describing filestream partitioning here.

Jeff Mc
  • 3,723
  • 1
  • 22
  • 27
4

On the local disk volume requirement

Do not take local too literally. While it is indeed a requirement that MSSQL should "see" the filegroup(s) associated with FILESTREAM data as local drives, this storage is often supplied by way of NAS or other storage technologies which trick Windows into thinking these are local NTFS disks (by way of iSCSI and such). This is particularly true with enterprise applications, with the level of space requirement you mention.

On using FILESTREAM at all...

Do weigh the pros and cons carefully. Your question mentions rather big (MB-size) images (I'm assuming graphic images, not logic images of sorts), which implies a rather atomic use of them. A file server setup would require external (to SQL server) management and synchronization, but this seems to be a relatively small cost to pay to keep your freedom, not so much vis-a-vis SQL Server / Microsoft, but also your ability to move things around more easily for scaling / bandwidth purposes.

CJ Dennis
  • 4,226
  • 2
  • 40
  • 69
mjv
  • 73,152
  • 14
  • 113
  • 156
  • 1
    @mjv: The freedom to move things around is the chief concern. What would happen during data container corruption? Ability to just backup the DB alone and later remap the filegroup path? these are few more questions that propped up based on your explanation.. – pencilslate Sep 14 '09 at 17:29
  • 1
    @pencilslate: SQL server is effectively managing the FILESTREAM (FS) datastore(s), so the backup for the FS stores is part of the SQL backup/recovery model. One can explicitly exclude the FS-related storage locations from the regular SQL backup and manage this backup externally; doing so tends to defeat the purpose, so one has to choose between ridiculously big backup/restore or manual management of separate recovery plans... So, unless there are compelling benefits to integrating the two data genres, a fully external repository system may just be preferable. – mjv Sep 14 '09 at 18:02
  • 1
    [cont.] With the non-FS solution a possible recovery strategy for the FS-type data is to have two online repositories, in distinct physical locations. These repositories are updated in parallel, minimizing the need for frequent "tape" backup. The secondary repository not only serves as backup, but as a stand-by server. This is particular interesting when the data stored are images, pdfs and other content that compress poorly, and therefore a similar amount of storage is required for formal backup or this mirror setup. – mjv Sep 14 '09 at 18:12
  • 1
    Freedom is overrated. I will gladly slave to Microsoft if they keep providing me with quality products. I will take a fully MS centric environment any day over any mish mash of php, mysql, java and co. There is no point in trying to avoid the unavoidable, if you are working with SQL server in a business environment the only other DBMS you can really make a point moving to is Oracle. – Drunken Code Monkey Jun 21 '17 at 05:34
  • Keep in mind you can also make the files available over a share via an option in the instance. You are not really locked in you can copy the files out like any other file system, or reindex them into another database. – Drunken Code Monkey Jun 21 '17 at 05:36
2

Using a SQL Cluster doesn't give you any additional storage availability as clustering requires SAN storage. You can simply create a LUN or LUNs for use as FILESTREAM storage on a nonclustered instance as well.

mrdenny
  • 4,990
  • 2
  • 21
  • 28
  • @mrdenny: Can i just backup the db alone and remap the LUNs after db restore, thereby avoiding the need to backup filesystem data? – pencilslate Sep 14 '09 at 17:31
  • If your are using the FILESTREAM, then when you back up the database the files would also be backed up as well. – mrdenny Sep 14 '09 at 17:41
1

Step By Step implementation of local filestream in sql server 2008

Configure filestream in sql server :

  1. First Go to SQL server configuration manage.
  2. Right click on QL server(SQLEXPRESS) and select properties.
  3. Select filestream tab and enable filestream.

Execute following script in SQL server 2008 :

EXEC sp_configure filestream_access_level, 2 RECONFIGURE

Create Database for filestream :

CREATE DATABASE MyFsDb 
ON
PRIMARY ( NAME = MyFsDat,
    FILENAME = 'c:\data\myfsdat.mdf'),
FILEGROUP MyFsGroup CONTAINS FILESTREAM( NAME = MyFs,
    FILENAME = 'c:\data\myfs1')
LOG ON  ( NAME = MyFsLog,
    FILENAME = 'c:\data\myfslog.ldf')
GO

Create Table :

CREATE TABLE MyFsTable
(
  fId INT IDENTITY PRIMARY KEY,
  fData VARBINARY(MAX) FILESTREAM  NULL,
  fName NVARCHAR(300),
  RowGuid UNIQUEIDENTIFIER  NOT NULL  ROWGUIDCOL UNIQUE DEFAULT NEWID()
)

Procedure to add data in table :

ALTER PROCEDURE [dbo].[uspAddFile]

@fData VARBINARY(Max),
@ fName varchar(50),

AS
BEGIN
INSERT INTO MyFsTable (fData, fName, RowGuid) VALUES (@Item, @ItemName, DEFAULT)
END

Lets add some data in table from front end using C#:

Public void AddFile()
{
string connectionString = System.Configuration.ConfigurationManager.ConnectionStrings["connectionstring"].ToString();
                con = new System.Data.SqlClient.SqlConnection(connectionString);
                cmd = new System.Data.SqlClient.SqlCommand("uspAddFile", con);
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.Add("@fData", SqlDbType.Binary).Value = GetByte(TempPath);
                cmd.Parameters.Add("@fName", SqlDbType.VarChar).Value = tempFile;
                con.Open();
                result = cmd.ExecuteNonQuery();
                con.Close();
}
Sneftel
  • 40,271
  • 12
  • 71
  • 104
Asif
  • 11
  • 1