
I'm currently in the analysis phase of a Document Management System that will be the single entry point for documents from many applications. The size and number of documents depend on the application. For example, one of them will produce 500,000 documents per year, of which 468,000 are smaller than 256 KB. The sizes of documents from the other applications are heterogeneous and cannot be determined up front.

The solution I have proposed is to create a WCF web service that acts as middleware between the external applications and SQL Server, so that:

  • On INSERT:
    • The external application sends the WCF service its application GUID and the document as a byte[].
    • The WCF service inserts the document into the database, which returns a unique GUID; the service passes that GUID back to the external application, which stores it in its own database.
  • On SELECT:
    • The external application sends the document GUID and application GUID to the WCF service, which queries the database and returns the file content as a byte stream.
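The insert/select contract above can be sketched as follows. This is a minimal in-memory stand-in, not the real implementation: the actual service would be a WCF `ServiceContract` in C# persisting to SQL Server, and all class and method names here are hypothetical.

```python
import uuid

class DocumentStore:
    """In-memory mock of the middleware contract described above.

    Hypothetical names; the real service would write to SQL Server
    (VARBINARY(MAX) or a FileTable) instead of a dict.
    """

    def __init__(self):
        # Keyed by (application GUID, document GUID), as in the SELECT flow.
        self._docs = {}

    def insert_document(self, app_guid: str, content: bytes) -> str:
        # The service generates the document GUID and returns it to the
        # caller, which stores it in its own database.
        doc_guid = str(uuid.uuid4())
        self._docs[(app_guid, doc_guid)] = content
        return doc_guid

    def select_document(self, app_guid: str, doc_guid: str) -> bytes:
        # Lookup requires both GUIDs, so one application cannot read
        # another application's documents by GUID alone.
        return self._docs[(app_guid, doc_guid)]
```

The key design point is that the document GUID is minted by the middleware, not by the caller, so the repository stays the single authority for document identity.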

The architecture at a high level would be as follows (data access via Entity Framework 6): [architecture diagram]

After considering various alternatives, we chose to use FILESTREAM (available since SQL Server 2008) and FileTable (available since SQL Server 2012).

I have doubts about the database architecture of this solution, and I'm concerned about future performance once millions of documents are stored in the FileTable (this is my client's main concern as well). I don't know whether FILESTREAM and FileTable penalize performance for small files. From what I have read in various blogs from Microsoft, MVPs, etc., the recommendation is that files smaller than 256 KB are better stored in-row as VARBINARY(MAX), while files of 1 MB or more should use FILESTREAM.

I have never worked with FileTable or FILESTREAM, so I don't know how this approach performs. Within a year, up to six different applications will be dumping documents into this repository, with an expected growth of about 5 million documents per year whose sizes are heterogeneous and cannot be determined up front.

For the implementation I have proposed the following:

  1. Logic in the WCF service that, depending on size, stores the document either in a VARBINARY(MAX) column or in a FileTable. I would create two filegroups (one for files <= 1 MB and another for files > 1 MB).
  2. One FileTable per application, each with its own filegroup. With six external applications I would have six FileTables, each on its own FILEGROUP; in this case I would partition by application rather than by file size.
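The size-based routing in option 1 can be sketched like this. The thresholds come from the post; the function and the storage-target names are mine and purely illustrative:

```python
KB = 1024
VARBINARY_LIMIT = 256 * KB       # below this, store in-row as VARBINARY(MAX)
FILESTREAM_LIMIT = 1024 * KB     # 1 MB boundary between the two filegroups

def choose_storage(size_bytes: int) -> str:
    """Pick a storage target for a document, per the proposal above.

    The returned labels are hypothetical placeholders for the actual
    column/filegroup the WCF service would write to.
    """
    if size_bytes < VARBINARY_LIMIT:
        return "VARBINARY(MAX)"       # small files stay in-row
    elif size_bytes <= FILESTREAM_LIMIT:
        return "FILETABLE_FG_SMALL"   # filegroup for files <= 1 MB
    else:
        return "FILETABLE_FG_LARGE"   # filegroup for files > 1 MB
```

Note that options 1 and 2 partition on different axes (size vs. application), so combining them would multiply the number of filegroups.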

Given the above, I would appreciate recommendations on the architecture of this solution, as well as on hardware sizing (disk, RAM, CPU, etc.).

Alvmad