
I have been prototyping an approach to integrating Lucene.NET with SQL Server using SQL Server FileTable. The usage is very convenient and the code is simple; I don't have to do anything special to customize Lucene.NET. The advantage I seek is mostly operational and enterprisey: my company operates SQL Server 24/7, and keeping the search index in the same control space has a lot of advantages for us (I do realize I won't get precise transactional consistency; that is OK).
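
For concreteness, the integration is nothing more than pointing Lucene.NET's stock FSDirectory at the FileTable share. A minimal sketch, assuming Lucene.Net 3.0.3 and the UNC path used in the tests below:

using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

// Stock FSDirectory over the FileTable UNC share; no custom Directory
// implementation is needed.
var indexDir = FSDirectory.Open(new DirectoryInfo(@"\\127.0.0.1\Sql2012\FIndex\search_index"));
using (var writer = new IndexWriter(indexDir,
    new StandardAnalyzer(Version.LUCENE_30),
    IndexWriter.MaxFieldLength.UNLIMITED))
{
    // Documents are added exactly as with a local index.
    var doc = new Document();
    doc.Add(new Field("id", "42", Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("body", "hello filetable", Field.Store.NO, Field.Index.ANALYZED));
    writer.AddDocument(doc);
    writer.Commit();
}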

Problem: No matter what I do, there seems to be approximately 200 ms (±20-30 ms) of overhead when writing a file to the SQL FileTable UNC share via the Windows API (through System.IO.FileStream). This is significant for Lucene.NET: an index write operation directly to my local file system takes ~50 ms, whereas the same operation against the FileTable takes ~2-3 seconds!

To sanity-check this, I created another experiment that creates and writes 3 new files at 10KB, 1MB, and 10MB. I wrote these 3 files to:

  1. A local directory (c:\Search\)
  2. A non-FileTable share via UNC path (\\127.0.0.1\Search\)
  3. A FileTable UNC path (\\127.0.0.1\[instance-share]\search_index\)

Using System.Diagnostics.Stopwatch, writing to the local directory was fastest, as expected; writing to the non-FileTable share was slower but comparable; and the FileTable was an order of magnitude slower. What was interesting is that the two larger file sizes performed comparably in cases 2 and 3, which leads me to believe the file-creation overhead is what levels the timings.

Question: Does anyone have deeper insight into why file creation with FileTable is so "slow"?

This is a development VM with little concurrent activity (4GB RAM, 2 vCPU, probably some IO contention, but these tests are meant as relative comparisons). Inserting a trivial row into SQL Server on this box barely hits 1 ms.
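
For reference, the ~1 ms figure comes from timing a plain single-row insert along these lines (the connection string and table name here are placeholders):

using System;
using System.Data.SqlClient;
using System.Diagnostics;

// Placeholder connection string and table; the only point is timing a
// trivial single-row INSERT on the same box.
using (var conn = new SqlConnection(@"Server=127.0.0.1\Sql2012;Database=FIndex;Integrated Security=true"))
{
    conn.Open();
    var sw = Stopwatch.StartNew();
    using (var cmd = new SqlCommand("INSERT INTO dbo.TimingTest (Value) VALUES (1)", conn))
    {
        cmd.ExecuteNonQuery();
    }
    sw.Stop();
    Console.WriteLine("Trivial insert: {0} ms", sw.ElapsedMilliseconds);
}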

I don't have the code handy but will post with an edit soon (with exact timings) - it is very simple and just writes 4K chunks of a statically initialized byte array in a loop to the desired size.

I did implement the recommendations from the following article, and also tuned the SMB stack, with no difference in performance: http://blogs.msdn.com/b/blogdoezequiel/archive/2011/02/11/best-practices-on-filestream-implementations.aspx#.UkbEYtKshcZ
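
As one example of the kind of variation I tried (illustrative only; the article and the SMB tuning cover more), here is the write loop opened with a 64 KB buffer and the sequential-scan hint, neither of which moved the FileTable numbers:

using System;
using System.IO;

// Illustrative variant of the test's write loop: larger FileStream buffer
// plus FileOptions.SequentialScan. Made no difference against FileTable.
public static void WriteFileTuned(string name, byte[] chunk, int sizeInBytes)
{
    using (var fs = new FileStream(name, FileMode.Create, FileAccess.Write,
        FileShare.None, 64 * 1024, FileOptions.SequentialScan))
    {
        for (var written = 0; written < sizeInBytes; written += chunk.Length)
        {
            fs.Write(chunk, 0, Math.Min(chunk.Length, sizeInBytes - written));
        }
    }
}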

EDIT: Timings from the test console output:

Writing files for directory: c:\Search
        Writing file size : 10240
        Writing file size : 1048576
        Writing file size : 10485760
Writing files for directory: \\127.0.0.1\Search
        Writing file size : 10240
        Writing file size : 1048576
        Writing file size : 10485760
Writing files for directory: \\127.0.0.1\Sql2012\FIndex\search_index
        Writing file size : 10240
        Writing file size : 1048576
        Writing file size : 10485760

Write Timings (ms)
---------------------------------------------------------------
        10KB                    1MB                     10MB
---------------------------------------------------------------
Local:  3                       2                       17
Share:  28                      31                      64
FTable: 205                     249                     317

Source code (very simple, posted for completeness). Console Main, with the ASCII-art display code omitted:

using System;
using System.Diagnostics;
using System.IO;

private static readonly string[] paths = new string[]
{
    @"c:\Search",
    @"\\127.0.0.1\Search",
    @"\\127.0.0.1\Sql2012\FIndex\search_index"
};

private static readonly int[] sizes = new int[] 
{
    1024 * 10,
    1024 * 1024,
    1024 * 1024 * 10
};

static void Main(string[] args)
{
    // timings[path, size]: elapsed ms for each of the 3 paths x 3 sizes
    var timings = new long[3, 3];
    var stopwatch = new Stopwatch();
    for(var x = 0; x < 3; x++)
    {
        Console.WriteLine("Writing files for directory: {0}", paths[x]);
        for(var y = 0; y < 3; y++)
        {
            Console.WriteLine("\tWriting file size : {0}", sizes[y]);
            string fileName = Path.Combine(paths[x], Guid.NewGuid().ToString() + ".bin");
            stopwatch.Start();
            FileIOTestHelper.WriteFile(fileName, sizes[y]);
            stopwatch.Stop();
            timings[x, y] = stopwatch.ElapsedMilliseconds;
            stopwatch.Reset();
        }
    }

// ascii art display code
}

Implementation class:

public static class FileIOTestHelper
{
    private static readonly byte[] testBuffer = CreateBuffer();

    private static byte[] CreateBuffer()
    {
        var buffer = new byte[4096];
        for (var i = 0; i < 4096; i++)
        {
            buffer[i] = (byte)(i % 256);
        }

        ForceIOJit(buffer);
        return buffer;
    }

    private static void ForceIOJit(byte[] initBuffer)
    {
        // Shouldn't matter, but eliminates any possible JIT/warm-up cost
        // before the timed runs.
        var tempFile = Path.GetTempFileName();
        using (var fs = new FileStream(tempFile, FileMode.Open))
        {
            fs.Write(initBuffer, 0, 4096);
            fs.Flush();
        }

        File.Delete(tempFile);
    }

    public static void WriteFile(string name, int sizeInBytes)
    {
        var count = sizeInBytes / 4096;
        var remainder = sizeInBytes % 4096;
        using (var fs = new FileStream(name, FileMode.Create))
        {
            for (int i = 0; i < count; i++)
            {
                fs.Write(testBuffer, 0, 4096);
            }

            if (remainder > 0)
            {
                fs.Write(testBuffer, 0, remainder);
            }   

            fs.Flush();
        }
    }
}
  • An associate of mine, a consultant with MCS who knows SQL Server well, did some experimentation on this problem and found FileTable performance to be sensitive to the FileStream buffer size. It seems to perform best at 64KB. I'll post an edit with the details. – bkrakower Oct 18 '13 at 00:36
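
A sketch of the buffer-size sweep that comment describes (the buffer sizes, path, and 1 MB target size are illustrative, not from the post):

using System;
using System.Diagnostics;
using System.IO;

// Illustrative sweep of FileStream buffer sizes against the FileTable
// share, timing a 1 MB write for each. 64 KB is the size called out above.
int[] bufferSizes = { 4 * 1024, 16 * 1024, 64 * 1024, 256 * 1024 };
var chunk = new byte[4096];
foreach (var bufferSize in bufferSizes)
{
    var fileName = Path.Combine(@"\\127.0.0.1\Sql2012\FIndex\search_index",
        Guid.NewGuid() + ".bin");
    var sw = Stopwatch.StartNew();
    using (var fs = new FileStream(fileName, FileMode.Create, FileAccess.Write,
        FileShare.None, bufferSize))
    {
        for (var i = 0; i < (1024 * 1024) / chunk.Length; i++)
        {
            fs.Write(chunk, 0, chunk.Length);
        }
    }
    sw.Stop();
    Console.WriteLine("Buffer {0,7}: {1} ms", bufferSize, sw.ElapsedMilliseconds);
}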

1 Answer


After a lot more testing, the latency I am experiencing is consistent and not sensitive to the buffer size. I am accepting this as a limitation of FileTable: it is less than ideal for very chatty IO.

  • No, nothing on this front. Using Elasticsearch :>. – bkrakower Feb 22 '18 at 15:23
  • I'm currently using SQL Server 2016 and am dropping thousands of XML files into the FILESTREAM directory. I'm not impressed with how long it takes to insert into the FILESTREAM directory. Dropping the physical XML files or inserting the XML records via OPENQUERY in SQL makes no difference. It seems to copy at a rate of one XML file per second, which is pretty bad. I'm experimenting with using a single zip file and unzipping it in the directory. I'm also trying to halt population on the table while I introduce new files. So far, it's not looking good. – Rudy Hinojosa May 30 '18 at 16:06