
According to this or this, a single IndexSearcher can be shared by multiple threads, so that is what I did. But when I switched from FSDirectory to MMapDirectory, I started getting strange exceptions.

This works fine:

static void Main(string[] args) 
{
    DirectoryInfo directoryInfo = new DirectoryInfo(@"C:\Users\Tams\Desktop\new\");
    var directory = FSDirectory.Open(directoryInfo);
    var indexSearcher = new IndexSearcher(directory);

    const int times = 100;
    const int concurrentTaskCount = 5;
    var task = new Task[concurrentTaskCount];
    for (int i = 0; i < concurrentTaskCount; i++) 
    {
        task[i] = new Task(() => Search(indexSearcher, times));
        task[i].Start();
    }

    Task.WaitAll(task);
}

static void Search(IndexSearcher reader, int times) 
{
    List<Document> docs = new List<Document>(10000);
    for (int i = 0; i < times; i++) 
    {
        var q = new TermQuery(new Term("title", "volume"));
        foreach (var scoreDoc in reader.Search(q, 100).ScoreDocs)
        {
            docs.Add(reader.Doc(scoreDoc.Doc));
        }
    }
}

But with this:

static void Main(string[] args)
{
    DirectoryInfo directoryInfo = new DirectoryInfo(@"C:\Users\Tams\Desktop\new\");
    var directory = new MMapDirectory(directoryInfo); // CHANGED
    var indexSearcher = new IndexSearcher(directory);

    const int times = 100;
    const int concurrentTaskCount = 5;
    var task = new Task[concurrentTaskCount];
    for (int i = 0; i < concurrentTaskCount; i++)
    {
        task[i] = new Task(() => Search(indexSearcher, times));
        task[i].Start();
    }

    Task.WaitAll(task);
}

static void Search(IndexSearcher reader, int times)
{
    List<Document> docs = new List<Document>(10000);
    for (int i = 0; i < times; i++)
    {
        var q = new TermQuery(new Term("title", "volume"));
        foreach (var scoreDoc in reader.Search(q, 100).ScoreDocs)
        {
            docs.Add(reader.Doc(scoreDoc.Doc));
        }
    }
}

I get various exceptions like:

System.ArgumentOutOfRangeException: Index was out of range. Must be non-negative 
                                    and less than the size of the collection.
Parameter name: index
at System.ThrowHelper.ThrowArgumentOutOfRangeException()
at System.Collections.Generic.List`1.get_Item(Int32 index)
at Lucene.Net.Index.FieldInfos.FieldInfo(Int32 fieldNumber)
    in d:\Lucene.Net\FullRepo\trunk\src\core\Index\FieldInfos.cs:line 378   
at Lucene.Net.Index.FieldsReader.Doc(Int32 n, FieldSelector fieldSelector) 
    in d:\Lucene.Net\FullRepo\trunk\src\core\Index\FieldsReader.cs:line 234  
at Lucene.Net.Index.SegmentReader.Document(Int32 n, FieldSelector fieldSelector)
    in d:\Lucene.Net\FullRepo\trunk\src\core\Index\SegmentReader.cs:line 1193
at Lucene.Net.Index.DirectoryReader.Document(Int32 n, FieldSelector fieldSelector)
    in d:\Lucene.Net\FullRepo\trunk\src\core\Index\DirectoryReader.cs:line 686
at Lucene.Net.Index.IndexReader.Document(Int32 n) 
    in d:\Lucene.Net\FullRepo\trunk\src\core\Index\IndexReader.cs:line 732
at Lucene.Net.Search.IndexSearcher.Doc(Int32 i)
    in d:\Lucene.Net\FullRepo\trunk\src\core\Search\IndexSearcher.cs:line 162
at PerformanceTest.Program.Search(IndexSearcher reader, Int32 times)
    in c:\Users\Tams\Documents\Visual Studio 2012\Projects\BookCatalog\PerformanceTest\Program.cs:line 28
at PerformanceTest.Program.<>c__DisplayClass2.<Main>b__0()
    in c:\Users\Tams\Documents\Visual Studio 2012\Projects\BookCatalog\PerformanceTest\Program.cs:line 43
at System.Threading.Tasks.Task.InnerInvoke()
at System.Threading.Tasks.Task.Execute()

Or

System.IO.IOException: read past EOF
at Lucene.Net.Store.BufferedIndexInput.Refill()
    in d:\Lucene.Net\FullRepo\trunk\src\core\Store\BufferedIndexInput.cs:line 179
at Lucene.Net.Store.BufferedIndexInput.ReadByte()
    in d:\Lucene.Net\FullRepo\trunk\src\core\Store\BufferedIndexInput.cs:line 41
at Lucene.Net.Store.IndexInput.ReadVInt()
    in d:\Lucene.Net\FullRepo\trunk\src\core\Store\IndexInput.cs:line 88   
at Lucene.Net.Index.FieldsReader.Doc(Int32 n, FieldSelector fieldSelector)
    in d:\Lucene.Net\FullRepo\trunk\src\core\Index\FieldsReader.cs:line 230  
at Lucene.Net.Index.SegmentReader.Document(Int32 n, FieldSelector fieldSelector)
    in d:\Lucene.Net\FullRepo\trunk\src\core\Index\SegmentReader.cs:line 1193
at Lucene.Net.Index.DirectoryReader.Document(Int32 n, FieldSelector fieldSelector)
    in d:\Lucene.Net\FullRepo\trunk\src\core\Index\DirectoryReader.cs:line 686
at Lucene.Net.Index.IndexReader.Document(Int32 n)
    in d:\Lucene.Net\FullRepo\trunk\src\core\Index\IndexReader.cs:line 732   
at Lucene.Net.Search.IndexSearcher.Doc(Int32 i)
    in d:\Lucene.Net\FullRepo\trunk\src\core\Search\IndexSearcher.cs:line 162
at PerformanceTest.Program.Search(IndexSearcher reader, Int32 times)
    in c:\Users\Tams\Documents\Visual Studio 2012\Projects\BookCatalog\PerformanceTest\Program.cs:line 28
at PerformanceTest.Program.<>c__DisplayClass2.<Main>b__0()
    in c:\Users\Tams\Documents\Visual Studio 2012\Projects\BookCatalog\PerformanceTest\Program.cs:line 43
at System.Threading.Tasks.Task.InnerInvoke()
at System.Threading.Tasks.Task.Execute()

The second version works fine when the concurrentTaskCount variable is set to 1.

Am I missing something? I can't figure out what it is.

Actually, I don't have the path

d:\Lucene.Net\FullRepo\trunk\src\core\Store\BufferedIndexInput.cs

I don't even have a drive with the letter "d".

Albireo
Tamás Varga
  • The path mentioned in the exception stack trace comes from the machine that built the binary, not your machine. – sisve May 01 '13 at 11:01
  • If you think you have found a concurrency bug in the .NET implementation of MMapDirectory, you should report it to the Lucene.Net project's official bug tracker. – Jf Beaulac May 01 '13 at 13:51
  • @JfBeaulac He doesn't know if it is a bug (this was posted to the Lucene.NET mailing list), hence the posting here. – casperOne May 02 '13 at 19:14
  • Exactly, casperOne. Actually, I think it's a bug; the only thing I've done is switch from one directory implementation to the other. BTW: is there any answer on the list? (I didn't receive any, but I subscribed to it.) – Tamás Varga May 02 '13 at 20:33

1 Answer


The source for MMapDirectory shows that, contrary to what the name suggests, this class does not use memory-mapped files. It loads all index files into memory using MemoryStream objects, and I would guess that those streams are the cause of the problem when different threads seek and read at the same time.
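To illustrate that guess: a MemoryStream keeps a single Position, so concurrent Seek/Read calls on one shared instance can interleave and return bytes from the wrong offset. The following is only a minimal sketch of the general pattern that avoids this, not Lucene.Net's code. The class and method names (PerThreadStreamSketch, ReadInt32At, CountMismatches) are made up for the example. Each reader wraps the same byte[] in its own read-only MemoryStream, so every reader has a private position.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration; none of these names come from Lucene.Net.
static class PerThreadStreamSketch
{
    // Seeks to `offset` and reads one Int32 (platform byte order) from the stream.
    static int ReadInt32At(Stream stream, long offset)
    {
        stream.Seek(offset, SeekOrigin.Begin);
        var buffer = new byte[4];
        int read = 0;
        while (read < buffer.Length)
        {
            int n = stream.Read(buffer, read, buffer.Length - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }
        return BitConverter.ToInt32(buffer, 0);
    }

    // Runs `readers` parallel readers over the same byte[] and returns how many
    // reads disagreed with the value taken directly from the array.
    public static int CountMismatches(byte[] data, int readers, int readsPerReader)
    {
        int mismatches = 0;
        int slots = data.Length / 4;
        Parallel.For(0, readers, r =>
        {
            // One stream per reader: the bytes are shared, the position is not.
            using (var stream = new MemoryStream(data, false))
            {
                for (int i = 0; i < readsPerReader; i++)
                {
                    int offset = 4 * ((r * 31 + i) % slots);
                    int expected = BitConverter.ToInt32(data, offset);
                    if (ReadInt32At(stream, offset) != expected)
                        Interlocked.Increment(ref mismatches);
                }
            }
        });
        return mismatches;
    }

    static void Main()
    {
        var data = new byte[4096];
        new Random(42).NextBytes(data);
        Console.WriteLine("Mismatches: " + CountMismatches(data, 5, 100000));
    }
}
```

If the single stream were instead created once outside the Parallel.For and shared by all readers, the Seek and Read calls of different threads could interleave, which is the kind of failure the stack traces in the question would be consistent with.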

You can get a memory-based index by loading it into a RAMDirectory. This passes your test. (But it does what MMapDirectory currently does, not necessarily what you expect it to do...)

var fsDirectory = FSDirectory.Open(directoryInfo);
var directory = new RAMDirectory(fsDirectory);
sisve
  • Of course it doesn't. It's a port from Java, where memory-mapped files don't even exist as a type. If you look at the Java source, it does the same thing. The FSDirectory implementation is slow for bigger indexes. RAMDirectory would be great, but my index is much bigger than the available memory. Even if it were smaller, you would still suffer from GC pauses. – Tamás Varga May 04 '13 at 12:38
  • Java has FileChannel.map which "maps a region of this channel's file directly into memory". You can find the call in the MMapIndexInput constructor. This matches the MemoryMappedFile.CreateViewStream method available in .NET 4, but the port does not use memory-mapped files (which I expected it to do, based on the name). – sisve May 04 '13 at 12:52
  • That's odd. It would be interesting to know why it's implemented like that in 3.x; it used to be supported in 2.9.4g via a support class: http://svn.apache.org/repos/asf/lucene.net/tags/Lucene.Net_2_9_4g_RC1/src/core/Support/MemoryMappedDirectory.cs – Jf Beaulac May 04 '13 at 16:16
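For reference, the MemoryMappedFile.CreateViewStream approach mentioned in the comments could look roughly like the sketch below. It is an assumption-laden illustration rather than how Lucene.Net or the 2.9.4g support class does it. The class and method names (MappedViewSketch, CountFailedReaders) and the temporary file are invented for the example. The file is mapped once, and each reader gets its own view stream so that no position is shared between threads.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration; none of these names come from Lucene.Net.
static class MappedViewSketch
{
    // Maps `path` once, reads the whole file through one view stream per reader,
    // and returns the number of readers whose bytes differed from the file.
    public static int CountFailedReaders(string path, int readers)
    {
        byte[] expected = File.ReadAllBytes(path);
        int failures = 0;

        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        {
            // Create the views up front on one thread; only the reads run in parallel.
            var views = new MemoryMappedViewStream[readers];
            for (int r = 0; r < readers; r++)
                views[r] = mmf.CreateViewStream(0, expected.Length, MemoryMappedFileAccess.Read);
            try
            {
                Parallel.For(0, readers, r =>
                {
                    var actual = new byte[expected.Length];
                    int read = 0;
                    while (read < actual.Length)
                    {
                        int n = views[r].Read(actual, read, actual.Length - read);
                        if (n == 0) break;
                        read += n;
                    }
                    if (read != expected.Length || !actual.SequenceEqual(expected))
                        Interlocked.Increment(ref failures);
                });
            }
            finally
            {
                foreach (var view in views) view.Dispose();
            }
        }
        return failures;
    }

    static void Main()
    {
        // A throwaway file stands in for an index file.
        string path = Path.GetTempFileName();
        try
        {
            var data = new byte[64 * 1024];
            new Random(7).NextBytes(data);
            File.WriteAllBytes(path, data);
            Console.WriteLine("Failed readers: " + CountFailedReaders(path, 5));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Because the operating system pages the mapped file in on demand, this style does not require the whole index to fit in managed memory, which is the concern raised above about RAMDirectory.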