6

I have perhaps trillions of string sequences. I'm looking for a fast substring search.

I've created an index. When I query it with a prefix match (x => x.StartsWith), it takes about 2 seconds on a database of 3 million objects.

How long might it take on 500 million objects?

Is it possible to make RavenDB search faster?

 store.DatabaseCommands.PutIndex("KeyPhraseInfoByWord", new Raven.Client.Indexes.IndexDefinitionBuilder<KeyPhraseInfo>
    {
        // Index only the Key field of each KeyPhraseInfo document.
        Map = keyPhraseInfos => from keyPhraseInfo in keyPhraseInfos
                                select new { keyPhraseInfo.Key },
        // SimpleAnalyzer splits text on non-letters and lowercases the tokens.
        Analyzers =
        {
            { x => x.Key, "SimpleAnalyzer" }
        }
    });
Adam Spicer
Neir0
    "I have perhaps trillions of string sequences." I've told you a million times about exaggerating. – Simon May 29 '12 at 13:46

2 Answers

12

Neir0, yes, you can do really fast NGram search using RavenDB. See: https://gist.github.com/1669767
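To illustrate why an NGram analyzer helps here, below is a toy sketch in plain C#. It is not the code from the gist and does not use Lucene or RavenDB; the names `NGramSketch` and `NGrams` and the gram sizes 2 to 4 are assumptions chosen for the example. The idea is that every short substring of each value is stored as its own term at indexing time, so a substring query becomes a direct term lookup instead of a scan.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NGramSketch
{
    // Emit every substring of `text` whose length is between minGram and maxGram.
    // Hypothetical helper for illustration only.
    public static IEnumerable<string> NGrams(string text, int minGram, int maxGram)
    {
        string lowered = text.ToLowerInvariant();
        for (int start = 0; start < lowered.Length; start++)
        {
            for (int len = minGram; len <= maxGram && start + len <= lowered.Length; len++)
            {
                yield return lowered.Substring(start, len);
            }
        }
    }

    public static void Main()
    {
        // Toy inverted index: n-gram -> ids of the strings that contain it.
        var index = new Dictionary<string, HashSet<int>>();
        string[] phrases = { "RavenDB", "Lucene", "raven" };

        for (int id = 0; id < phrases.Length; id++)
        {
            foreach (string gram in NGrams(phrases[id], 2, 4))
            {
                if (!index.TryGetValue(gram, out var ids))
                {
                    ids = new HashSet<int>();
                    index[gram] = ids;
                }
                ids.Add(id);
            }
        }

        // A substring query of length 2..4 is now a single dictionary lookup.
        var hits = index.TryGetValue("ave", out var found) ? found : new HashSet<int>();
        Console.WriteLine(string.Join(", ", hits.OrderBy(i => i).Select(i => phrases[i])));
        // prints "RavenDB, raven"
    }
}
```

The trade-off is index size: each value produces many terms, so the gram range is worth keeping as narrow as the expected queries allow.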

Ayende Rahien
8

Ayende's excellent NGram analyzer seems to target an older version of Lucene than the one RavenDB uses now, so I made an updated version of it for confused people like me. See: http://pastebin.com/a78XzGDk. All credit goes to Ayende for this one.

To use it, put it in a class library, build it, and drop the built assembly into the Analyzers folder under Server in the RavenDB directory. Then create an index like this:

public class PostByNameIndex : AbstractIndexCreationTask<Posts>
{
    public PostByNameIndex()
    {
        Map = posts => posts.Select(x => new {x.Name});
        Analyze(x => x.Name, typeof(NGramAnalyzer).AssemblyQualifiedName);
    }
}

But as I said, all credit and thanks to Ayende for creating this.

Eplebit