
Possible Duplicate:
Build an index for substring search?

I'm developing a filename search tool. I'd like to search one or more hard drives containing perhaps millions of filenames.

Given the file: application 3 - jack smithinson

Searches:

  1. 'application', '3', 'jack', 'smithinson'
  2. 'smith'
  3. 'inson'

Should all return this file.
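To pin down the required behaviour, here is a minimal brute-force baseline that any index would need to beat. It assumes matching is case-insensitive, which the question does not state. Every query is treated as a plain substring test, so whole words ('jack') and fragments ('inson') are handled the same way. The function names are hypothetical.

```python
def matches(filename: str, query: str) -> bool:
    """Case-insensitive substring test (case-insensitivity is an assumption)."""
    return query.lower() in filename.lower()


def linear_search(filenames: list[str], query: str) -> list[str]:
    """Baseline: scan every name, costing O(total characters) per query."""
    return [name for name in filenames if matches(name, query)]
```

With millions of names, this full scan on every keystroke is the cost the data structures below are meant to avoid.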

What are the best data structures for this kind of operation and why?

  1. Binary tree.
  2. Trie.
  3. SQLite database of filenames
  4. More?
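One option that extends the sorted-tree and trie ideas to substrings is a suffix array. Every substring of a name is a prefix of one of that name's suffixes, so sorting all suffixes lets a binary search find every name containing the query. The sketch below is a simplified illustration, not a tuned implementation. The class name is hypothetical, and it assumes case-insensitive matching.

```python
import bisect


class SuffixIndex:
    """Sorted list of every suffix of every (lowercased) filename."""

    def __init__(self, filenames: list[str]) -> None:
        self.filenames = filenames
        lowered = [name.lower() for name in filenames]
        # (suffix, file_id) pairs, sorted lexicographically by suffix.
        self.entries = sorted(
            (low[i:], file_id)
            for file_id, low in enumerate(lowered)
            for i in range(len(low))
        )
        self.suffixes = [suffix for suffix, _ in self.entries]

    def search(self, query: str) -> list[str]:
        q = query.lower()
        # Binary search for the first suffix >= q ...
        lo = bisect.bisect_left(self.suffixes, q)
        # ... then walk forward while suffixes still start with q.
        hi = lo
        while hi < len(self.suffixes) and self.suffixes[hi].startswith(q):
            hi += 1
        ids = sorted({file_id for _, file_id in self.entries[lo:hi]})
        return [self.filenames[i] for i in ids]
```

This sketch copies each suffix for clarity, so memory grows with the square of name length. A real suffix array stores only (file, offset) pairs and compares against the original strings. A suffix trie is the tree-shaped equivalent of the same idea.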
Jason
How are you going to maintain the data structure? How current does the structure need to be with respect to the actual file system contents? – Ted Hopp Jul 28 '11 at 03:45

1 Answer


Store these filenames in Lucene indexes. You can find more information here: http://incubator.apache.org/lucene.net/ Lucene lets you build highly optimized indexes for search and is widely used in production search systems. It offers an abstract way to create indexes without worrying about the internal implementation: indexing is roughly as easy as building a document in memory and handing it to an IndexWriter, which persists it to disk.
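One caveat: an inverted index of whole words finds 'jack' or 'smithinson' directly, but fragments such as 'inson' need either wildcard queries or character n-grams indexed up front. The plain-Python sketch below hand-rolls the n-gram idea with trigrams. It is not Lucene's API, and the class and function names are hypothetical. Candidates are found by intersecting posting sets and then verified, because sharing trigrams does not guarantee a contiguous match.

```python
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """All length-3 character windows of the lowercased text."""
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}


class TrigramIndex:
    def __init__(self, filenames: list[str]) -> None:
        self.filenames = filenames
        # Inverted index: trigram -> ids of filenames containing it.
        self.postings: dict[str, set[int]] = defaultdict(set)
        for file_id, name in enumerate(filenames):
            for gram in trigrams(name):
                self.postings[gram].add(file_id)

    def search(self, query: str) -> list[str]:
        q = query.lower()
        grams = trigrams(q)
        if grams:
            # A match must contain every trigram of the query.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        else:
            # Queries under 3 characters have no trigrams: fall back to a scan.
            candidates = set(range(len(self.filenames)))
        # Verify, since shared trigrams need not be contiguous in the name.
        return [
            self.filenames[i]
            for i in sorted(candidates)
            if q in self.filenames[i].lower()
        ]
```

The trade-off versus a suffix array is a larger index in exchange for cheap incremental updates: adding a file only appends its id to a few posting sets.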

Sap