As js441 pointed out, Apache Lucene is a good option, but only if you are going to do term-based search, similar to how Google works. If you need to search for arbitrary strings that span terms, Lucene will not help you.
In the latter case you are right: you have to build some sort of suffix tree. A neat trick, once you have built the suffix tree, is to write it to a file and mmap it into your address space. This way you do not spend RAM keeping the entire tree in memory, yet the frequently accessed portions of the tree are cached automatically. The drawback of mmap is that the initial searches might be somewhat slow. Also, this will not work well if your files change often.
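As a rough illustration of the mmap trick, here is a minimal Python sketch. It swaps in a suffix array (a flat list of sorted suffix offsets) for the suffix tree, because a flat array is trivial to write to a file; the naive construction, the 32-bit on-disk layout, and the names `build_suffix_array`, `write_index`, and `find` are all assumptions made for this example, not part of any library.

```python
import mmap
import os
import struct
import tempfile

def build_suffix_array(text: bytes) -> list:
    # Naive construction: sort every suffix. Fine for a demo, far too slow
    # for large inputs, where a dedicated construction algorithm is needed.
    return sorted(range(len(text)), key=lambda i: text[i:])

def write_index(path: str, suffixes: list) -> None:
    # Assumed layout: one little-endian unsigned 32-bit offset per suffix.
    with open(path, "wb") as f:
        for offset in suffixes:
            f.write(struct.pack("<I", offset))

def find(text: bytes, index, pattern: bytes) -> int:
    """Return one offset where pattern occurs in text, or -1 if absent."""
    count = len(index) // 4
    lo, hi = 0, count
    # Binary search for the first suffix whose prefix is >= pattern.
    while lo < hi:
        mid = (lo + hi) // 2
        (offset,) = struct.unpack_from("<I", index, mid * 4)
        if text[offset:offset + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    if lo < count:
        (offset,) = struct.unpack_from("<I", index, lo * 4)
        if text[offset:offset + len(pattern)] == pattern:
            return offset
    return -1

if __name__ == "__main__":
    text = b"A quick brown dog has jumped over lazy fox."
    path = os.path.join(tempfile.mkdtemp(), "suffixes.idx")
    write_index(path, build_suffix_array(text))
    # Only the pages touched by the binary search are read from disk.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as index:
            print(find(text, index, b"ick brown d"))
```

Each lookup touches only a handful of pages of the mapped file, so the operating system decides which parts of the index stay cached. In this sketch the text itself is still held in memory; in a real setup it could be mapped the same way.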
To handle searches over files that have just been edited, you can keep two indices: one for the bulk of your files and another just for the recently edited files. When you search, you query both indices. Periodically you should rebuild the permanent index with the contents of the new files and replace the old one.
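The two-index scheme could be sketched roughly as follows. `SubstringIndex` is a hypothetical stand-in that simply scans file contents; in practice it would be whatever index you actually build. Hiding stale hits from the permanent index when a newer version of the file exists is a design choice of this sketch.

```python
class SubstringIndex:
    """Hypothetical stand-in for a real index: scans contents linearly."""

    def __init__(self, docs=None):
        self.docs = dict(docs or {})

    def search(self, pattern):
        return {name for name, body in self.docs.items() if pattern in body}


class TwoTierIndex:
    def __init__(self, docs):
        self.main = SubstringIndex(docs)  # large, rebuilt only periodically
        self.recent = SubstringIndex()    # small, updated on every edit

    def update(self, name, body):
        # Edits go only to the small index, which is cheap to change.
        self.recent.docs[name] = body

    def search(self, pattern):
        # Ignore permanent-index hits for files that have a newer version.
        stale = self.recent.docs.keys()
        hits = {n for n in self.main.search(pattern) if n not in stale}
        return hits | self.recent.search(pattern)

    def rebuild(self):
        # Fold the recent edits into a fresh permanent index, then swap it in.
        merged = {**self.main.docs, **self.recent.docs}
        self.main = SubstringIndex(merged)
        self.recent = SubstringIndex()


if __name__ == "__main__":
    idx = TwoTierIndex({"a.txt": "quick brown dog", "b.txt": "lazy fox"})
    idx.update("a.txt", "slow brown dog")
    print(idx.search("brown"), idx.search("quick"))
```

The small index stays cheap to update because it only ever holds the files edited since the last rebuild.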
Here are some examples of when Lucene is a good fit and when a suffix tree is a good fit:
Assume you have a document that contains the following:
A quick brown dog has jumped over lazy fox.
Lucene is good for the following searches:
- quick
- quick brown
- q*
- q* b
With some tricks you can make the following searches work well:
'*ick *own'
This type of search will run very slowly:
'q*ick brown d*g'
And this type of search will never find anything, because it starts and ends in the middle of a term:
"ick brown d"
Lucene is also good when you treat your documents as bags of words, so you can easily do searches like this:
quick fox
This finds all documents that contain the words quick and fox, no matter what lies between them.
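To make the bag-of-words idea concrete, here is a toy inverted index in Python. It is not Lucene's API, only a minimal model of term-based search; `tokenize`, `build_inverted_index`, and `search_all_terms` are names invented for this sketch, and the tokenizer (lowercase runs of letters and digits) is an assumption.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    # Map each term to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search_all_terms(index, query):
    # A document matches if it contains every query term, in any order.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

if __name__ == "__main__":
    docs = {
        1: "A quick brown dog has jumped over lazy fox.",
        2: "A quick cat.",
        3: "The fox is quick",
    }
    index = build_inverted_index(docs)
    print(search_all_terms(index, "quick fox"))
    print(search_all_terms(index, "ick brown d"))
```

The second query comes back empty because "ick" and "d" are not whole terms in the index, which is the same reason a term-based engine cannot answer it.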
On the other hand, suffix trees work well for exact substring matches within a document, even when the search spans terms and starts and ends in the middle of a term.
A very good algorithm for constructing suffix trees of large arrays is described here (warning: paywalled).