Fast text editor find

Question

Does anyone know how text editors and programmers' editors manage such fast searches on very large text files?

Are they indexing on load, indexing at the start of the find, or using some other clever technique?

I desperately need a faster implementation than what I have, which is a painfully slow walk from the top to the bottom of the text.

Any ideas are really appreciated.

This is for a C# implementation, but it's the technique I'm interested in more than the actual code.

Also, will you have to search multilingual text? C# has built-in Unicode support, but if you want to get fancy with search algorithms, this may affect your performance. – Elijah, Feb 10 '09 at 12:30

score 6 · Accepted Answer · answered Feb 10 '09 at 09:23

Begin with the Boyer-Moore search algorithm. It requires some preprocessing of the pattern (which is fast) and searches very well, especially when searching for long substrings, because a mismatch lets it skip ahead by several characters at once.
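To make the skipping idea concrete, here is a minimal sketch in Python (chosen for brevity; the structure ports directly to C#). Note that it is the Boyer-Moore-Horspool simplification, which keeps only the bad-character shift table, not the full Boyer-Moore algorithm. The name `horspool_find` is made up for this illustration.

```python
def horspool_find(text, pattern, start=0):
    """Return the index of the first match at or after start, or -1.

    Boyer-Moore-Horspool: a simplified Boyer-Moore that uses only a
    bad-character shift table (hypothetical helper for illustration).
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return start

    # Preprocessing: for every character of the pattern except the last,
    # record the distance from its rightmost occurrence to the pattern end.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

    i = start
    while i + m <= n:
        # Compare the window against the pattern from right to left.
        j = m - 1
        while j >= 0 and text[i + j] == pattern[j]:
            j -= 1
        if j < 0:
            return i
        # Shift by the table entry for the character under the pattern's
        # last position; characters not in the pattern allow a full jump.
        i += shift.get(text[i + m - 1], m)
    return -1
```

The longer the pattern, the larger the jumps can be, which is why this family of algorithms does best with long search strings.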

Anton Gogolev

score 1 · Answer 2 · answered Feb 10 '09 at 09:36

I wouldn't be surprised if most just use the basic, naive search technique (scan for a match on the 1st char, then test if the hit pans out).
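For comparison, the naive technique can be sketched like this (Python for brevity; `naive_find` is a made-up name). It leans on the built-in `str.find` to scan for the first character, which is roughly what a tight inner loop would do in C#.

```python
def naive_find(text, pattern, start=0):
    """Return the index of the first match at or after start, or -1."""
    if not pattern:
        return start
    first = pattern[0]
    last_start = len(text) - len(pattern)  # last index where a match can begin

    # Step 1: scan for a hit on the first character.
    i = text.find(first, start)
    while i != -1 and i <= last_start:
        # Step 2: test whether the hit pans out.
        if text.startswith(pattern, i):
            return i
        i = text.find(first, i + 1)
    return -1
```

On ordinary text the first-character test rejects most positions immediately, so this is often fast enough in practice despite its poor worst case.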

Michael Burr

score 1 · Answer 3 · answered Feb 10 '09 at 10:15

grep

Although not a text editor itself, grep is called by many text editors. Have you looked at grep's source code? It has always seemed blazingly fast to me, even when searching large files.

Elijah

"I'm going to beat grep by thirty percent!" http://ridiculousfish.com/blog/archives/2006/05/30/old-age-and-treachery/ – Josh Lee Feb 10 '09 at 14:22

score 0 · Answer 4 · edited Dec 24 '19 at 10:30

One method I know of which is not yet mentioned is Knuth-Morris-Pratt (KMP) search. It isn't so good for natural-language text: its advantage comes from patterns whose own prefixes recur later in the pattern, which is rare in ordinary words. For stuff like DNA matching, where the alphabet is tiny and such repeats are common, it is very good.
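A rough sketch of KMP in Python shows where the prefix property comes in (the names `build_failure` and `kmp_find` are made up for this illustration). The failure table stores, for each position, the length of the longest proper prefix of the pattern that is also a suffix of the part matched so far.

```python
def build_failure(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_find(text, pattern):
    """Return the index of the first match, or -1. Never re-reads text."""
    if not pattern:
        return 0
    fail = build_failure(pattern)
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]  # fall back to the longest reusable prefix
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1
```

For a DNA-like pattern such as "ACACAG" the table is `[0, 0, 1, 2, 3, 0]`, so a late mismatch can reuse earlier work. For a word like "search" the table is all zeros, and KMP behaves much like the naive scan.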

Another one is hash search (usually called the Rabin-Karp algorithm). First, you compute a hash value of your pattern; then you slide a window the size of your pattern over the text and check whether the hashes match. The idea is to choose the hash so that you don't have to recompute it for the whole window: you update it with the incoming character while the outgoing character drops out of the computation. Because different strings can share a hash, a hash match still has to be confirmed by comparing the actual characters. This algorithm performs very well when you have multiple strings to search for, because you compute the hashes of all your strings beforehand.
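Here is a minimal sketch of the sliding-window hash for several patterns at once, in Python. It assumes all patterns have the same length; the function name, `BASE`, and `MOD` are arbitrary choices for this illustration, not anything prescribed by the algorithm.

```python
BASE = 257
MOD = (1 << 61) - 1  # a large prime modulus keeps collisions rare


def _hash(s):
    """Polynomial hash of a whole string."""
    h = 0
    for ch in s:
        h = (h * BASE + ord(ch)) % MOD
    return h


def rabin_karp_find(text, patterns):
    """Return (index, pattern) for the first match of any pattern,
    or (-1, None). Assumes all patterns share one non-zero length."""
    m = len(patterns[0])
    if any(len(p) != m for p in patterns):
        raise ValueError("this sketch needs equal-length patterns")
    n = len(text)
    if m == 0 or m > n:
        return -1, None

    # Precompute the hash of every pattern once.
    by_hash = {}
    for p in patterns:
        by_hash.setdefault(_hash(p), []).append(p)

    high = pow(BASE, m - 1, MOD)  # weight of the outgoing character
    h = _hash(text[:m])
    for i in range(n - m + 1):
        if h in by_hash:
            window = text[i:i + m]
            if window in by_hash[h]:  # confirm, since hashes can collide
                return i, window
        if i + m < n:
            # Roll: drop text[i], shift the rest, add text[i + m].
            h = ((h - ord(text[i]) * high) * BASE + ord(text[i + m])) % MOD
    return -1, None
```

Each step of the window costs a constant amount of work regardless of the pattern length or the number of patterns, which is where the advantage for multi-string search comes from.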
