
I'm looking for the fastest and best algorithm to search for some values in a very large binary file (a roughly 2 GB AFP file), which means that loading the whole file into memory is out of the question. I'm working with C#, and I don't know whether another programming language (C/C++...) would be much faster; if not, I'll continue with C#. Thanks for any ideas.

Peter PAD

4 Answers


Boyer-Moore offers a good compromise between performance and complexity (and the linked article has links to other methods).

An implementation in C (source code in the link) will be significantly faster than C#, although in practice you'll probably find that disk I/O is the biggest hurdle.
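
As a rough illustration of the Boyer-Moore idea, here is a minimal C# sketch of Boyer-Moore-Horspool, a simplified variant that keeps only the bad-character shift table (full Boyer-Moore also has a good-suffix rule, which is omitted here). It searches an in-memory byte array; `HorspoolIndexOf` is a hypothetical name, and using it on a 2 GB file would still require reading the file in chunks with some overlap between them.

```csharp
using System;

// Boyer-Moore-Horspool: a simplified Boyer-Moore that uses only the
// bad-character shift table. Returns the index of the first match, or -1.
static int HorspoolIndexOf(byte[] haystack, byte[] needle)
{
    int m = needle.Length;
    if (m == 0) return 0;

    // shift[b]: how far the window can slide when its last byte is b.
    int[] shift = new int[256];
    for (int b = 0; b < 256; b++) shift[b] = m;
    for (int k = 0; k < m - 1; k++) shift[needle[k]] = m - 1 - k;

    int i = 0;
    while (i <= haystack.Length - m)
    {
        // Compare right to left, as Boyer-Moore does.
        int j = m - 1;
        while (j >= 0 && haystack[i + j] == needle[j]) j--;
        if (j < 0) return i;
        i += shift[haystack[i + m - 1]];
    }
    return -1;
}

byte[] data = { 0x00, 0x5A, 0xD3, 0xA8, 0xAF, 0x01 };
Console.WriteLine(HorspoolIndexOf(data, new byte[] { 0xD3, 0xA8, 0xAF })); // prints 2
```

On a mismatch the window can jump by up to the pattern length instead of one byte. For very short patterns that gain is small, which fits the point above that disk I/O is likely to be the real bottleneck.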

symcbean

After commenting, I decided to provide a possible solution.
Be careful: this solution is neither the best nor the most elegant.
Use it as a starting point:

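using System;
using System.IO;

// X'D3A8AF' is hex notation (as used in AFP documentation) for the bytes
// 0xD3 0xA8 0xAF, so search for those bytes. A StreamReader would decode
// the binary data as text and corrupt it.
byte[] SEARCH = { 0xD3, 0xA8, 0xAF };
int BUFFER = 1024;

int tot = 0;
using (FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read))
{
    // Room for one block plus the SEARCH.Length - 1 bytes carried over from
    // the previous block, so a match spanning two blocks is counted once.
    byte[] buffer = new byte[BUFFER + SEARCH.Length - 1];
    int carry = 0;
    int read;
    while ((read = fs.Read(buffer, carry, BUFFER)) > 0)
    {
        int len = carry + read;
        for (int i = 0; i <= len - SEARCH.Length; i++)
        {
            int j = 0;
            while (j < SEARCH.Length && buffer[i + j] == SEARCH[j]) j++;
            if (j == SEARCH.Length) tot++;
        }
        // Move the unsearched tail (which could still start a match) to the front.
        carry = Math.Min(SEARCH.Length - 1, len);
        Array.Copy(buffer, len - carry, buffer, 0, carry);
    }
}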

BUFFER can be changed (increased) as you like.

Marco

You have to load the entire file to search for the object. If possible, split the file based on unique IDs, if you have them: for example, one file per 100 records (1-100, 101-200, 201-300, etc.) based on unique IDs or some other parameters. It is a kind of index over your binary file.
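
Instead of physically splitting the file, the indexing idea can also be sketched as a one-time sequential pass that records byte offsets, so later lookups can `Seek` straight to the right region. This is a minimal C# sketch under loud assumptions: every record is assumed to start with a known, non-empty marker byte sequence that never appears inside record data, and `BuildIndex`, `Matches` and `recordsPerEntry` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: one sequential pass that records the byte offset of
// every recordsPerEntry-th record. Assumes every record starts with a
// non-empty `marker` and that the marker never occurs inside record data.
static List<long> BuildIndex(Stream input, byte[] marker, int recordsPerEntry)
{
    var index = new List<long>();
    var window = new byte[marker.Length]; // the last marker.Length bytes read
    var buffer = new byte[1 << 16];       // read in 64 KB blocks
    long pos = 0;                         // number of bytes consumed so far
    long records = 0;
    int read;
    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        for (int k = 0; k < read; k++)
        {
            // Slide the window one byte to the left and append the new byte.
            Array.Copy(window, 1, window, 0, window.Length - 1);
            window[window.Length - 1] = buffer[k];
            pos++;
            if (pos >= marker.Length && Matches(window, marker))
            {
                // Offset of the first byte of this record's marker.
                if (records % recordsPerEntry == 0) index.Add(pos - marker.Length);
                records++;
            }
        }
    }
    return index;
}

static bool Matches(byte[] a, byte[] b)
{
    for (int i = 0; i < a.Length; i++)
        if (a[i] != b[i]) return false;
    return true;
}

// Records start at offsets 0, 3, 5 and 8; keep every 2nd one.
byte[] sample = { 0xAA, 1, 2, 0xAA, 3, 0xAA, 4, 5, 0xAA };
var sampleIndex = BuildIndex(new MemoryStream(sample), new byte[] { 0xAA }, 2);
Console.WriteLine(string.Join(",", sampleIndex)); // prints 0,5
```

The index can be saved next to the data file; a later lookup for record `r` can then seek to `sampleIndex[r / recordsPerEntry]` and scan at most `recordsPerEntry` records from there, instead of rescanning the whole file.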

hungryMind
  • No, he can't load the entire file, IMHO! The OP could use a StreamReader and read the file in chunks. It depends on what he's searching for. – Marco Dec 01 '11 at 09:22

You can use the TextReader.ReadBlock method: read the file block by block and look for the requested values. Or, even better, use the BinaryReader.ReadBytes method, which returns the raw bytes instead of decoding them as text.
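
A minimal C# sketch of the BinaryReader.ReadBytes approach, assuming the goal is to count occurrences of a byte pattern. Each call returns at most `blockSize` bytes (an empty array at the end of the stream), and the last `pattern.Length - 1` bytes are carried into the next block so that a match split across a block boundary is still counted exactly once. `CountOccurrences` and `blockSize` are illustrative names, not part of any API.

```csharp
using System;
using System.IO;
using System.Text;

// Counts (possibly overlapping) occurrences of `pattern` in a stream, reading
// it block by block so only one block plus a small tail is in memory.
static long CountOccurrences(Stream input, byte[] pattern, int blockSize)
{
    if (pattern.Length == 0) throw new ArgumentException("pattern must not be empty");
    long count = 0;
    byte[] tail = Array.Empty<byte>();
    // leaveOpen: true so disposing the reader does not close the caller's stream.
    using var reader = new BinaryReader(input, Encoding.UTF8, leaveOpen: true);
    while (true)
    {
        byte[] block = reader.ReadBytes(blockSize);
        if (block.Length == 0) break; // end of stream

        // Join the carried-over tail with the new block.
        byte[] data = new byte[tail.Length + block.Length];
        Buffer.BlockCopy(tail, 0, data, 0, tail.Length);
        Buffer.BlockCopy(block, 0, data, tail.Length, block.Length);

        // Naive comparison at every start position not checked before.
        for (int i = 0; i <= data.Length - pattern.Length; i++)
        {
            int j = 0;
            while (j < pattern.Length && data[i + j] == pattern[j]) j++;
            if (j == pattern.Length) count++;
        }

        // Keep the bytes that could still be the start of a match.
        int keep = Math.Min(pattern.Length - 1, data.Length);
        tail = new byte[keep];
        Array.Copy(data, data.Length - keep, tail, 0, keep);
    }
    return count;
}

byte[] sample = { 0xD3, 0xA8, 0xAF, 0x00, 0xD3, 0xA8, 0xAF };
long found = CountOccurrences(new MemoryStream(sample), new byte[] { 0xD3, 0xA8, 0xAF }, 2);
Console.WriteLine(found); // prints 2
```

For the real file you would pass a `FileStream` opened with `FileAccess.Read` and a much larger block size (for example `1 << 20`); the naive inner comparison could also be replaced by a Boyer-Moore-style search.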

Ilan Huberman