For work, I am tasked with counting how many times a given sequence of characters occurs across a set of text files. The files are each 1 GB or larger, and there are anywhere from 4 to 15 of them. For example: find "cat" in "catastrophe", counting every instance where "cat" is part of a word, in every file.

In theory (at least as I see it), I would load one text file into memory and then search it line by line for the match. At the end of the text file I would remove it from memory and load the next one, and so on until all files have been searched.
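
A minimal sketch of that plan, written in Python only because it is compact, with one change: each file is read line by line instead of being loaded whole, so memory use stays small regardless of file size. The function name `count_in_files` and its parameters are made up for illustration. It assumes UTF-8 text, a case-sensitive match, and a search string that contains no line break. Note that `str.count` counts non-overlapping matches, so "aa" is found once in "aaa".

```python
from typing import Iterable


def count_in_files(paths: Iterable[str], needle: str, encoding: str = "utf-8") -> int:
    """Count non-overlapping occurrences of `needle` across all files in `paths`."""
    if not needle or "\n" in needle:
        raise ValueError("needle must be non-empty and must not contain a line break")
    total = 0
    for path in paths:
        # Iterating over the file object yields one line at a time,
        # so only the current line is held in memory.
        with open(path, "r", encoding=encoding, errors="replace") as f:
            for line in f:
                total += line.count(needle)
    return total
```

Because each file is opened, scanned, and closed in turn, nothing from the previous file has to be removed from memory explicitly.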

I have mostly been writing scripts to automate tasks these last few years, and I have been out of the coding game for so long that I don't remember, or maybe never knew, the fastest and most efficient way to do this.

When I say speed, I mean the elapsed run time of the program, not how long it will take me to write it.

I would like to do this in C# because I am trying to get more comfortable with the language but really I could do it in any language. Preferably not assembly ...that was a joke...

  • 2
    You probably don't want to _"load one text file into memory"_. Instead, stream the file through your program, either line by line or, likely better, block by block. Block-by-block analysis is harder, though (consider the case where "c" ends one block and "at" starts the next). Once you create arrays larger than 85 KB, you end up fighting LOH and Gen2 garbage collection issues – Flydog57 Oct 16 '21 at 04:37
  • Load 'em all into a decent DB so that you can take advantage of a full-text indexing strategy. No point reinventing something that (e.g.) Microsoft poured hundreds of thousands of dollars into – Caius Jard Oct 16 '21 at 07:00
  • gotcha...makes sense – Christian Williams Oct 16 '21 at 07:18
  • Does this answer your question? [Using StreamReader to check if a file contains a string](https://stackoverflow.com/questions/6183809/using-streamreader-to-check-if-a-file-contains-a-string) You want the second answer – Charlieface Oct 16 '21 at 19:28
  • If I were to write it myself (as a side project, for fun, because I actually really miss programming), how would I go about it? I'm not asking anyone to write it for me, don't do that! Haha. Just give me a basic overview of the process. I have 2 text files of 70 GB each, with each string separated by a line break. I would like to be able to load both files and search them. Would I go line by line? Would I load a part of a file into memory, then remove it and load another part? I should be fine with the syntax; it's the logic of how to handle the huge files, and how to make that efficient, that I need help with. – Christian Williams Oct 18 '21 at 02:57
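
One way the block-by-block idea from the comments could look, as a hedged sketch rather than a tuned implementation. It is in Python for brevity; the function name `count_occurrences` and the `chunk_size` parameter are hypothetical. It assumes a byte-wise, case-sensitive match and counts overlapping matches. The last `len(needle) - 1` bytes of each block are carried into the next one, so a match split across a block boundary ("c" at the end of one block, "at" at the start of the next) is still found, and no match is counted twice because the carried tail is always shorter than the search string.

```python
def count_occurrences(path: str, needle: bytes, chunk_size: int = 1 << 16) -> int:
    """Count (possibly overlapping) occurrences of `needle` in the file at `path`."""
    if not needle:
        raise ValueError("needle must not be empty")
    keep = len(needle) - 1  # bytes carried over to catch boundary-spanning matches
    total = 0
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk
            # Step forward one byte after each hit so overlapping matches count.
            pos = buf.find(needle)
            while pos != -1:
                total += 1
                pos = buf.find(needle, pos + 1)
            # The tail is shorter than the needle, so it cannot contain a
            # complete match that was already counted.
            tail = buf[-keep:] if keep else b""
    return total
```

Only one block plus a few carried bytes are in memory at a time, so the same loop handles a 1 GB file and a 70 GB file alike. For several files, call the function once per file and sum the results. In C#, a similar loop could presumably be built around `FileStream.Read` with a single reused buffer.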
