I have a list of file names (nearly 400,000). I need to parse each file's content and find a given string pattern.
Can anyone suggest the best way to speed up my search? (I'm currently able to process all the content in about 90 seconds.)
Here is the piece of code that needs to be optimised.
/**
* This method is called for each file in the list. The file is parsed char by char and compared with the pattern using a prefix table (as used in the KMP algorithm).
*
* @param pattern
* Pattern to be searched for.
*
* @param prefixTable
* Prefix (failure) table built from the pattern, as in the KMP algorithm.
* Example:- For a given pattern => result tables are { "ababaca" => 0012301, "abcdabca" => 00001231, "aababca" => 0101001, "aabaabaaa" => 010123452 }
*
* @param file
* File that needs to be parsed to find the string pattern.
*
* @return
* For the given file, a map from line number to the starting char locations of every occurrence of the pattern within that line.
*
*/
import java.util.{ArrayList, LinkedHashMap}

def contains(pattern: Array[Char], prefixTable: Array[Int], file: String): LinkedHashMap[Integer, ArrayList[Integer]] = {
  // Maps each line number to the char locations (1-based within the line) of every occurrence.
  val returnValue = new LinkedHashMap[Integer, ArrayList[Integer]]()
  val source = scala.io.Source.fromFile(file, "iso-8859-1")
  val lines = try source.mkString finally source.close()
  var lineNumber = 1
  var i = 0 // position in the file
  var k = 0 // position within the current line
  var j = 0 // position within the pattern
  while (i < lines.length()) {
    // A newline starts a new line; reset to -1 so that k becomes 0 at the
    // line's first char, keeping positions consistent with line 1.
    if (lines(i) == '\n') { lineNumber += 1; k = -1; j = 0 }
    while (j < pattern.length && i < lines.length() && lines(i) == pattern(j)) {
      j += 1; i += 1; k += 1
    }
    if (j == pattern.length) {
      // Full match: append to the list already recorded for this line
      // (a plain put would overwrite earlier matches on the same line).
      var charAt = returnValue.get(lineNumber)
      if (charAt == null) { charAt = new ArrayList[Integer](); returnValue.put(lineNumber, charAt) }
      charAt.add(k - pattern.length + 1)
      j = prefixTable(j - 1) // fall back without advancing i, so the next char is not skipped and overlapping matches are found
    } else if (j == 0) {
      i += 1; k += 1
    } else {
      j = prefixTable(j - 1) // mismatch: fall back in the pattern, never in the text
    }
  }
  returnValue
}
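
The method takes the prefix table as an input; for reference, here is a sketch of the standard KMP failure-function construction that produces the tables shown in the doc comment (the name `buildPrefixTable` is mine, not from the original code):

```scala
// Build the KMP prefix (failure) table: prefixTable(i) is the length of the
// longest proper prefix of pattern(0..i) that is also a suffix of it.
def buildPrefixTable(pattern: Array[Char]): Array[Int] = {
  val prefixTable = new Array[Int](pattern.length)
  var len = 0 // length of the currently matched prefix
  var i = 1
  while (i < pattern.length) {
    if (pattern(i) == pattern(len)) {
      len += 1
      prefixTable(i) = len
      i += 1
    } else if (len > 0) {
      len = prefixTable(len - 1) // fall back to a shorter matching prefix
    } else {
      prefixTable(i) = 0
      i += 1
    }
  }
  prefixTable
}
```

For example, `buildPrefixTable("ababaca".toCharArray)` yields `0,0,1,2,3,0,1`, matching the `"ababaca" => 0012301` entry in the doc comment.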
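
With ~400,000 files, the bottleneck is likely I/O and per-file overhead rather than the matching itself, so one common speed-up is to search the files concurrently. A minimal sketch using `scala.concurrent.Future`; the `searchFile` helper here is a hypothetical stand-in for the `contains` method above, reduced to a yes/no check to keep the example short:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Hypothetical stand-in for `contains`: does the pattern occur in this file?
def searchFile(pattern: String, file: String): Boolean = {
  val source = scala.io.Source.fromFile(file, "iso-8859-1")
  try source.mkString.contains(pattern) finally source.close()
}

// Search all files concurrently; return the names of the files that matched.
def searchAll(pattern: String, files: Seq[String]): Seq[String] = {
  val futures = files.map(f => Future((f, searchFile(pattern, f))))
  Await.result(Future.sequence(futures), Duration.Inf)
    .collect { case (f, true) => f }
}
```

For an I/O-bound workload, a bounded thread pool sized for the disk may behave better than the default global execution context; Scala's parallel collections (`files.par`, a separate module since Scala 2.13) are another option.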