This is a search algorithm I have had in mind for a while.
It works in two steps.
In the first step, all the words from y.txt are inserted into a tree (a trie, or prefix tree). Every path in the tree from the root to a leaf spells a word. The leaf itself is empty: it only marks where a word ends.
For example, the tree for the words dog and day is the following:
<root>--<d>-<a>-<y>-<>
           \-<o>-<g>-<>
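As a rough sketch, the same tree can be built in memory as nested maps, without reading y.txt. The closure name insertWord is only an illustrative assumption; the key 0 plays the role of the empty leaf.

```groovy
// Hypothetical helper: insert one word into a nested-map tree.
// Each character becomes a key; the key 0 marks the end of a word.
def insertWord = { Map tree, String word ->
    def node = tree
    word.each { c ->
        if (node[c] == null) node[c] = [:]
        node = node[c]
    }
    node[0] = 0
}

def tree = [:]
['dog', 'day'].each { insertWord(tree, it) }
println tree // prints [d:[o:[g:[0:0]], a:[y:[0:0]]]]
```

Both words share the single d node, which is what keeps the tree compact.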
The second step is a search down the tree: starting at each position in the text, follow the characters down from the root. Whenever you reach an empty leaf, you have found a word.
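The walk from a single starting position might look like the sketch below. The closure name longestWordAt is hypothetical, and I am assuming that the longest word starting at that position is the one we want to report.

```groovy
// Build a small tree in memory; the key 0 marks the end of a word
def tree = [:]
['dog', 'day', 'daydream'].each { word ->
    def t = tree
    word.each { c ->
        if (t[c] == null) t[c] = [:]
        t = t[c]
    }
    t[0] = 0
}

// Hypothetical helper: the longest word in the tree that starts at
// text[start], or null if no word starts there
def longestWordAt = { Map root, String text, int start ->
    def node = root
    def longest = null
    for (int j = start; j < text.length(); j++) {
        node = node[text[j]]
        if (node == null) break // no word continues with this character
        if (node[0] == 0) longest = text.substring(start, j + 1)
    }
    longest
}

assert longestWordAt(tree, 'a daydream', 2) == 'daydream'
assert longestWordAt(tree, 'daydrop', 0) == 'day'
assert longestWordAt(tree, 'cat', 0) == null
```

Remembering the last complete word seen, instead of a single flag, means a shorter word such as day is still reported when a longer candidate such as daydream fails partway.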
Here is the implementation in Groovy. If more comments are needed, just ask.
//create a tree to store the words in a compact and fast to search way
//each path of the tree from root to an empty leaf is a word
def tree = [:]
new File('y.txt').eachLine{ word ->
    def t=tree
    word.each{ c ->
        if(!t[c]){
            t[c]=[:]
        }
        t=t[c]
    }
    t[0]=0 //word terminator (the leaf)
}
println tree //for debug purpose
//search for the words in x.txt
new File('x.txt').eachLine{ str, line ->
    for(int i=0; i<str.length(); i++){
        def t=tree
        def res=''     //characters matched so far, starting at column i
        def found=null //longest complete word seen so far
        for(int j=i; j<str.length(); j++){
            t=t[str[j]]
            if(t==null) break //no word continues with this character
            res+=str[j]
            if(t[0]==0) found=res //a terminator here means res is a complete word
        }
        //report once per start column; this also covers words at the end of a line,
        //one-letter words, and a short word followed by a failed longer match
        if(found) println "Found $found at line $line, col $i"
    }
}
This is my y.txt:
dog
day
apple
daydream
And x.txt:
This is a beautiful day and I'm walking with my dog while eating an apple.
Today it's sunny.
It's a daydream
The output is the following:
$ groovy search.groovy
[d:[o:[g:[0:0]], a:[y:[0:0, d:[r:[e:[a:[m:[0:0]]]]]]]], a:[p:[p:[l:[e:[0:0]]]]]]
Found day at line 1, col 20
Found dog at line 1, col 48
Found apple at line 1, col 68
Found day at line 2, col 2
Found daydream at line 3, col 7
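This algorithm should be fast because the depth of the tree doesn't depend on the number of words in y.txt; it is equal to the length of the longest word in y.txt. So for each starting position in the text, the search takes at most that many steps, no matter how many words are in the list.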
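To illustrate the point about depth, here is a small sketch that measures the tree for a few word lists. The closure names build and depth are hypothetical, and the terminator entry is not counted as a level.

```groovy
// Hypothetical helper: build the nested-map tree from a list of words
def build = { List<String> words ->
    def tree = [:]
    words.each { word ->
        def t = tree
        word.each { c ->
            if (t[c] == null) t[c] = [:]
            t = t[c]
        }
        t[0] = 0 // word terminator
    }
    tree
}

// Hypothetical helper: number of character nodes on the longest path
def depth
depth = { Map node ->
    def children = node.findAll { k, v -> v instanceof Map }.values()
    children ? 1 + children.collect { depth(it) }.max() : 0
}

assert depth(build(['dog', 'day'])) == 3
assert depth(build(['dog', 'day', 'dig', 'dot', 'den'])) == 3 // more words, same depth
assert depth(build(['dog', 'daydream'])) == 8                 // longest word sets the depth
```

Adding more three-letter words makes the tree wider but not deeper, so the number of steps per starting position stays bounded by the longest word.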