find frequency of every word

Question

I was asked this question in an interview, but I was not able to answer it.

The question is:

You are given a directed graph in which every node is a character, and you are also given an array of strings. The task is to calculate the frequency of every string in the array by searching in the graph.

My approach: I used a trie and a suffix tree, but the interviewer was not fully satisfied. Can you give me an algorithm for this problem?

An arbitrary directed graph? So each node contain some random character, and all edges are spread out randomly? Your question doesn't make sense. — aioobe, Oct 19 '12 at 12:51
It seems like your answer needs to include a breadth first and/or a depth first search — Alan, Oct 19 '12 at 13:59
@aioobe it makes sense to me, he has a graph of characters, he's trying to search the graph to find a string sequence — Alan, Oct 19 '12 at 14:14
It means finding the occurrences of the strings in the graph. If a string is present, show its count; otherwise show zero. — devsda, Oct 19 '12 at 14:29
How are the strings in the array related to the graph? There seems to be no connection whatsoever. Why should we assume we can determine the frequency of the strings by performing an operation on a graph that has nothing to do with the strings? — jogojapan, Oct 19 '12 at 16:41
@jogojapan This is like saying "How is the text in a search box related to the text in a document? How can we determine the number of hits if the document has nothing to do with the search text?" The array is simply what is being searched for; the graph is what is being searched. — Alan, Oct 19 '12 at 16:48
This question is unclear. The questioner implies that you can somehow determine the frequency of a string in an array by searching a graph but does not say how the graph relates to the string. For example, if the array is { dog, cat, bird, dog, fish, cat, apple, dog, cat } then what is the graph? — Tyler Durden, Oct 19 '12 at 19:34
Note that your graph is basically a [DFA](http://en.wikipedia.org/wiki/Deterministic_finite_automaton) (AKA finite state machine) — amit, Oct 19 '12 at 22:17

Alan · Answer 1 · 2012-10-19T14:12:06.723

How about the following, to find the number of occurrences of a string s in a directed graph:

Start with a breadth-first search (marking already visited nodes to avoid cycles)
When the first character is found, switch to a depth-first search with max-depth = length(s)
If the string sequence is detected, increment the occurrence count once for each match the DFS finds
Resume the BFS

Some caveats

I do not believe the DFS should share the BFS's visited-node list (you may need to go back to the beginning and overlap, for example)
The BFS should also not share the DFS's visited list. For example, you could be looking for "Alan" in "AAlan", and you must make sure you restart on the second A

Now for an array, I can just repeat this procedure for each string. Sure, there may be a more efficient solution, but I'd start off thinking about it this way.
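The steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the graph representation (`labels` mapping node to character, `edges` mapping node to a list of successors) and the function names `count_occurrences` and `word_frequencies` are hypothetical, and an "occurrence" is assumed to be any walk whose characters spell the string, so a node may be revisited inside one match.

```python
from collections import deque


def count_occurrences(labels, edges, s):
    """Count walks in the graph whose node characters spell s.

    Assumed (hypothetical) representation:
      labels: dict mapping node -> single character
      edges:  dict mapping node -> list of successor nodes
    """
    if not s:
        return 0

    def dfs(node, pos):
        # Depth is bounded by len(s); no visited set is shared with the BFS,
        # so overlapping matches and revisited nodes are still counted.
        if labels[node] != s[pos]:
            return 0
        if pos == len(s) - 1:
            return 1
        return sum(dfs(nxt, pos + 1) for nxt in edges.get(node, []))

    total = 0
    seen = set()
    for root in labels:  # restart the BFS so every component is covered
        if root in seen:
            continue
        seen.add(root)
        queue = deque([root])
        while queue:
            node = queue.popleft()
            total += dfs(node, 0)  # try to start a match at this node
            for nxt in edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return total


def word_frequencies(labels, edges, words):
    # Repeat the single-string procedure for each string in the array.
    return {w: count_occurrences(labels, edges, w) for w in words}
```

Because the DFS does not memoize anything, the running time can grow exponentially with the string length on dense graphs, which is probably why a more efficient solution was being asked for.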

Did your answer include any conversation about a breadth-first or depth-first search? If someone mentioned searching a graph, I'd almost always reply with a variation of one of these

score 0 · Answer 2 · answered Oct 19 '12 at 15:05

Here's another solution:

First we need to do some preprocessing on the string array. Let's define C as the set of all the characters appearing in the strings in the array. For each character in C, we are going to keep track of each string containing that character, its position in that string, and a Boolean value stating whether it's the last char in that string. This can be done using a dictionary.

For example, let's say our array is ['one', 'two', 'three']. Our dictionary would look something like this, where each entry is (string index, position, is last char):

'o': (0, 0, false),(1, 2, true)
't': (1, 0, false),(2, 0, false)
'n': (0, 1, false)
'e': (0, 2, true),(2, 3, false),(2, 4, true)
'h': (2, 1, false)
'r': (2, 2, false)
'w': (1, 1, false)
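A minimal sketch of how such a table might be built, assuming each entry is a tuple of (string index, position in that string, is last char). The function name `build_char_index` is hypothetical and not part of the original answer.

```python
from collections import defaultdict


def build_char_index(words):
    """Map each character to the (string index, position, is_last) tuples
    describing every place it appears in `words`."""
    index = defaultdict(list)
    for word_idx, word in enumerate(words):
        last = len(word) - 1
        for pos, ch in enumerate(word):
            index[ch].append((word_idx, pos, pos == last))
    return dict(index)


table = build_char_index(['one', 'two', 'three'])
for ch, entries in table.items():
    print(repr(ch), entries)
```

Each character of each string is visited exactly once, so the work is proportional to the total length of the strings.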

Next we are going to use DFS and dynamic programming. Basically, whenever you visit an edge, you look up the parent and the child in the dictionary to see whether they form two consecutive characters of some string, and you store that information.

Using this method, you can detect all occurrences of every string in the array.

Building the preprocessing table can be done in O(L), where L is the sum of the lengths of all the strings in the array.

Discovering all occurrences can be done in O(m * k), where m is the number of edges (and not the number of nodes, as a node can be discovered multiple times) and k is the number of strings.

The implementation can be a little tricky and there are some pitfalls you should avoid.

score 0 · Answer 3 · answered Oct 20 '12 at 16:29

See this graph; each level has all 4*4 edges (it is hard to draw, please bear with me).

enter image description here

There may be a lot of occurrences.

I think he may be expecting dynamic programming:

Process each string individually. f[i][j] denotes the number of ways to complete the string's last j letters starting from node i; the rest would be easy.
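One way to read this recurrence, as a sketch under assumptions rather than the answerer's exact method: f[i][1] is 1 if node i carries the string's last letter, and for j > 1, f[i][j] is the sum of f[v][j-1] over the successors v of i whenever node i carries the letter that starts the last j letters (otherwise 0). The answer is the sum of f[i][len(s)] over all nodes. The graph representation (`labels`, `edges`) and the name `count_by_dp` are hypothetical.

```python
def count_by_dp(labels, edges, s):
    """Count walks spelling s using the suffix-length DP.

    Assumed (hypothetical) representation:
      labels: dict mapping node -> single character
      edges:  dict mapping node -> list of successor nodes
    """
    n = len(s)
    if n == 0:
        return 0
    # f[i] stores f[i][j] for the current j only; j = 1 is the base case.
    f = {i: 1 if labels[i] == s[-1] else 0 for i in labels}
    for j in range(2, n + 1):
        ch = s[n - j]  # first letter of the last j letters of s
        f = {
            i: sum(f[v] for v in edges.get(i, [])) if labels[i] == ch else 0
            for i in labels
        }
    return sum(f.values())


# Three levels of two 'a' nodes each, with every edge between adjacent levels.
labels = {0: 'a', 1: 'a', 2: 'a', 3: 'a', 4: 'a', 5: 'a'}
edges = {0: [2, 3], 1: [2, 3], 2: [4, 5], 3: [4, 5]}
print(count_by_dp(labels, edges, "aaa"))  # 2 * 2 * 2 = 8 walks
```

Each value of j touches every node and every edge once, so one string costs about len(s) times (nodes + edges), even when the number of occurrences itself is very large.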
