I'm trying to implement word prediction by analyzing sentences. Consider the following [rather boring] sentences:
Call ABC
Call ABC again
Call DEF
I'd like to have a data structure for the above sentences as follows:
Call: (ABC, 2), (again, 1), (DEF, 1)
ABC: (Call, 2), (again, 1)
again: (Call, 1), (ABC, 1)
DEF: (Call, 1)
In general, Word: (Word_it_appears_with, Frequency), ....
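To make the structure concrete, here is a minimal Python sketch of how I picture building it. The function name `build_cooccurrence` is just illustrative, and I'm assuming simple whitespace tokenization and that a word counts at most once per sentence.

```python
from collections import Counter, defaultdict

def build_cooccurrence(sentences):
    """Map each word to a Counter of the words it shares a sentence with."""
    table = defaultdict(Counter)
    for sentence in sentences:
        # Assumption: split on whitespace, count each word once per sentence.
        words = set(sentence.split())
        for word in words:
            for other in words:
                if other != word:
                    table[word][other] += 1
    return table

sentences = ["Call ABC", "Call ABC again", "Call DEF"]
table = build_cooccurrence(sentences)

for word, neighbours in table.items():
    print(word, dict(neighbours))
```

For the three sentences above, this gives the same counts as the listing, with every pair stored twice (once under each word).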
Please note the inherent redundancy in this type of data. Obviously, if the frequency of ABC is 2 under Call, the frequency of Call is 2 under ABC. How do I optimize this?
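One option I have considered, sketched below, is to store each unordered pair only once under a canonical (sorted) key. The names `build_pair_counts` and `frequency` are hypothetical, and this is only a rough idea, not something I have measured.

```python
from collections import Counter
from itertools import combinations

def build_pair_counts(sentences):
    """Count each unordered word pair once, keyed by the sorted pair."""
    pairs = Counter()
    for sentence in sentences:
        # Sorting makes (a, b) canonical, so (Call, ABC) and (ABC, Call)
        # end up under the same key.
        words = sorted(set(sentence.split()))
        for a, b in combinations(words, 2):
            pairs[(a, b)] += 1
    return pairs

def frequency(pairs, w1, w2):
    """Look up the shared count regardless of argument order."""
    key = (w1, w2) if w1 < w2 else (w2, w1)
    return pairs[key]

pairs = build_pair_counts(["Call ABC", "Call ABC again", "Call DEF"])
print(frequency(pairs, "Call", "ABC"), frequency(pairs, "ABC", "Call"))
```

This halves the number of stored counts, but listing everything that appears with a given word then needs either a scan over all pairs or a separate index from word to pairs, so I am not sure it is a net win.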
The idea is to use this data when a new sentence is being typed. For example, if Call has been typed, from the data, it's easy to say ABC is more likely to be present in the sentence, and offer it as the first suggestion, followed by again and DEF.
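The lookup I have in mind would be roughly the following. The table is hard-coded from the listing above to keep the sketch short, and `suggest` and its `limit` parameter are names I made up for illustration.

```python
from collections import Counter

# Counts copied from the example data above.
table = {
    "Call": Counter({"ABC": 2, "again": 1, "DEF": 1}),
    "ABC": Counter({"Call": 2, "again": 1}),
    "again": Counter({"Call": 1, "ABC": 1}),
    "DEF": Counter({"Call": 1}),
}

def suggest(table, typed_word, limit=3):
    """Return up to `limit` co-occurring words, most frequent first."""
    neighbours = table.get(typed_word, Counter())
    return [word for word, _count in neighbours.most_common(limit)]

print(suggest(table, "Call"))
print(suggest(table, "unknown"))
```

Typing Call puts ABC first because it has the highest count; a word that has never been seen yields no suggestions.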
I realise this is one of a million possible ways of implementing prediction, and I eagerly look forward to suggestions of other ways to do it.
Thanks