I am looking for new ideas for two features I am implementing.
1.) Text segmentation feature:
Ex:
User Query:                   Resolved Query:
-----------                   ---------------
It has lotsofwordstogether    It has lots of words together
I am currently using a plain recursive or dynamic-programming (DP) solution based on unigram probabilities.
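For reference, a minimal sketch of this kind of unigram DP segmentation could look like the following. The toy `UNIGRAM_COUNTS` table, the `MAX_WORD_LEN` cap, and the unknown-word penalty are assumptions made up for illustration, not real corpus data.

```python
import math

# Hypothetical toy unigram counts; a real system would load these from a corpus.
UNIGRAM_COUNTS = {
    "it": 50, "has": 40, "lots": 10, "of": 60, "words": 12, "together": 8,
    "word": 15, "to": 55, "get": 20, "her": 18, "lot": 9,
}
TOTAL = sum(UNIGRAM_COUNTS.values())
MAX_WORD_LEN = 20  # assumed cap on candidate word length; bounds the inner loop


def word_logprob(word):
    """Log-probability of a single candidate word under the unigram model."""
    count = UNIGRAM_COUNTS.get(word)
    if count is None:
        # Assumed penalty for unknown strings, harsher for longer ones.
        return math.log(1.0 / (TOTAL * 10 ** len(word)))
    return math.log(count / TOTAL)


def segment(text):
    """Split a space-free string into the most probable word sequence."""
    n = len(text)
    best = [0.0] + [-math.inf] * n  # best[i]: best score for text[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last word
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            score = best[start] + word_logprob(text[start:end])
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers to recover the words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


def resolve(query):
    """Segment only the tokens that are not already known words."""
    out = []
    for token in query.split():
        if token.lower() in UNIGRAM_COUNTS:
            out.append(token)
        else:
            out.extend(segment(token.lower()))
    return " ".join(out)
```

With the length cap, the work per token is proportional to its length times `MAX_WORD_LEN`, and known tokens skip segmentation entirely.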
2.) Kind of collocation:
Ex:
User Query:                    Resolved Query:
-----------                    ---------------
I like t shirts in Wal mart    I like t-shirts in Walmart
I have no clue how to do this. The only idea I have so far is to tokenise the sentence and combine tokens that are not meaningful on their own with the previous or next token, forming candidate words that can be checked against the unigrams.
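A minimal sketch of that token-merging idea, assuming a lookup table keyed by the joined, lowercased form: the `CANONICAL` dictionary and the `MAX_MERGE` window size are hypothetical and would in practice be derived from the unigram list.

```python
# Hypothetical vocabulary: joined, lowercased key -> preferred surface form.
CANONICAL = {
    "tshirts": "t-shirts",
    "tshirt": "t-shirt",
    "walmart": "Walmart",
}
MAX_MERGE = 3  # assumed maximum number of adjacent tokens to try joining


def merge_tokens(query):
    """Greedily join adjacent tokens whose concatenation is a known word."""
    tokens = query.split()
    out = []
    i = 0
    while i < len(tokens):
        merged = False
        # Try the widest window first, down to a window of two tokens.
        for width in range(min(MAX_MERGE, len(tokens) - i), 1, -1):
            key = "".join(tokens[i:i + width]).lower()
            if key in CANONICAL:
                out.append(CANONICAL[key])
                i += width
                merged = True
                break
        if not merged:
            out.append(tokens[i])
            i += 1
    return " ".join(out)
```

Each position costs at most `MAX_MERGE - 1` dictionary lookups, so the pass is linear in the number of tokens. Being greedy, it can merge tokens that were meant to stay separate whenever their concatenation happens to be in the table.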
These solutions are too slow for my requirements (especially the first one). I want to use both features together, so I am looking for better ideas.