I have been working on this tricky problem and looking for an efficient solution. Basically, I need to find phrases made up of the same or a similar combination of words, and keep only the one with the highest value in a second column. So far I have tried expand.grid() and agrep(), but without success.
The other option, as a last resort, is to go through every single term, split it into words on spaces, and try to match the possible combinations against all other terms using a few for loops. But the computational cost would be too high, since my actual data is considerably larger.
Below is the sample data:
sample <- data.frame(
  Terms = I(c("clamp", "rod", "rod44", "rod21", "rod21", "rod13", "rod21", "rod12",
              "rod iron plate", "metal plate", "plate metal", "plates",
              "rods", "plate rod iron", "11mm rod", "25mm rod", "40mm plate", "rod 11mm")),
  Weights = I(c(10, 10, 10, 10, 10, 10, 10, 10,
                50, 45, 60, 20, 30, 100, 30, 20, 40, 50))
)
Desired output:
Terms            Weights
rod 11mm              50
25mm rod              20
40mm plate            40
clamp                 10
plate rod iron       100
plate metal           60
..........
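For what it's worth, the word-splitting idea might not need nested for loops. A rough sketch, under the assumption that "same combination of words" means the same set of words in any order: build a canonical key per term by sorting its words, then keep the highest-weight row per key. The helper name word_key and the small df below are made up for illustration (a subset of the sample terms), and this does not handle near-matches such as "rod" vs "rods", which would still need something like agrep() or stemming.

```r
# Hypothetical helper: lower-case each term, split on whitespace,
# sort the words, and paste them back together as a canonical key.
word_key <- function(x) {
  words <- strsplit(tolower(trimws(x)), "\\s+")
  vapply(words, function(w) paste(sort(w), collapse = " "), character(1))
}

# Illustrative subset of the sample data (not the full data set).
df <- data.frame(
  Terms   = c("rod iron plate", "plate rod iron", "metal plate", "plate metal",
              "11mm rod", "rod 11mm", "25mm rod", "clamp"),
  Weights = c(50, 100, 45, 60, 30, 50, 20, 10),
  stringsAsFactors = FALSE
)

df$key <- word_key(df$Terms)

# Sort by descending weight, then keep the first (heaviest) row per key.
ord  <- df[order(-df$Weights), ]
best <- ord[!duplicated(ord$key), c("Terms", "Weights")]
print(best)
```

Since the key is computed once per term, the work grows roughly linearly with the number of terms rather than with the number of term pairs. Exact duplicates (e.g. the repeated "rod21") collapse to a single row as a side effect.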