I have three data frames built from different n-gram counts (unigram, bigram, trigram). Each data frame contains the separated n-gram words, the frequency count (n), and a probability computed with smoothing.

I have written three functions that look through the tables and return the most probable words for an input string, and I have bound them together in a wrapper function:

## Prediction Model
trigramwords <- function(FirstWord, SecondWord, n = 5, allow.cartesian = TRUE) {
  # Look up trigrams starting with the two given words, most probable first
  probword <- trigramtable[.(FirstWord, SecondWord), allow.cartesian = TRUE][order(-Prob)]
  # No match: back off to the bigram model
  if (any(is.na(probword)))
    return(bigramwords(SecondWord, n))
  if (nrow(probword) > n)
    return(probword[1:n, ThirdWord])
  # Fewer than n matches: fill the remainder from the bigram model
  count <- nrow(probword)
  bgramwords <- bigramwords(SecondWord, n)[1:(n - count)]
  return(c(probword[, ThirdWord], bgramwords))
}


bigramwords <- function(FirstWord, n = 5, allow.cartesian = TRUE) {
  # Look up bigrams starting with the given word, most probable first
  probword <- bigramtable[FirstWord][order(-Prob)]
  # No match: back off to the unigram model
  if (any(is.na(probword)))
    return(Unigramword(n))
  if (nrow(probword) > n)
    return(probword[1:n, SecondWord])
  # Fewer than n matches: fill the remainder from the unigram model
  count <- nrow(probword)
  word1 <- Unigramword(n)[1:(n - count)]
  return(c(probword[, SecondWord], word1))
}

## Back off Model
Unigramword <- function(n = 5, allow.cartesian = TRUE) {
  # Last resort: sample n words from the unigram table
  return(sample(UnigramTable[, FirstWord], size = n))
}

## Bind Functions
predictword <- function(str) {
  require(quanteda)
  tokens <- tokens(x = char_tolower(str))
  # Keep the last two tokens of the input and stem them
  tokens <- char_wordstem(rev(rev(tokens[[1]])[1:2]), language = "english")

  words <- trigramwords(tokens[1], tokens[2], 5)
  chain_1 <- paste(tokens[1], tokens[2], words[1], sep = " ")

  print(words[1])
}

However, I receive the following warning message, and the output is always the same word. If I use only the bigramwords function it works fine, but once I add the trigram function I get the warning. I believe it is because 1:n is not defined correctly.

Warning message:
In 1:n : numerical expression has 5718534 elements: only the first used
Nevon D.
  • Possible duplicate of [R Error: "In numerical expression has 19 elements: only the first used"](https://stackoverflow.com/questions/23173819/r-error-in-numerical-expression-has-19-elements-only-the-first-used) – Bulat Feb 24 '19 at 20:51
  • Here is the minimal reproducible example of the problem: `n <- 1:2; 1:n` – Bulat Feb 24 '19 at 23:14
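
One plausible explanation for the warning, offered as a sketch rather than a confirmed diagnosis: the question says the tables carry a frequency column named n, and the functions also take an argument named n. Inside a data.table `[` call, column names are looked up before variables of the calling function, so `probword[1:n, ThirdWord]` may be expanding the count column instead of the argument. The toy table below is hypothetical (its column names are assumed from the question's description), and `top_shadowed` and `top_fixed` are illustrative helper names, not part of the original code.

```r
library(data.table)

# Hypothetical miniature of the trigram table; the column names (including
# the count column `n`) are assumed from the question's description.
toy <- data.table(
  FirstWord  = c("i", "i", "i"),
  SecondWord = c("am", "am", "am"),
  ThirdWord  = c("here", "fine", "not"),
  n          = c(7L, 4L, 2L),
  Prob       = c(0.5, 0.3, 0.2)
)

# Same shape as the subsetting in the question: inside `[`, `n` resolves to
# the count column before the function argument is considered, so `1:n`
# receives a vector and uses only its first element.
top_shadowed <- function(dt, n = 2) {
  dt[order(-Prob)][1:n, ThirdWord]
}

# One possible workaround: take the first rows outside of `[`, where `n` can
# only refer to the function argument.
top_fixed <- function(dt, n = 2) {
  head(dt[order(-Prob)], n)[["ThirdWord"]]
}

# suppressWarnings() hides the "only the first used" warning for this demo.
shadowed <- suppressWarnings(top_shadowed(toy))
fixed    <- top_fixed(toy)
```

In this sketch `shadowed` does not contain the two requested words, while `fixed` does. Renaming either the count column or the function argument (for example to a hypothetical `k`) would be another way to avoid the name clash.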
