
I'm writing a function to incorporate into a Shiny app that predicts the next word from a set of predefined files. When I create the functions to predict the next word using n-grams, I run into this error:


x object of type 'closure' is not subsettable
i Input ..1 is top_n_rank(1, n).

Run rlang::last_error() to see where the error occurred.

In addition: Warning message:
In is.na(x) : is.na() applied to non-(list or vector) of type 'closure'

This is my R program. I have already created the bigram, trigram, and quadgram word tables in another R script and saved them as .rds files, which I load here:

library(tidyverse)
library(stringr)
library(dplyr)
library(ngram)
library(tidyr)

bi_words <- readRDS("./bi_words.rds")
tri_words <- readRDS("./tri_words.rds")
quad_words <- readRDS("./quad_words.rds")

bigram <- function(input_words){
        num <- length(input_words)
        dplyr::filter(bi_words, 
               word1==input_words[num]) %>% 
                top_n(1, n) %>%
                filter(row_number() == 1L) %>%
                select(num_range("word", 2)) %>%
                as.character() -> out
        ifelse(out =="character(0)", "?", return(out))
}

trigram <- function(input_words){
        num <- length(input_words)
        dplyr::filter(tri_words, 
               word1==input_words[num-1], 
               word2==input_words[num])  %>% 
                top_n(1, n) %>%
                filter(row_number() == 1L) %>%
                select(num_range("word", 3)) %>%
                as.character() -> out
        ifelse(out=="character(0)", bigram(input_words), return(out))
}

quadgram <- function(input_words){
        num <- length(input_words)
        dplyr::filter(quad_words, 
               word1==input_words[num-2], 
               word2==input_words[num-1], 
               word3==input_words[num])  %>% 
                top_n(1, n) %>%
                filter(row_number() == 1L) %>%
                select(num_range("word", 4)) %>%
                as.character() -> out
        ifelse(out=="character(0)", trigram(input_words), return(out))
}

ngrams <- function(input){
        # Create a dataframe
        input <- data.frame(text = input)
        # Clean the input
        replace_reg <- "[^[:alpha:][:space:]]*"
        input <- input %>%
                mutate(text = str_replace_all(text, replace_reg, ""))
        # Find word count, separate words, lower case
        input_count <- str_count(input$text, boundary("word"))
        input_words <- unlist(str_split(input$text, boundary("word")))
        input_words <- tolower(input_words)
        # Call the matching functions
        out <- ifelse(input_count == 1, bigram(input_words), 
                      ifelse (input_count == 2, trigram(input_words), quadgram(input_words)))
        # Output
        return(out)
}

input <- "In case of a"
ngrams(input)
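Separately from the `top_n()` error, the cleaning step in `ngrams()` can work on the character string directly rather than on a one-column data frame. A minimal sketch, where `clean_words()` is a hypothetical helper name and the character class mirrors `replace_reg`:

```r
library(stringr)

# Hypothetical helper: clean one input string and split it into lower-case words.
clean_words <- function(input) {
  # Remove every character that is not a letter or whitespace
  text <- str_remove_all(input, "[^[:alpha:][:space:]]")
  # Split on word boundaries; str_split() returns a list with one element per string
  words <- str_split(text, boundary("word"))[[1]]
  tolower(words)
}

clean_words("In case of a")            # "in" "case" "of" "a"
clean_words("Don't stop, believing!")  # "dont" "stop" "believing"
```

With this, `length(clean_words(input))` gives the word count, so a separate `str_count()` call is not needed.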

This is a snippet of quad_words.rds:

Jayashree K
  • Do your functions work on their own, but not when you incorporate them into Shiny? If so, please show your Shiny code. The errors sound like you use a `reactive` in your functions but don't evaluate it with the brackets (e.g. `test <- reactive({...})` then has to be called as `test()` because it is a function) – starja Oct 09 '20 at 07:43
  • What does your data look like? Can you give us a sample of `quad_words`? I can reproduce your error by giving a test dataset no `n` column for `top_n` to rank on, suggesting that is where the error lies – Andy Baxter Oct 09 '20 at 10:52
  • To clarify: try running `tibble(a = 1:10) %>% top_n(1, n)` and you'll see the exact same error. `n` here is being passed (positionally) to the `wt` argument, which means `top_n` looks for an `n` variable in the dataset to use for ordering. Presumably your input data does have a column counting how common each ngram is, which you are using to rank them? – Andy Baxter Oct 09 '20 at 10:59
  • I have not included the code into the shiny app. I got the error while testing the function @starja – Jayashree K Oct 12 '20 at 10:33
  • I have attached the sample of quad_words @AndrewBaxter – Jayashree K Oct 12 '20 at 10:36
  • OK, I think I can see what your function is aiming to do then. Just to check, though: if there are two quadgrams with, say, words 1-3 matching, how do you sort/select between the two? At the moment your lines `top_n(1, n) %>% filter(row_number() == 1L)` are doing 'select the row with the largest value in the `n` column', then 'select the top row'. If you don't have an `n` column (frequency in the training text?), then the first call isn't necessary. If you do, you could select the top row another way. – Andy Baxter Oct 12 '20 at 11:01
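The diagnosis in these comments can be checked directly. A minimal sketch on a toy tibble (hypothetical data, not the asker's tables): with no column called `n`, the bare `n` passed to `wt` is looked up outside the data and resolves to the function `dplyr::n()`, a closure, and ranking a function fails.

```r
library(dplyr)

# Toy data with no column called `n` (illustrative assumption)
toy <- tibble(a = 1:10)

# top_n(1, n): `n` is the `wt` argument. Without an `n` column it resolves to
# dplyr::n(), a function, so ranking it raises the "closure" error.
err <- tryCatch(top_n(toy, 1, n), error = function(e) e)
inherits(err, "error")  # TRUE

# Once a real count column named `n` exists, the same call ranks by it.
counted <- mutate(toy, n = a * 2)
top_row <- top_n(counted, 1, n)
top_row  # the single row with a = 10, n = 20
```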

1 Answer


Perhaps the missing step here is counting which ngram is the most common in each case before selecting the top one. A simple solution would be to substitute add_count for top_n:

filter(quad_words, 
       word1==input_words[num-2], 
       word2==input_words[num-1], 
       word3==input_words[num])  %>%
  # count each candidate 4th word (new column n) and sort the most frequent first
  add_count(word4, sort = TRUE) %>% 
  filter(row_number() == 1L) %>%
  select(num_range("word", 4)) %>%
  as.character() -> out
ifelse(out=="character(0)", trigram(input_words), return(out))

... as the centre part of your quadgram function. add_count(word4) counts how often each 4th word occurs after filtering on words 1-3 and stores that count in a new n column. The sort = TRUE argument makes the most frequent quadgram appear in row 1, which your next line then selects. Presumably the same substitution is needed in trigram() and bigram() (counting word3 and word2), since they also call top_n(1, n). Hope this is a helpful step. Do follow up with any questions or corrections, or mark the answer as accepted if this solves this particular problem.
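The same counting idea can be generalised into one back-off function. This is a minimal sketch, not the answer's code: it assumes each table has columns `word1` ... `wordN` with one row per observed n-gram (the same assumption `add_count()` relies on), and `predict_from()` / `predict_next()` are hypothetical names. It returns `NA` instead of comparing against the string "character(0)", and uses a plain `if` rather than `ifelse()` with `return()` inside.

```r
library(dplyr)

# Hypothetical helper: most frequent next word for `context` in one n-gram table.
# Assumes columns word1 ... wordN, one row per observed n-gram occurrence.
predict_from <- function(tbl, context) {
  k <- length(context)
  matches <- tbl
  for (i in seq_len(k)) {
    col <- paste0("word", i)
    value <- context[i]
    # .data[[col]] is the column; .env$value is the local variable
    matches <- filter(matches, .data[[col]] == .env$value)
  }
  if (nrow(matches) == 0) return(NA_character_)
  target <- paste0("word", k + 1)
  # Count each candidate next word; sort = TRUE puts the most frequent first
  freq <- count(matches, next_word = .data[[target]], sort = TRUE)
  freq$next_word[1]
}

# Hypothetical back-off: try the longest table first, then shorter ones.
# `tables` is ordered longest to shortest, e.g. list(quad_words, tri_words, bi_words).
predict_next <- function(words, tables, default = "?") {
  for (tbl in tables) {
    k <- length(grep("^word[0-9]+$", names(tbl))) - 1  # context words this table needs
    if (length(words) < k) next
    guess <- predict_from(tbl, tail(words, k))
    if (!is.na(guess)) return(guess)
  }
  default
}

# Toy tables (illustrative data only)
quad <- tibble(word1 = "in", word2 = "case", word3 = "of",
               word4 = c("emergency", "emergency", "fire"))
tri  <- tibble(word1 = "case", word2 = "of", word3 = c("a", "the"))
bi   <- tibble(word1 = c("a", "a", "b"), word2 = c("lot", "lot", "bit"))

predict_next(c("in", "case", "of"), list(quad, tri, bi))       # "emergency"
predict_next(c("in", "case", "of", "a"), list(quad, tri, bi))  # "lot"
predict_next("xyz", list(quad, tri, bi))                        # "?"
```

If your tables already store a frequency column such as `n`, you would rank by that column (for example with `slice_max(n, n = 1, with_ties = FALSE)`) instead of counting rows.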

Andy Baxter