A small data frame:

words <- data.frame(terms = c("qhick brown fox",
                              "tom dick harry",
                              "cats dgs"))

Using qdap::which_misspelled, I can find the misspelled words:

> which_misspelled(words)
      1       8 
"qhick"   "dgs" 

What I actually want is to subset the words data frame to the rows that contain a misspelling. The call above returns indices 1 and 8, which count positions across all the words in the data frame, regardless of which row each word is in.
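To see where 1 and 8 come from, here is a minimal base R sketch. It assumes which_misspelled numbers words in order after splitting the text on whitespace, which matches the output above ("qhick" is word 1, "dgs" is word 8). qdap's own tokenizer also lowercases and strips punctuation, so this simple split may not line up for messier text. The names tokens, row_of_word and flagged are illustrative only.

```r
words <- data.frame(terms = c("qhick brown fox", "tom dick harry", "cats dgs"),
                    stringsAsFactors = FALSE)

# Split each row into words on whitespace
tokens <- strsplit(words$terms, "\\s+")

# For every word in the flattened list, the row it came from: 1 1 1 2 2 2 3 3
row_of_word <- rep(seq_along(tokens), lengths(tokens))

# Global word positions reported by which_misspelled (copied from the output above)
flagged <- c(1, 8)

# Translate word positions back to row numbers
bad_rows <- unique(row_of_word[flagged])
bad_rows
```

Here bad_rows is c(1L, 3L), so the misspellings sit in rows 1 and 3.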

How can I subset my data frame to the rows that contain at least one misspelled word?

(Bonus if it can be done with dplyr's filter.)

Doug Fir

1 Answer

How about using check_spelling instead? It is vectorized, and its result contains a row column of row numbers that you can use to subset the data frame:

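library(qdap)
# `row` gives the data frame row of each misspelled word; unique() keeps
# a row with several misspellings from appearing twice
words[unique(check_spelling(words$terms)$row), , drop = FALSE]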

#            terms
#1 qhick brown fox
#3        cats dgs
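For the dplyr bonus, one possible sketch reuses the same check_spelling row numbers inside filter(), using row_number() to get each row's position. This assumes qdap and dplyr are installed; bad_rows and typo_rows are illustrative names, not part of either package.

```r
library(qdap)
library(dplyr)

words <- data.frame(terms = c("qhick brown fox", "tom dick harry", "cats dgs"),
                    stringsAsFactors = FALSE)

# Rows (text elements) that check_spelling() flags as containing a misspelling
bad_rows <- unique(check_spelling(words$terms)$row)

# Keep only those rows; row_number() is each row's position inside filter()
typo_rows <- words %>% filter(row_number() %in% bad_rows)
typo_rows
```

With the example data this should keep the same two rows as the base R version. Note that filter() does not preserve the original row names the way [ does.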

The which_misspelled function appears to be designed to check a single string rather than a data frame. Its documentation describes it as:

which_misspelled - Check the spelling for a string.
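If you would rather stay with which_misspelled, a minimal sketch is to call it once per row and keep the rows where it finds anything. This assumes which_misspelled returns an empty result (NULL or a zero-length vector) when a string has no misspellings; length() is 0 in both cases. It runs the spell checker row by row, so it is likely slower than the vectorized check_spelling on large data. has_typo is an illustrative name.

```r
library(qdap)

words <- data.frame(terms = c("qhick brown fox", "tom dick harry", "cats dgs"),
                    stringsAsFactors = FALSE)

# TRUE for each row whose string contains at least one misspelled word
has_typo <- vapply(words$terms,
                   function(s) length(which_misspelled(s)) > 0,
                   logical(1), USE.NAMES = FALSE)

# Logical row index; drop = FALSE keeps a one-column data frame
words[has_typo, , drop = FALSE]
```

Because has_typo is an ordinary logical vector with one entry per row, it can also be passed to dplyr::filter(words, has_typo).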

Psidom
  • This is great, thank you. What does drop = F do? I see that if I leave it out, it returns `[1] qhick brown fox cats dgs Levels: cats dgs qhick brown fox tom dick harry`, but what's happening here? – Doug Fir Jun 30 '17 at 01:02
  • `drop = F` keeps the result as a data frame, as you've noticed. By default, when subsetting a data frame leaves only one column, R simplifies the result to a vector for convenience. If you want to keep a data frame, use `drop = FALSE`. – Psidom Jun 30 '17 at 01:05
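A small base R sketch of the behaviour described in the comments, using only the example data from the question. The Levels: line in the question's output appears because data.frame() converted strings to factors by default before R 4.0.0; passing stringsAsFactors = FALSE here gives plain character vectors. Writing FALSE instead of F is also slightly safer, since F is an ordinary variable that can be reassigned while FALSE is reserved.

```r
words <- data.frame(terms = c("qhick brown fox", "tom dick harry", "cats dgs"),
                    stringsAsFactors = FALSE)
rows <- c(1, 3)

# Only one column is selected, so the default drop = TRUE simplifies to a vector
as_vector <- words[rows, ]

# drop = FALSE keeps a one-column data frame (with row names "1" and "3")
as_df <- words[rows, , drop = FALSE]

str(as_vector)
str(as_df)
```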