dplyr filter using qdap::which_misspelt OR dplyr filter with a nested function

Question

A small data frame:

words <- data.frame(terms = c("qhick brown fox",
                              "tom dick harry",
                              "cats dgs"))

If I use qdap::which_misspelled I can find the misspelled words:

> which_misspelled(words)
      1       8 
"qhick"   "dgs"

But what I want is to subset the words data frame to the rows that contain a misspelling. The call above returns indices 1 and 8, which are word positions counted across the whole data frame, so they do not tell me which row each word came from.

How can I subset my df based on any rows that contain misspelled words?

(Bonus if it can be done with dplyr filter.)

score 4 · Accepted Answer · answered Jun 30 '17 at 00:56


How about using check_spelling instead? It is vectorized, and its result contains a column of row numbers that you can use to subset the data frame:

library(qdap)
words[check_spelling(words$terms)$row,,drop=F]

#            terms
#1 qhick brown fox
#3        cats dgs

The which_misspelled function seems to be meant for checking a single string rather than a data frame. Its documentation describes it as:

which_misspelled - Check the spelling for a string.
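For the dplyr bonus, the same row numbers can be passed to filter. This is only a minimal sketch: it assumes check_spelling behaves as in the snippet above (one record per misspelled word, with the source row in its `row` column), and the names flagged_rows and misspelled are made up for illustration.

```r
library(qdap)
library(dplyr)

words <- data.frame(terms = c("qhick brown fox",
                              "tom dick harry",
                              "cats dgs"),
                    stringsAsFactors = FALSE)

# Assumption: check_spelling() returns one record per misspelled word,
# and its `row` column gives the position of the input element it came from.
flagged_rows <- check_spelling(words$terms)$row

# Keep only the rows whose position appears among the flagged rows.
# %in% ignores repeats, so a row with several misspellings is kept once.
misspelled <- words %>%
  filter(row_number() %in% flagged_rows)
```

Using %in% rather than indexing directly means a row with two misspelled words is not duplicated in the result.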


Psidom


This is great, thank you. What does drop = F do? I see that if I leave it out it returns `[1] qhick brown fox cats dgs Levels: cats dgs qhick brown fox tom dick harry`, but what's happening here? – Doug Fir Jun 30 '17 at 01:02

As you've noticed, `drop = F` keeps the result as a data frame. By default, when subsetting a data frame leaves only one column, R simplifies the result to a vector for convenience. To keep it as a data frame, use `drop=FALSE`. – Psidom Jun 30 '17 at 01:05
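The difference can be seen in base R without qdap. In this small sketch the row indices are hard-coded to 1 and 3 (the rows flagged in the answer), and stringsAsFactors = FALSE is set so that the simplified result is a character vector on any R version. With factor columns it would be a factor, which is where the Levels line in the comment above comes from.

```r
words <- data.frame(terms = c("qhick brown fox",
                              "tom dick harry",
                              "cats dgs"),
                    stringsAsFactors = FALSE)

# Hard-coded for illustration: the rows flagged in the answer.
rows <- c(1, 3)

# Default behaviour: a single-column result is simplified to a vector.
as_vector <- words[rows, ]

# drop = FALSE keeps the data frame structure.
as_df <- words[rows, , drop = FALSE]

class(as_vector)  # "character"
class(as_df)      # "data.frame"
```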
