I'm going to interpret what you're saying as best I can, and I hope it helps.
It seems the ICD-10 codes in your data are combinations of letters and digits in the form AA00-AA00. I understand your data frame has some NAs in that column, as well as some entries containing just one letter, and you want to filter both out.
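If you want to check the exact shape rather than just count letters, a regular expression can test the AA00-AA00 pattern directly. This is a minimal sketch that assumes every valid entry looks exactly like that (it does not follow the official ICD-10 syntax), and `is_code_pair` is a hypothetical helper name:

```r
# Hypothetical helper: TRUE only for strings shaped exactly like "AA00-AA00"
# (two capital letters, two digits, a hyphen, two capital letters, two digits).
# Assumes the AA00-AA00 format described above, not the official ICD-10 syntax.
is_code_pair <- function(x) {
  # grepl() returns FALSE for NA elements, so missing cells are rejected too
  grepl("^[[:upper:]]{2}[[:digit:]]{2}-[[:upper:]]{2}[[:digit:]]{2}$", x)
}

codes <- c("EZ11-RO87", NA, "h1 456", "QE57-CJ49")
is_code_pair(codes)
# [1]  TRUE FALSE FALSE  TRUE
```

POSIX classes such as `[[:upper:]]` are used instead of `[A-Z]` because character ranges can behave differently across locales.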
I'll assume your data frame looks more or less like this:
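# Empty data frame with the three columns
data <- data.frame(Patient=character(),
                   Age=integer(),
                   V=character(),
                   stringsAsFactors=FALSE)
# Fill 5 rows with random values (your output will differ from mine).
# c() coerces everything to character, so Age ends up stored as character.
for(u in 1:5){
  data[u,] <- c(LETTERS[sample(1:26,1)],            # one-letter patient ID
                sample(0:100,1),                    # age
                paste0(LETTERS[sample(1:26,1)],     # code like "EZ11-RO87"
                       LETTERS[sample(1:26,1)],
                       sample(0:9,1),sample(0:9,1),"-",
                       LETTERS[sample(1:26,1)],
                       LETTERS[sample(1:26,1)],
                       sample(0:9,1),sample(0:9,1)))
}
# Add a missing value and a malformed entry with only one letter
data[3,3] <- NA
data[5,3] <- "h1 456"
head(data)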
Patient Age V
1 N 23 EZ11-RO87
2 E 60 QE57-CJ49
3 H 73 <NA>
4 G 10 AQ75-UX16
5 Z 28 h1 456
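Because the loop above uses random values, here is a fixed version built from the printed output, so the results below can be reproduced exactly. This is only a convenience sketch; unlike the loop version, it keeps Age as an integer column:

```r
# Fixed copy of the example data, taken from the head(data) output above
data <- data.frame(Patient = c("N", "E", "H", "G", "Z"),
                   Age     = c(23L, 60L, 73L, 10L, 28L),
                   V       = c("EZ11-RO87", "QE57-CJ49", NA, "AQ75-UX16", "h1 456"),
                   stringsAsFactors = FALSE)
str(data)
```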
Making a small change to your code (note the quotation marks around the column name; I'm also assuming v and var are actually the same column), you'd have:
indication <- (nchar(gsub("[^[:alpha:]]", "", data[["V"]]))>1 & !is.na(data[["V"]]))
indication
[1] TRUE TRUE FALSE TRUE FALSE
The code returns a logical vector with one element per row of the data frame: TRUE where the cell is not NA and contains more than one letter, FALSE otherwise.
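To see why the NA and the one-letter entry both come out FALSE, here is the same expression split into steps, followed by using the result to keep only the valid rows. This is a sketch on a standalone vector; `df_example` is a hypothetical stand-in for your own data frame:

```r
v <- c("EZ11-RO87", "QE57-CJ49", NA, "AQ75-UX16", "h1 456")

# 1. Remove every non-letter: "EZ11-RO87" -> "EZRO", "h1 456" -> "h"
letters_only <- gsub("[^[:alpha:]]", "", v)

# 2. Count the remaining letters; an NA input stays NA
n_letters <- nchar(letters_only)

# 3. NA > 1 gives NA, but NA & FALSE is FALSE, so missing cells end up FALSE
indication <- n_letters > 1 & !is.na(v)
indication
# [1]  TRUE  TRUE FALSE  TRUE FALSE

# Keep only the rows that pass the check (rows 1, 2 and 4)
df_example <- data.frame(V = v, stringsAsFactors = FALSE)
df_example[indication, , drop = FALSE]
```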
I'm not sure this fully answers your question about how to interpret the code or make it work, but I hope it helps.