Hi, I would like to understand how R reads this line of code

Question

The object `data` is a data frame that contains multiple columns, and `v` is the column that contains the ICD-10 codes.

The aim is to store a logical vector that is TRUE for each row where both conditions are satisfied.

indication <- nchar(gsub("[^[:alpha:]]", "", data[[v]])) > 1 & !is.na(data[[var]])

I ask because when I print `indication` and look at, say, `data[1984, 96]` (row 1984, column 96, i.e. `v`), the value is "inc".
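To see what the condition does for a single value like "inc", here is a minimal sketch that applies the same expression to a plain character vector. The vector `x` stands in for `data[[v]]`; apart from "inc", its values are hypothetical.

```r
# Hypothetical values standing in for data[[v]]; only "inc" comes from the question
x <- c("inc", "J45.9", "AB12-CD34", NA)

letters_only <- gsub("[^[:alpha:]]", "", x)  # drop every non-letter character
n_letters    <- nchar(letters_only)          # count the letters that remain (NA stays NA)
indication   <- n_letters > 1 & !is.na(x)    # more than one letter AND not missing

print(data.frame(x, letters_only, n_letters, indication))
```

So "inc" passes the check (three letters, not missing), while a value such as J45.9, which contains only one letter, does not.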

Please post the output of `dput(head(data))` and the expected output. — Rui Barradas, Jan 24 '22 at 09:03

score 0 · Accepted Answer · answered Jan 24 '22 at 22:25

I'm going to work from what you've described, and I hope it helps.

So it seems ICD-10 codes are alphanumeric codes for the classification of diseases (a letter followed by digits, e.g. J45.9); for this example I'll use made-up codes of the form AA00-AA00. I understand that your data frame has some NAs in that column, as well as some entries with just one letter that you want to filter out.

I'll assume your data frame has roughly this structure:

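# Empty data frame with the three columns we need
data <- data.frame(Patient=character(),
                   Age=integer(), 
                   V=character(), 
                   stringsAsFactors=FALSE)
# Fill 5 rows with random values; V gets a made-up code of the form AA00-AA00.
# list() rather than c() keeps Age an integer (c() would coerce every value to character).
for(u in 1:5){
  data[u,] <- list(LETTERS[sample(1:26,1)],
                   sample(0:100,1),
                   paste0(LETTERS[sample(1:26,1)],
                          LETTERS[sample(1:26,1)],
                          sample(0:9,1),sample(0:9,1),"-",
                          LETTERS[sample(1:26,1)],
                          LETTERS[sample(1:26,1)],
                          sample(0:9,1),sample(0:9,1)))
}

# Add a missing value and a value that contains only one letter
data[3,3] <- NA
data[5,3] <- "h1 456"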

head(data)
  Patient Age         V
1       N  23 EZ11-RO87
2       E  60 QE57-CJ49
3       H  73      <NA>
4       G  10 AQ75-UX16
5       Z  28    h1 456

Making a small change to your code (note the quotation marks around the column name; I'm also assuming v and var are actually the same column), you'd have:

indication <- (nchar(gsub("[^[:alpha:]]", "", data[["V"]]))>1 & !is.na(data[["V"]]))
indication

[1]  TRUE  TRUE FALSE  TRUE FALSE

The code returns a logical vector with one element per row of the data frame: TRUE where the cell is not NA and contains more than one letter, FALSE otherwise.
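Since the question is how R reads the line, here is a sketch that evaluates the expression one piece at a time. It rebuilds the five example rows by hand (values copied from the printed `head(data)` output, so no random numbers are involved) and stores the column name in a variable `v`, assuming that is what `v` held in the original code; `data[[v]]` then works just like `data[["V"]]`.

```r
# Fixed copy of the example data, copied from the printed head(data) output
data <- data.frame(Patient = c("N", "E", "H", "G", "Z"),
                   Age     = c(23L, 60L, 73L, 10L, 28L),
                   V       = c("EZ11-RO87", "QE57-CJ49", NA, "AQ75-UX16", "h1 456"),
                   stringsAsFactors = FALSE)

v <- "V"  # assumption: v is a variable holding the column name

step1 <- gsub("[^[:alpha:]]", "", data[[v]])  # keep letters only: "EZRO" ... "h"
step2 <- nchar(step1)                         # letters per row; NA stays NA
step3 <- step2 > 1                            # TRUE / FALSE / NA per row
step4 <- !is.na(data[[v]])                    # TRUE where the cell is not missing
indication <- step3 & step4                   # `>` and `!` bind tighter than `&`

step3
indication
```

`step3` is NA for the missing row; the `& !is.na(...)` part is what turns it into FALSE, because `NA & FALSE` is FALSE in R.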

I'm not sure this fully answers your question about how to interpret the code or make it work, but I hope it helps.

Thank you Valentia, I am new to R and this answer helped me a lot. Yes, var and v are the same; I misspelled it. By the way, the regular expression "[^[:alpha:]]" in gsub() replaces non-numerical characters with an empty space, right? So a value like your example h1 456 is left with just h, which is not more than 1 character. That is why it returned FALSE. I really appreciate it. — Lukman Afandi, Jan 25 '22 at 01:48
Indeed, "[^[:alpha:]]" replaces non-alphabetical characters (such as digits or blank spaces) with whatever you pass as the second argument. Here the second argument is "" (an empty string, not a space), so, as you say, only "h" is left, and nchar() of that is not greater than 1. Cheers! — Valentia, Jan 25 '22 at 10:06
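The difference between replacing with an empty string and replacing with a space is easy to check directly. In this sketch the second call, with " " as the replacement, is added only for contrast; the original code uses "".

```r
x <- "h1 456"

no_gap   <- gsub("[^[:alpha:]]", "", x)   # non-letters removed entirely
with_gap <- gsub("[^[:alpha:]]", " ", x)  # each of the 5 non-letters replaced by a space

no_gap            # "h"
nchar(no_gap)     # 1, so nchar(...) > 1 is FALSE
nchar(with_gap)   # 6: "h" followed by five spaces
```

With a space as the replacement, the count would include the spaces and the check would come out TRUE, which is why the empty string matters here.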
