This question is related to a previous topic: How to use custom function to create new binary variables within existing dataframe?

I would like to use a similar function, but be able to pass a vector specifying which ICD9 diagnosis variables within the dataframe to search (e.g., "diag_1", "diag_2", "diag_3", etc.).

I tried

y<-c("diag_1","diag_2","diag_1") 

diagnosis_func(patient_db, y, "2851", "Anemia")

but I get the following error:

Error in `[[<-`(`*tmp*`, i, value = value) : 
  recursive indexing failed at level 2 

Below is the working function by Benjamin from the referenced post. However, it works for only one diagnosis variable at a time. Ultimately, I need to create a new binary variable that indicates whether a patient has a specific diagnosis by querying the 25 diagnosis variables of the dataframe.

*target_col takes the ICD9 diagnosis variables ("diag_1"..."diag_20"); this is the argument I would like to pass as a vector.

diagnosis_func <- function(data, target_col, icd, new_col){
  pattern <- sprintf("^(%s)", 
                 paste0(icd, collapse = "|"))

  data[[new_col]] <- grepl(pattern = pattern, 
                       x = data[[target_col]]) + 0L
  data
}

diagnosis_func(patient_db, "diag_1", "2851", "Anemia")

This non-function version works for multiple diagnosis variables. However, I have not figured out how to turn it into a function like the one above.

 pattern = paste("^(", paste0("2851", collapse = "|"), ")", sep = "")

df$anemia<-ifelse(rowSums(sapply(df[c("diag_1","diag_2","diag_3")], grepl, pattern = pattern)) != 0,"1","0")

Any help or guidance on how to get this function to work would be greatly appreciated.

Best, Albit

albit paoli
  • Probably better to feed the vector to `lapply`. Something like `lapply(y, function(i) diagnosis_func(data=df, target_col=i, icd=icd, new_col=i))`. Maybe you'd have to tweak your function a bit, but this would be the better route, I suspect. – lmo Mar 14 '17 at 14:22
  • Thanks lmo! will try this – albit paoli Mar 14 '17 at 14:37
  • Albit, the problem is that the `grepl` in Benjamin's function works on only one column of your data frame. Let's say you have multiple columns, `target_col <- c("diag_1", "diag_2", "diag_3")`. In order to apply `grepl` to all of them, you can try something like this: `apply(data[target_col], 2, function(x) grepl(pattern=pattern, x))`. Let me know if this works. – Dhiraj Mar 15 '17 at 01:35

1 Answer

Try this modified version of Benjamin's function:

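diagnosis_func <- function(data, target_col, icd, new_col){
  # Regex that matches any of the ICD9 codes at the start of a string
  pattern <- sprintf("^(%s)", 
                     paste0(icd, collapse = "|"))

  # One column of 0/1 flags per diagnosis variable named in target_col
  new <- apply(data[target_col], 2, function(x) grepl(pattern=pattern, x)) + 0L
  # 1 if any of the diagnosis variables matched in that row, otherwise 0
  data[[new_col]] <- ifelse(rowSums(new)>0, 1,0)
  data
}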
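
For a quick check, here is a small sketch that calls the function with a vector of column names. The toy `patient_db` below is made up for illustration (the real data frame is not shown in the question), and the function is repeated only so the snippet runs on its own.

```r
# Modified function from this answer, repeated so the example is self-contained
diagnosis_func <- function(data, target_col, icd, new_col){
  pattern <- sprintf("^(%s)", paste0(icd, collapse = "|"))
  new <- apply(data[target_col], 2, function(x) grepl(pattern = pattern, x)) + 0L
  data[[new_col]] <- ifelse(rowSums(new) > 0, 1, 0)
  data
}

# Hypothetical toy data: three patients, three diagnosis variables
patient_db <- data.frame(
  diag_1 = c("2851", "4019", "25000"),
  diag_2 = c("4019", "28510", "2724"),
  diag_3 = c("2724", "5849", "4280"),
  stringsAsFactors = FALSE
)

# Vector of diagnosis variables to search
y <- c("diag_1", "diag_2", "diag_3")

result <- diagnosis_func(patient_db, y, "2851", "Anemia")
result$Anemia
```

With this toy data the first two patients are flagged and the third is not. Note that the pattern is anchored only at the start, so "28510" in the second row counts as a match for "2851"; if exact codes are needed, the pattern would have to be anchored at the end as well.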
Dhiraj