
I am trying to do bootstrap cross-validation for PLS-DA classification. I have to repeat this procedure for six (6) different scaling methods, each on different datasets. The problem is that each run takes over 2 hours to complete. Is there a way to improve the speed? The code is below. Thanks.

Note: X is a 33 x 160 data matrix containing healthy samples as well as diseased samples, to be classified as "1" and "2" respectively.
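For anyone who wants to run the code end to end, a dummy `X` of the right shape can be simulated (the random values are purely a stand-in for reproducibility; the real matrix comes from SIFT-MS measurements):

```r
# Hypothetical stand-in for the SIFT-MS data: 33 samples x 160 variables.
set.seed(1)
X <- matrix(rnorm(33 * 160), nrow = 33, ncol = 160)
```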

    CLASS <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
               2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)

    cat("\n Select the scaling....")
    cat("0 = Raw; 1 = Mean-centre; 2 = Auto-scale; 3 = Range-scale (-1 to 1); 4 = Range-scale; 5 = Normalise; 6 = Pareto.\n\n")
    S <- as.numeric(readline("Enter the scaling method, e.g., 0: "))

    # Enable JIT compilation once, up front (not inside the loop)
    library(compiler)
    enableJIT(3)

    ACC  <- numeric(150)
    SPEC <- numeric(150)
    SENS <- numeric(150)
    NPV  <- numeric(150)
    PPV  <- numeric(150)
    FDR  <- numeric(150)
    LVn  <- numeric(150)
    for (i in 1:150) {

      # Split the data: 70% training, 30% testing
      smp_size  <- floor(0.70 * nrow(X))
      train_ind <- sample(seq_len(nrow(X)), size = smp_size)

      TR   <- X[train_ind, ]      # Training data
      TST  <- X[-train_ind, ]     # Testing data
      CTR  <- CLASS[train_ind]    # Training classes
      CTST <- CLASS[-train_ind]   # Testing classes

      # Leave-one-out CV to determine the best number of LVs for PLS-DA
      OptLV <- DoLOOCVa2(TR, CTR, S, 1, seq(1, 20, 1))

      C1 <- pretreat(TR, TST, S)  # Scale training and test sets
      C2 <- C1$trDATAscaled
      C3 <- C1$tstDATAscaled

      # Predict the test classes with the optimal number of LVs
      C4 <- pls.lda(C2, CTR, C3, OptLV$OptLVs) # Perform the classification
      C5 <- as.numeric(C4$predclass)           # Extract the predicted class
      C6 <- C5 - CTST                          # 0 where the prediction is correct

      TN <- 0
      TP <- 0
      FN <- 0
      FP <- 0
      for (j in 1:nrow(C3)) {
        if (C6[j] == 0 & CTST[j] == 1) TN <- TN + 1
        if (C6[j] == 0 & CTST[j] == 2) TP <- TP + 1
        if (C6[j] != 0 & CTST[j] == 1) FP <- FP + 1
        if (C6[j] != 0 & CTST[j] == 2) FN <- FN + 1
      }
      ACC[i]  <- 100 * (TN + TP) / (TN + TP + FP + FN)  # Percentage accuracy
      SPEC[i] <- 100 * TN / (TN + FP)  # Percentage specificity
      SENS[i] <- 100 * TP / (TP + FN)  # Percentage sensitivity
      NPV[i]  <- 100 * TN / (TN + FN)  # Percentage negative predictive value
      PPV[i]  <- 100 * TP / (TP + FP)  # Percentage positive predictive value
      FDR[i]  <- 100 * FP / (TP + FP)  # Percentage false discovery rate
      LVn[i]  <- OptLV$OptLVs

    }
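As an aside, the inner `j` loop only tallies matches, so it can equivalently be written with vectorized sums (a sketch with toy `C5`/`CTST` vectors standing in for the real predicted and true classes):

```r
# Toy predicted/true class vectors; in the real code C5 comes from pls.lda()
# and CTST from the held-out CLASS entries.
CTST <- c(1, 1, 2, 2, 2)
C5   <- c(1, 2, 2, 2, 1)

TN <- sum(C5 == CTST & CTST == 1)  # healthy correctly predicted
TP <- sum(C5 == CTST & CTST == 2)  # diseased correctly predicted
FP <- sum(C5 != CTST & CTST == 1)  # healthy predicted as diseased
FN <- sum(C5 != CTST & CTST == 2)  # diseased predicted as healthy
```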
  • Please provide an example that can be run. For starters, X isn't defined. Your `j` loop is unnecessary. It's just adding truth values. For example, `TN <- sum(C6==0 & CTST==1)` (also eliminates the variable initialization). – John Jul 30 '14 at 13:10
  • Hi @John, thanks for your reply. X is a data matrix obtained from SIFT-MS analysis, for example, 33 samples of 160 variables. Some of the samples are healthy while others are diseased. The idea is to classify the healthy samples as "1" and diseased samples as "2". Hope that defines it appropriately – user3034054 Jul 30 '14 at 14:02

0 Answers