
I have a problem producing a PDF document with RStudio and Sweave. Running the code gives no error message and no warning message. However, when I type warnings() in the console, I get a list of warnings. Here is an excerpt; the rest of the warnings look exactly the same:

Warning messages:
1: In normality_test(df, i, j) : NAs introduced by coercion
2: In normality_test(df, i, j) : NAs introduced by coercion
3: In normality_test(df, i, j) : NAs introduced by coercion
4: In normality_test(df, i, j) : NAs introduced by coercion
5: In if (shapiro.test(df[, i]) > 0.05 & shapiro.test(df[,  ... :
  the condition has length > 1 and only the first element will be used
6: In normality_test(df, i, j) : NAs introduced by coercion

The code already discarded the missing values (NAs) before I became aware of the warnings. To try to solve the issue I ran df[is.na(df)] <- 0, but it changed nothing: the same warnings persist. On the other hand, the figures are generated just as one would expect. All the code that the warnings above refer to works perfectly when run directly in RStudio, but not when run through Sweave. This seems contradictory and odd. I have tried for hours without success. Do you have any idea how to solve the issue?

I am using the penguins data set. Here is the code used:

df <- read.csv("penguins.csv")
str(df)
#We transform the character variables type into factor ones
i <- sapply(df, is.character)
df[,i] <- lapply(df[,i], as.factor)
df[,8] <- as.factor(df[,8])
str(df)

normality_test <- function(df,i,j) {
df <- df[!is.na(df[,i])&!is.na(df[,j]),]
plot(c(0, 1), c(0, 1), ann = F, bty = 'n', type = 'n', xaxt = 'n', yaxt  = 'n')
if (shapiro.test(df[,i]) > 0.05 & shapiro.test(df[,j]) > 0.05){
res1 <- cor.test(df[,i],df[,j], 
                 method = "pearson")
text(.5, .5, paste("p.value:", round(res1$p.value,2), "\n r:",   round(res1$estimate,2)))
}
else {
res2 <- cor.test(df[,i],df[,j], 
                 method = "spearman")
text(.5,.5, paste("p.value:", round(res2$p.value,2), "rho:",    round(res2$estimate,2)))
  }
}
#We define the density function to include diagonal elements
hist_density <- function (df, i) {
tmp <- na.omit(df[,i])
hist(tmp, col = "light blue",
   probability = TRUE, main=NULL)
lines(density(tmp), col = "red", lwd = 1.5)
}

new_pairs <- function(df, x){
par(mar=c(1,1,1,1))
n_col<-sum(sapply(df, is.numeric))
par(mfrow=c(n_col,n_col))
n<-ncol(df)
for (i in 1:n){

 for (j in 1:n){
  
  if ((class(df[,i])!="factor" ) & (class(df[,j])!="factor") & i<j) {
    plot(df[,i], df[,j], col = df[,x])
   } 
   else if ((class(df[,i])!="factor") & (class(df[,j])!="factor") &  i==j)  {
    hist_density(df, i)
   } 
   else if ((class(df[,i])!="factor" ) & (class(df[,j])!="factor") &   i>j){
    normality_test(df,i,j)
   }
   else {NA}
     }
    }
   }


  new_pairs(df, 2)
user249018
  • I think the warning is because you are using `if/else` on a whole column `df[, i]`, and `if/else` is not vectorized, i.e. it expects a single TRUE/FALSE as input. Maybe you need `ifelse`. – akrun Mar 14 '21 at 18:41
  • Can you show the full code? When you write `shapiro.test(df[, i]) > 0.05`, are you testing on p-values? Then you need to extract the p-value, i.e. `shapiro.test(df[, i])$p.value > 0.05`, as the output is a `list`. – akrun Mar 14 '21 at 18:44
  • Sure. Kindly enough, you helped me out yesterday in generating one part of that code. – user249018 Mar 14 '21 at 18:48
  • Yes, in that code you were comparing the `class`, which returns a single value. Here, `shapiro.test(df[,i])` returns a list, so to build the condition you need to extract the relevant value (the p-value, as in my previous comment). – akrun Mar 14 '21 at 18:49
  • I am going to do it right now. Great talking to you again. I will tell you about the results in a minute. – user249018 Mar 14 '21 at 20:12
  • The code works fine when written alone inside a single code chunk in a .Rnw file: I get the PDF file. But after embedding it into the bigger file, which also contains text and graphics, I run into the same difficulties. The warnings are the same. Maybe the problem is elsewhere. Do you have any suggestion? – user249018 Mar 14 '21 at 20:58
  • If it works in a chunk, the error must be related to some other code. Can you knit it as a separate test file to confirm? – akrun Mar 14 '21 at 21:00
  • Do you mean to run the rest of the file without the `new_pairs()` function ? – user249018 Mar 14 '21 at 21:03
  • I meant to create a test .Rnw file with only the functions posted here and run it. If it runs without any issue, then the error must come from other code. – akrun Mar 14 '21 at 21:05
  • When I run only the functions posted here in a single chunk, I get, contrary to what I stated previously, the error message "(chunk 1) can not open the connection". When I run the same code in separate chunks, I get the .pdf. When I run the rest of the file without the code, I get many errors related to citations, even though I provided the packages needed. So I am perplexed and do not know what to do. – user249018 Mar 14 '21 at 21:41
  • Is it related to [this](https://stackoverflow.com/questions/26994958/error-cannot-open-the-connection-in-executing-knit-html-in-rstudio)? – akrun Mar 14 '21 at 21:43
  • Maybe. I need to check it out. But concerning only the issue with the text, after discarding citations, the only error message that remains is a weird one, "File ended while scanning use of `\@xdblarg`". – user249018 Mar 14 '21 at 21:48
  • That is not clear to me. One way to debug is to run from the previous version of code that runs fine and then add the new functions one by one in the new_pairs and see where it is breaking – akrun Mar 14 '21 at 21:52
  • I went through the text following your suggestion and fixed everything that mattered except one thing: I cannot solve the issue with the citations. The package `\usepackage{biblatex}` is not accepted; I get the error message "File biblatex.sty not found", which is weird since I have installed the full version of LaTeX. `natbib` was not supported either. I googled but cannot find a solution. Do you have any idea? Otherwise I thank you again. Your support was very precious. – user249018 Mar 14 '21 at 23:05
  • @akrun When I run the code you proposed, I get the Spearman coefficient for every correlation, whereas if I run it without the change you introduced, I get the Pearson correlations. Your change extracts the p-value from the Shapiro test before comparing it to the threshold, 0.05. Do you know why this happens? Thanks. – user249018 Mar 15 '21 at 16:04
  • The change was in the `if` statement, which is now based on the p.value: if both columns have a p-value greater than 0.05, use the Pearson method, or else Spearman. – akrun Mar 15 '21 at 17:34
  • Regarding the other error, I am not sure as it needs a reproducible example for testing – akrun Mar 15 '21 at 17:35
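
The decision rule described in the last few comments can be isolated in a few lines. The sketch below uses a hypothetical helper, `choose_method` (not part of the original code), that takes two already-extracted p-values and returns the method name. It may also explain why the uncorrected code always ended up with Pearson: comparing the whole test object to 0.05 yields several values, `if` then uses only the first one (see the fifth warning), and that first one is presumably the comparison of the W statistic, which is usually well above 0.05.

```r
# Hypothetical helper (not in the original post): picks the correlation
# method from two Shapiro-Wilk p-values that have already been extracted.
choose_method <- function(p_i, p_j, alpha = 0.05) {
  # `if` needs a single TRUE/FALSE, so insist on length-one numeric inputs
  stopifnot(is.numeric(p_i), length(p_i) == 1,
            is.numeric(p_j), length(p_j) == 1)
  if (p_i > alpha && p_j > alpha) "pearson" else "spearman"
}

choose_method(0.20, 0.30)   # "pearson": neither test rejects normality
choose_method(0.20, 0.01)   # "spearman": the second column fails the test
```

With this rule, Spearman is chosen as soon as either column has a p-value at or below the threshold, so seeing Spearman everywhere is plausible when the normality test rejects at least one column in every pair.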

1 Answer

Based on the code, the check is on the p-value from shapiro.test. The output of shapiro.test is a list, so extract that component with $ or [[ before comparing it to 0.05.
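
A quick way to see this is to inspect the object that `shapiro.test` returns. This is a minimal sketch on the built-in `iris` data, used here only for illustration instead of the penguins file from the question:

```r
x <- iris$Sepal.Length            # a numeric vector with 150 values
res <- shapiro.test(x)

class(res)                        # "htest"
is.list(res)                      # TRUE
names(res)                        # "statistic" "p.value" "method" "data.name"

p <- res$p.value                  # same value as res[["p.value"]]
is.numeric(p) && length(p) == 1   # TRUE, so `p > 0.05` is safe inside if ()
```

Two of the four components are character strings, which is a likely source of the "NAs introduced by coercion" warnings when the whole object is compared to a number. The corrected functions below apply the extraction inside the `if` condition.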

normality_test <- function(df, i, j) {
  # Keep only the rows where both columns are observed
  df <- df[!is.na(df[, i]) & !is.na(df[, j]), ]
  # Empty panel that only serves as a canvas for the text
  plot(c(0, 1), c(0, 1), ann = FALSE, bty = 'n', type = 'n',
       xaxt = 'n', yaxt = 'n')
  # Compare the extracted p-values, not the whole test objects
  if (shapiro.test(df[, i])$p.value > 0.05 &&
      shapiro.test(df[, j])$p.value > 0.05) {
    res1 <- cor.test(df[, i], df[, j], method = "pearson")
    text(.5, .5, paste("p.value:", round(res1$p.value, 2),
                       "\n r:", round(res1$estimate, 2)))
  } else {
    res2 <- cor.test(df[, i], df[, j],
                     method = "spearman", exact = FALSE)
    text(.5, .5, paste("p.value:", round(res2$p.value, 2),
                       "rho:", round(res2$estimate, 2)))
  }
}

# We define the density function to include diagonal elements
hist_density <- function(df, i) {
  tmp <- na.omit(df[, i])
  hist(tmp, col = "light blue",
       probability = TRUE, main = NULL)
  lines(density(tmp), col = "red", lwd = 1.5)
}

Create the new_pairs function, which uses the functions above:

new_pairs <- function(df, x) {
  par(mar = c(1, 1, 1, 1))
  n_col <- sum(sapply(df, is.numeric))
  par(mfrow = c(n_col, n_col))
  n <- ncol(df)
  for (i in 1:n) {
    for (j in 1:n) {
      if ((class(df[, i]) != "factor") & (class(df[, j]) != "factor") & i < j) {
        plot(df[, i], df[, j], col = df[, x])
      } else if ((class(df[, i]) != "factor") &
                 (class(df[, j]) != "factor") & i == j) {
        hist_density(df, i)
      } else if ((class(df[, i]) != "factor") &
                 (class(df[, j]) != "factor") & i > j) {
        normality_test(df, i, j)
      } else {
        NA
      }
    }
  }
}

Testing:

new_pairs(iris, 2)
akrun