-2

I am stuck on how to proceed with coding in RStudio for the Bonferroni Correction and the raw P values for the Pearson Correlation Matrix. I am a student and am new to R. I am also lost on how to get a table of the mean,SD, and n for the data. When I calculated the Pearson Correlation Matrix I just got the r value and not the raw probabilities value also. I am not sure how to code to get that in RStudio. I then tried to calculate the Bonferroni Correction and received an error message saying list object cannot be coerced to type double. How do I fix my code so this goes away? I also tried to create a table of the mean, SD, and n for the data and I became stuck on how to proceed.

My data is as follows:

Tree Height DBA Leaf Diameter
45.3    14.9    0.76
75.2    26.6    1.06
70.1    22.9    1.19
95      31.8    1.59
107.8   35.5    0.43
93      26.2    1.49
91.5    29      1.19
78.5    29.2    1.1
85.2    30.3    1.24
50      16.8    0.67
47.1    12.8    0.98
73.2    28.4    1.2

Packages I have installed dplyr,tidyr,multcomp,multcompview I Read in the data from excel CSV(comma delimited) file and This creates data>dataHW8_1 12obs. of 3 variables

summary(dataHW8_1)

I then created Scatterplots of the data plot(dataHW8_1$Tree_Height,dataHW8_1$DBA,main="Scatterplot Tree Height Vs Trunk Diameter at Breast Height (DBA)",xlab="Tree Height (cm)",ylab="DBA (cm)") plot(dataHW8_1$Tree_Height,dataHW8_1$Leaf_Diameter,main="Scatterplot Tree Height Vs Leaf Diameter",xlab="Tree Height (cm)",ylab="Leaf Diameter (cm)") plot(dataHW8_1$DBA,dataHW8_1$Leaf_Diameter,main="Scatterplot Trunk Diameter at Breast Height (DBA) Vs Leaf Diameter",xlab="DBA (cm)",ylab="Leaf Diameter (cm)")

I then noticed that the data was not linear so I transformed it using the log() fucntion dataHW8_1log = log(dataHW8_1)

I then re-created my Scatterplots using the transformed data

plot(dataHW8_1log$Tree_Height,dataHW8_1log$DBA,main="Scatterplot of 
Transformed (log)Tree Height Vs Trunk Diameter at Breast Height 
(DBA)",xlab="Tree Height (cm)",ylab="DBA (cm)")
plot(dataHW8_1log$Tree_Height,dataHW8_1log$Leaf_Diameter,main="Scatterplot 
of Transformed (log)Tree Height Vs Leaf Diameter",xlab="Tree Height 
(cm)",ylab="Leaf Diameter (cm)")
plot(dataHW8_1log$DBA,dataHW8_1log$Leaf_Diameter,main="Scatterplot of 
Transformed (log) Trunk Diameter at Breast Height (DBA) Vs Leaf 
Diameter",xlab="DBA (cm)",ylab="Leaf Diameter (cm)")

I then created a matrix plot of Scatterplots

pairs(dataHW8_1log)

I then calculated the correlation coefficent using the Pearson method this does not give an uncorreted matrix of P values------How do you do that?

cor(dataHW8_1log,method="pearson")

I am stuck on what to do to get a matrix of the raw probabilities (uncorrected P values) of the data

I then calculated the Bonferroni correction-----How do you do that?

Data$Bonferroni = 
p.adjust(dataHW8_1log, 
method = "bonferroni")

Doing this gave me the follwing error:

 Error in p.adjust(dataHW8_1log, method = "bonferroni") : 
 (list) object cannot be coerced to type 'double'

I tried to fix using lapply, but that did not fix my promblem

I then tried to make a table of mean, SD, n, but I was only able to create the following code and became stuck on where to go from there------How do you do that?

(,data = dataHW8_1log,
      FUN = function(x) c(Mean = mean(x, na.rm = T),
                          n = length(x),
                          sd = sd(x, na.rm = T))

I have tried following examples online, but none of them have helped me with the getting the Bonferroni Correction to code correctly.If anyone can help explain what I did wrong and how to make the Matrices/table I would greatly appreciate it.

N.J
  • 21
  • 2
  • 7

1 Answers1

0

Here is an example using a 50 rows by 10 columns sample dataframe.

# 50 rows x 10 columns sample dataframe
df <- as.data.frame(matrix(runif(500), ncol = 10));

We can show pairwise scatterplots.

# Pairwise scatterplot
pairs(df);

enter image description here

We can now use cor.test to get p-values for a single comparison. We use a convenience function cor.test.p to do this for all pairwise comparisons. To give credit where credit is due, the function cor.test.p has been taken from this SO post, and takes as an argument a dataframe whilst returning a matrix of uncorrected p-values.

# cor.test on dataframes
# From: https://stackoverflow.com/questions/13112238/a-matrix-version-of-cor-test
cor.test.p <- function(x) {
    FUN <- function(x, y) cor.test(x, y)[["p.value"]];
    z <- outer(
      colnames(x),
      colnames(x),
      Vectorize(function(i,j) FUN(x[,i], x[,j])));
    dimnames(z) <- list(colnames(x), colnames(x));
    return(z);
}

# Uncorrected p-values from pairwise correlation tests
pval <- cor.test.p(df);

We now correct for multiple hypothesis testing by applying the Bonferroni correction to every row (or column, since the matrix is symmetric) and we're done. Note that p.adjust takes a vector of p-values as an argument.

# Multiple hypothesis-testing corrected p-values
# Note: pval is a symmetric matrix, so it doesn't matter if we correct 
# by column or by row
padj <- apply(pval, 2, p.adjust, method = "bonferroni");
padj;
#V1 V2        V3 V4        V5        V6 V7 V8        V9 V10
#V1   0  1 1.0000000  1 1.0000000 1.0000000  1  1 1.0000000   1
#V2   1  0 1.0000000  1 1.0000000 1.0000000  1  1 1.0000000   1
#V3   1  1 0.0000000  1 0.9569498 1.0000000  1  1 1.0000000   1
#V4   1  1 1.0000000  0 1.0000000 1.0000000  1  1 1.0000000   1
#V5   1  1 0.9569498  1 0.0000000 1.0000000  1  1 1.0000000   1
#V6   1  1 1.0000000  1 1.0000000 0.0000000  1  1 0.5461443   1
#V7   1  1 1.0000000  1 1.0000000 1.0000000  0  1 1.0000000   1
#V8   1  1 1.0000000  1 1.0000000 1.0000000  1  0 1.0000000   1
#V9   1  1 1.0000000  1 1.0000000 0.5461443  1  1 0.0000000   1
#V10  1  1 1.0000000  1 1.0000000 1.0000000  1  1 1.0000000   0
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
  • When I go to do that after downloading the 'qdapTools' package. An running that code actually as you have it did not work. When I did 'corr.test (dataHW8_1log, method="pearson", adjust="bonferroni")' I noticed it does not show an astreik of if something is significant and do to that it gives inaccurate results. When I checked 'cor.test(dataHW8_1$Tree_Height,dataHW8_1$DBA,method="pearson")' I got a result of 'p-value = 5.853e-06' which is significant. However, if I look at the table that 'corr.test' gives me the probability value is '0.00'. I still at a loss on how to get a table of mean,sd. – N.J Nov 26 '17 at 22:38
  • @N.J `cor.test` gives you p-values! The (dreadful) asterisks notation is something you need to apply yourself, if necessary. Statistical significance is expressed in terms of p-values (conditional on the null being true which in your case corresponds to a Pearson's product-moment correlation coefficient of r = 0). My example definitely works and is consistent. I don't know `qdapTools`, so I can't help you with that. You can get the mean and sd of your source data by just using e.g. `sapply(your_dataframe, mean)` and `sapply(your_dataframe, sd)`. – Maurits Evers Nov 26 '17 at 22:45