Excuse the essay. I've done a DESeq analysis, then taken the counts file, applied the same names, removed any NA values, and created a table called sigs, which I then turn into a data frame:

sigs <- na.omit(res)
sigs

Looks something like this:

log2 fold change (MLE): condition groupb vs groupa 
Wald test p-value: condition groupb vs groupa 

DataFrame with 16003 rows and 6 columns
                     baseMean log2FoldChange     lfcSE       stat     pvalue      padj
                    <numeric>      <numeric> <numeric>  <numeric>  <numeric> <numeric>
ENSSSCG00000048769   82.31674    -0.35837484 0.1217091 -2.9445195 0.00323457 0.0358965
ENSSSCG00000037372   40.49912     0.19133392 0.1472912  1.2990176 0.19393788 0.3612217
ENSSSCG00000027257 1572.05160     0.00319404 0.0743954  0.0429334 0.96575464 0.9791215
ENSSSCG00000029697  494.25472    -0.07424653 0.0665490 -1.1156672 0.26456461 0.4385568
ENSSSCG00000049216    2.54242    -0.42346331 0.5024718 -0.8427604 0.39936246 0.5728141

Then I turn it into a Data frame:

sigs.df <- as.data.frame(sigs)

Trying to show that here:

Description: df [16,003 × 6]

                       baseMean  log2FoldChange       lfcSE           stat        pvalue
                          <dbl>           <dbl>       <dbl>          <dbl>         <dbl>
ENSSSCG00000048769 8.231674e+01   -0.3583748397  0.12170911  -2.9445194769  3.234566e-03
ENSSSCG00000037372 4.049912e+01    0.1913339198  0.14729124   1.2990176317  1.939379e-01
ENSSSCG00000027257 1.572052e+03    0.0031940448  0.07439538   0.0429333738  9.657546e-01
ENSSSCG00000029697 4.942547e+02   -0.0742465345  0.06654900  -1.1156672146  2.645646e-01

Then I try to filter that data frame on two parameters (log2 fold change and padj):

sigs.df <- sigs.df[(abs(sigs.df$log2FoldChange)>1) & (sigs.df$padj < 0.05),]
sigs.df
Description: df [426 × 6]

   baseMean  log2FoldChange      lfcSE       stat        pvalue          padj
      <dbl>           <dbl>      <dbl>      <dbl>         <dbl>         <dbl>
  18.859565        1.247705  0.4096202   3.046004  2.319046e-03  3.030462e-02
   8.702231       -6.199963  1.5519239  -3.995017  6.468949e-05  4.932854e-03
   9.466600       -1.535926  0.4899316  -3.134980  1.718657e-03  2.570514e-02
1099.496033        1.547162  0.3705798   4.174976  2.980168e-05  3.222408e-03

This has 426 rows in it! Then I perform normalisation, transformations, and plot a heatmap:

mat <- counts(dds, normalized = T)[rownames(sigs.df),]
mat
t(apply(mat,1, scale))
dds$condition <- factor(dds$condition, levels = c("Control","Blast"))



mat.z <- t(apply(mat,1, scale))
colnames(mat.z) = rownames(coldata)

mat.z
library(RColorBrewer)
bluegreen <- c("blue", "green") 
pal <- colorRampPalette(bluegreen)(100)
par(cex.main=.8)
heatmap(mat.z,cluster_rows = T, cluster_columns = T, column_labels = colnames(mat.z), name = "z-score", col = pal, legend = TRUE, 
main = "Heatmap of DEGS Normalized Counts in Pig Samples") 
The output heatmap is below.
Qu1: It seems to be displaying only a selection of the genes (rows labelled on the right). How can I get it to display all the genes in detail?
[For those wondering, I haven't mapped the Ensembl IDs as there is an issue with BioMart and obtaining the Sus scrofa gene IDs!]
Qu2: I would like to annotate this with the condition that each sample (bottom of the heatmap) was exposed to. The sample conditions and runs (Run 1 and Run 2) are held in the file 'coldata', but I am unable to get the heatmap to label/annotate in this way.
I have seen people call a data frame to do this, i.e.:
df <- as.data.frame(file$sampleconditions)
then pass this to pheatmap (annotation_row = df).
However, I can't seem to get this to work. Should I be labelling my sample IDs with the condition in the same file?
Thanks. Apologies for the haphazardness.
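
On Qu1, a minimal sketch of one way to make every gene label legible, assuming pheatmap is acceptable in place of `heatmap()`: fix the height of each row with `cellheight`, shrink the label font with `fontsize_row`, and write to a file so the page grows to fit all rows. The matrix below is made-up toy data standing in for the 426-gene matrix, and the output file name and the sizes chosen are illustrative assumptions only.

```r
# Toy stand-in for the 426-gene matrix of normalised counts (made-up data).
set.seed(1)
n_genes   <- 426
n_samples <- 38
mat <- matrix(rpois(n_genes * n_samples, lambda = 50) + 1,
              nrow = n_genes,
              dimnames = list(sprintf("gene%03d", seq_len(n_genes)),
                              sprintf("sample%02d", seq_len(n_samples))))

# Row-wise z-scores, as in the post; set the dimnames explicitly afterwards
# so both gene and sample labels are present on the result.
mat.z <- t(apply(mat, 1, scale))
dimnames(mat.z) <- dimnames(mat)

# Hypothetical output path; any writable location works.
out_file <- file.path(tempdir(), "heatmap_all_genes.pdf")

if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(mat.z,
                     cluster_rows  = TRUE,
                     cluster_cols  = TRUE,
                     show_rownames = TRUE,
                     fontsize_row  = 6,   # small enough for one label per row
                     cellheight    = 7,   # fixed row height in points
                     filename      = out_file)
}
```

With several hundred rows the resulting PDF is tall rather than screen-sized, which is the trade-off for having one readable label per gene.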
Rob Staruch
  5:10 PM
Rplot_Normalised_Counts_Pig_LF2C>1abs, PPadj<0005.png 
As an example of the above:
I want to add the annotation row labelling to a pheatmap.
It appears from the tutorial here: https://towardsdatascience.com/pheatmap-draws-pretty-heatmaps-483dab9a3cc
That I can call a data frame in order to do this.
Here is my data frame:

               Sample Condition
1    Sample_Run1HR62_S1_Run1    groupa
2    Sample_Run2HR62_S1_Run2    groupa
3    Sample_Run1HR70_S2_Run1    groupa
4    Sample_Run2HR70_S2_Run2    groupa
5    Sample_Run1HR78_S3_Run1    groupa
6    Sample_Run2HR78_S3_Run2    groupa
7    Sample_Run1HR81_S4_Run1    groupa
8    Sample_Run2HR81_S4_Run2    groupa
9    Sample_Run1HR87_S5_Run1    groupa
10   Sample_Run2HR87_S5_Run2    groupa
11   Sample_Run1HR99_S6_Run1    groupa
12   Sample_Run2HR99_S6_Run2    groupa
13  Sample_Run1HR107_S7_Run1    groupa
14  Sample_Run2HR107_S7_Run2    groupa
15  Sample_Run1HR114_S8_Run1    groupa
16  Sample_Run2HR114_S8_Run2    groupa
17 Sample_Run1HR142_S17_Run1    groupa
18 Sample_Run2HR142_S17_Run2    groupa
19 Sample_Run1HR146_S18_Run1    groupa
20 Sample_Run2HR146_S18_Run2    groupa
21   Sample_Run1HR61_S9_Run1    groupb
22   Sample_Run2HR61_S9_Run2    groupb
23  Sample_Run1HR71_S11_Run1    groupb
24  Sample_Run2HR71_S11_Run2    groupb
25  Sample_Run1HR74_S41_Run1    groupb
26  Sample_Run2HR74_S41_Run2    groupb
27  Sample_Run1HR80_S12_Run1    groupb
28  Sample_Run2HR80_S12_Run2    groupb
29  Sample_Run1HR86_S13_Run1    groupb
30  Sample_Run2HR86_S13_Run2    groupb
31 Sample_Run1HR115_S14_Run1    groupb
32 Sample_Run2HR115_S14_Run2    groupb
33 Sample_Run1HR121_S15_Run1    groupb
34 Sample_Run2HR121_S15_Run2    groupb
35 Sample_Run1HR127_S16_Run1    groupb
36 Sample_Run2HR127_S16_Run2    groupb
37  Sample_Run2HR66_S10_Run2    groupb
38  Sample_Run1HR66_S10_Run1    groupb
Here is the r script I am using to generate the Pheatmap:
# Create sample-sample heatmap
sampleDists <- dist(t(assay(rld))) #calculates Euclidean distance. Rld to ensure we have a roughly equal contribution from all genes
sampleDistMatrix <- as.matrix( sampleDists )
rownames(sampleDistMatrix) <- paste( targets$Sample, sep = " - " )
colnames(sampleDistMatrix) <- NULL
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDistMatrix, clustering_distance_rows = sampleDists, clustering_distance_cols = sampleDists,col = colors, main = "Heatmap of Sample to Sample Distances in Pig Samples" )
Here is the same code when I add the ‘annotation_row’ command:
# Create sample-sample heatmap
sampleDists <- dist(t(assay(rld))) #calculates Euclidean distance. Rld to ensure we have a roughly equal contribution from all genes
sampleDistMatrix <- as.matrix( sampleDists )
rownames(sampleDistMatrix) <- paste( targets$Sample, sep = " - " )
colnames(sampleDistMatrix) <- NULL
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDistMatrix, clustering_distance_rows = sampleDists, clustering_distance_cols = sampleDists,col = colors,annotation_row = targets, main = "Heatmap of Sample to Sample Distances in Pig Samples" )
Here is the error generated from this:
Error in check.length("fill") : 
  'gpar' element 'fill' must not be length 0
Any help would be greatly appreciated
  • Please specify the R packages used in your code. – Marco Sandri Jul 16 '22 at 10:28
  • Please see Edit above. Thank you – Rob Staruch Jul 16 '22 at 11:08
  • Hi Rob. It's not clear how anyone can help you here unless you provide enough information to allow them to. Your set-up involves an object called `rld` which we don't have, and we have no way to guess what it might be. You also seem to be calling on functions from a non-CRAN package that few folks here will know about. In both cases, these seem unnecessary for the main point of the question, which is how to annotate pheatmap rows. Could you perhaps boil your question down to a simple reproducible example including all the library calls and data needed to run it? Thanks. – Allan Cameron Jul 16 '22 at 12:23
  • Hi Allan. See above. – Rob Staruch Jul 16 '22 at 14:30
  • Thanks Rob. So what is the `rld` object? It also seems to me very unlikely that you need _all_ of those library calls to reproduce the problem. These are fairly niche packages, and folks here would have to install all of them to run your code. Can you perhaps make a little toy example to demonstrate the problem you are having? – Allan Cameron Jul 16 '22 at 15:03
  • Hi Allan, I don't know if my explanation above will help things. The rld object is: rld <- vst(dds, blind = FALSE). It is, I think, a transformation of dds (the DESeq output object). – Rob Staruch Jul 16 '22 at 16:54

1 Answer


In my opinion the error is due to a wrong format of the targets object specified in annotation_row.
Below I try to reproduce the error:

library(pheatmap)
library(RColorBrewer)

targets <- read.table(text="
Sample Group
1    Sample_Run1HR62_S1_Run1    groupa
2    Sample_Run2HR62_S1_Run2    groupa
3    Sample_Run1HR70_S2_Run1    groupa
4    Sample_Run2HR70_S2_Run2    groupa
5    Sample_Run1HR78_S3_Run1    groupa
6    Sample_Run2HR78_S3_Run2    groupa
7    Sample_Run1HR81_S4_Run1    groupa
8    Sample_Run2HR81_S4_Run2    groupa
9    Sample_Run1HR87_S5_Run1    groupa
10   Sample_Run2HR87_S5_Run2    groupa
11   Sample_Run1HR99_S6_Run1    groupa
12   Sample_Run2HR99_S6_Run2    groupa
13  Sample_Run1HR107_S7_Run1    groupa
14  Sample_Run2HR107_S7_Run2    groupa
15  Sample_Run1HR114_S8_Run1    groupa
16  Sample_Run2HR114_S8_Run2    groupa
17 Sample_Run1HR142_S17_Run1    groupa
18 Sample_Run2HR142_S17_Run2    groupa
19 Sample_Run1HR146_S18_Run1    groupa
20 Sample_Run2HR146_S18_Run2    groupa
21   Sample_Run1HR61_S9_Run1    groupb
22   Sample_Run2HR61_S9_Run2    groupb
23  Sample_Run1HR71_S11_Run1    groupb
24  Sample_Run2HR71_S11_Run2    groupb
25  Sample_Run1HR74_S41_Run1    groupb
26  Sample_Run2HR74_S41_Run2    groupb
27  Sample_Run1HR80_S12_Run1    groupb
28  Sample_Run2HR80_S12_Run2    groupb
29  Sample_Run1HR86_S13_Run1    groupb
30  Sample_Run2HR86_S13_Run2    groupb
31 Sample_Run1HR115_S14_Run1    groupb
32 Sample_Run2HR115_S14_Run2    groupb
33 Sample_Run1HR121_S15_Run1    groupb
34 Sample_Run2HR121_S15_Run2    groupb
35 Sample_Run1HR127_S16_Run1    groupb
36 Sample_Run2HR127_S16_Run2    groupb
37  Sample_Run2HR66_S10_Run2    groupb
38  Sample_Run1HR66_S10_Run1    groupb
", header=T)

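# Generating a random matrix for my example: 100 rows, one column per sample
set.seed(1)  # fixed seed so the toy example is reproducible
rld <- matrix(rnorm(100 * nrow(targets)), ncol = nrow(targets))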
sampleDists <- dist(t(rld)) 
sampleDistMatrix <- as.matrix(sampleDists)
rownames(sampleDistMatrix) <- paste(targets$Sample)
colnames(sampleDistMatrix) <- NULL
colors <- colorRampPalette(rev(brewer.pal(9, "Blues")))(255)

pheatmap(sampleDistMatrix, clustering_distance_rows = sampleDists,
         clustering_distance_cols = sampleDists, col = colors,
         annotation_row = targets, 
         main="Heatmap of Sample to Sample Distances in Pig Samples")

Here is the error:

Error in check.length("fill") : 'gpar' element 'fill' must not be length 0

To solve the problem, targets needs to be reformatted.
First, the row names of targets must match the row names of sampleDistMatrix.
In addition, targets must contain only the annotation column (here, Group), not the Sample column.

rownames(targets) <- rownames(sampleDistMatrix)
targets <- targets[, -1, drop=F]
str(targets)

# 'data.frame':   38 obs. of  1 variable:
# $ Group: chr  "groupa" "groupa" "groupa" "groupa" ...

pheatmap(sampleDistMatrix, clustering_distance_rows = sampleDists,
         clustering_distance_cols = sampleDists, col = colors,
         annotation_row = targets, 
         main="Heatmap of Sample to Sample Distances in Pig Samples")
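
Before plotting, the two requirements above can be checked explicitly. The following is a minimal sketch on made-up toy data (six hypothetical samples, not the pig data set); the object name `annotation` is an illustrative choice, not something pheatmap requires.

```r
# Toy data: six hypothetical samples, three per group
targets <- data.frame(
  Sample = paste0("S", 1:6),
  Group  = rep(c("groupa", "groupb"), each = 3)
)

set.seed(42)
expr <- matrix(rnorm(100 * nrow(targets)), ncol = nrow(targets))
sampleDistMatrix <- as.matrix(dist(t(expr)))
rownames(sampleDistMatrix) <- targets$Sample
colnames(sampleDistMatrix) <- NULL

# Annotation data frame: only the Group column, with sample IDs as row names
annotation <- data.frame(Group = targets$Group, row.names = targets$Sample)

# Both conditions from the text, checked before calling pheatmap()
stopifnot(identical(rownames(annotation), rownames(sampleDistMatrix)))
stopifnot(identical(colnames(annotation), "Group"))

if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(sampleDistMatrix, annotation_row = annotation)
}
```

If the first `stopifnot()` fails, the sample IDs in the annotation and in the matrix do not line up, so the annotation cannot be matched to the rows.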


Marco Sandri
  • This is really helpful, thank you. I have a question about this line: `targets <- targets[, -1, drop=F]`. I understood that the brackets [] refer to [rows, columns], with the row or column numbers to be selected written as [1:5, 1:5] (rows 1 to 5, columns 1 to 5). Does -1 imply removing a column? If so, is it saying remove column 1 (so -2 would remove column 2)? Thank you – Rob Staruch Jul 17 '22 at 12:29
  • Furthermore, if I wanted the row names in targets to be the same as in sampleDistMatrix, would the line be: colnames(targets) <- rownames(sampleDistMatrix)? I would also like to add the run column from targets, so can I add that in the same fashion? – Rob Staruch Jul 17 '22 at 12:33
  • @RobStaruch Hi. `targets[, -1]` removes the first column of `targets` because you need only the `Group` column for row annotation. I am sorry but I did not understand your second question. Please accept and upvote my answer above if you find it helpful. – Marco Sandri Jul 18 '22 at 17:16
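
The two follow-up questions can be illustrated with a small sketch. It assumes that the run is not stored as its own column and can be read off the `_Run1` / `_Run2` suffix of the sample name; that derivation, the `Run` column, and the name `annotation` are assumptions made for illustration. Note that it is `rownames()`, not `colnames()`, that has to match the matrix.

```r
# Four sample names taken from the question, enough to show the idea
targets <- data.frame(
  Sample = c("Sample_Run1HR62_S1_Run1", "Sample_Run2HR62_S1_Run2",
             "Sample_Run1HR61_S9_Run1", "Sample_Run2HR61_S9_Run2"),
  Group  = c("groupa", "groupa", "groupb", "groupb")
)

# A negative index drops that column: -1 drops column 1, -2 drops column 2.
# drop = FALSE keeps the result a data frame even when one column is left.
without_first  <- names(targets[, -1, drop = FALSE])  # "Group"
without_second <- names(targets[, -2, drop = FALSE])  # "Sample"

# Assumption: the run label is the trailing "_RunN" part of the sample name
targets$Run <- sub(".*_(Run[0-9]+)$", "\\1", targets$Sample)

# Annotation with two columns; the sample IDs go into the ROW names
annotation <- targets[, c("Group", "Run"), drop = FALSE]
rownames(annotation) <- targets$Sample
```

Passing `annotation` as `annotation_row` should then draw one coloured strip per column (Group and Run), provided the row names match those of the matrix being plotted.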