I am not sure why "na.rm=TRUE" in ggplot() and geom_col() doesn't remove the missing values. Here is the input file (named dat.csv in the code):

br,tr,obs,ee,dd
UU,RRR,228,0.38895,0.33691
BB,RRR,591,0.37254,0.40899
GG,RRR,702,0.36163,0.38155
UU,AAA,229,0.31594,0.32768
BB,AAA,591,0.18185,0.23339
GG,AAA,702,0.37287,0.40218
UU,BBB,228,0.16561,0.32313
BB,BBB,591,0.22578,0.41145
GG,BBB,702,0.28103,0.46357
UU,LLL,1,.,.
UU,TTT,107,-0.01136,0.2265
BB,TTT,33,-0.34362,0.07749
GG,TTT,54,0.00905,-0.07037

Here is the code:

library("data.table")
library(conflicted)
library(tidyverse)
library(ggplot2)

ee <- fread("dat.csv", select = c("br","tr","obs","ee"))
ee_r <- data.frame(a = 1:nrow(ee), Method = "OLD")
ee <- cbind(rename(ee, Acc = ee), ee_r)

dd <- fread("dat.csv", select = c("br","tr","obs","dd"))
dd_r <- data.frame(a = 1:nrow(dd), Method = "NEW")
dd <- cbind(rename(dd, Acc = dd), dd_r)

dat <- rbind(ee,dd)
dat <- subset(dat, select = -c(a))

dat$Acc[dat$Acc=="."] <- NA

br_nbs <- paste(dat$br, dat$obs, sep = "\n")
br_nbs

# data <- subset(dat, !is.na(Acc))  # this command gives me an error

ggplot(dat, aes(x = br_nbs, y = Acc, fill = Method), na.rm=TRUE)+
  geom_col(colour="black",width=1, position=position_dodge(0.7), na.rm=TRUE) +
  facet_wrap(~tr, strip.position = "top", labeller = "label_value", scales = "free_x")

And here is the plot with NAs in the red box:

[Image: plot produced by the code above]

I appreciate any comments.

Thanks.

  • `ggplot()` and `geom_xxx()` do not have an `na.rm` parameter. Filter the data to remove the `NA`s before piping to `ggplot()`. And please remove all the data wrangling prior to the call to `ggplot()`. It's unnecessary if you give us the output from `dput(dat)`. That's what the *minimal* part of minimal reproducible example means. – Limey Aug 09 '22 at 15:38
  • @Limey, [`geom_col`](https://ggplot2.tidyverse.org/reference/geom_bar.html) _does_ have `na.rm=` as an argument. – r2evans Aug 09 '22 at 15:44
  • @r2evans I stand corrected. Thank you. But OP is also using it in the call to `ggplot()`... – Limey Aug 09 '22 at 15:46
  • @user19642384, _please_ provide reproducible data. It's not clear to me how we should be combining the top data block with the code below it, which does not seem to work. – r2evans Aug 09 '22 at 15:48
  • Looking at the plot, your problem is that the `Acc` column is a character or factor variable rather than numeric. Try `dat$Acc <- as.numeric(as.character(dat$Acc))` and run your plotting code again. – Allan Cameron Aug 09 '22 at 15:57
  • @AllanCameron In addition, OP creates their `br_nbs` vector from the full dataset and then filters the full dataset to remove `NA`s. Thus, there will be a length mismatch when they attempt to create the plot. – Limey Aug 09 '22 at 16:01

1 Answer

Like @r2evans, I am unsure what your input data actually is. I think it might be this:

dput(dat)
structure(list(br = c("UU", "BB", "GG", "UU", "BB", "GG", "UU", 
"BB", "GG", "UU", "UU", "BB", "GG", "UU", "BB", "GG", "UU", "BB", 
"GG", "UU", "BB", "GG", "UU", "UU", "BB", "GG"), tr = c("RRR", 
"RRR", "RRR", "AAA", "AAA", "AAA", "BBB", "BBB", "BBB", "LLL", 
"TTT", "TTT", "TTT", "RRR", "RRR", "RRR", "AAA", "AAA", "AAA", 
"BBB", "BBB", "BBB", "LLL", "TTT", "TTT", "TTT"), obs = c(228L, 
591L, 702L, 229L, 591L, 702L, 228L, 591L, 702L, 1L, 107L, 33L, 
54L, 228L, 591L, 702L, 229L, 591L, 702L, 228L, 591L, 702L, 1L, 
107L, 33L, 54L), Acc = c("0.38895", "0.37254", "0.36163", "0.31594", 
"0.18185", "0.37287", "0.16561", "0.22578", "0.28103", ".", "-0.01136", 
"-0.34362", "0.00905", "0.33691", "0.40899", "0.38155", "0.32768", 
"0.23339", "0.40218", "0.32313", "0.41145", "0.46357", ".", "0.2265", 
"0.07749", "-0.07037"), Method = c("OLD", "OLD", "OLD", "OLD", 
"OLD", "OLD", "OLD", "OLD", "OLD", "OLD", "OLD", "OLD", "OLD", 
"NEW", "NEW", "NEW", "NEW", "NEW", "NEW", "NEW", "NEW", "NEW", 
"NEW", "NEW", "NEW", "NEW")), row.names = c(NA, 26L), class = "data.frame")

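Now, finish off your data wrangling: add br_nbs to the data frame (this is the critical step), mark the "." entries as NA, and convert Acc to numeric.

dat$br_nbs <- paste(dat$br, dat$obs, sep = "\n")
dat$Acc[dat$Acc == "."] <- NA
dat$Acc <- as.numeric(dat$Acc)  # Acc is character in the dput() output above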
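
One detail worth checking at this point is the type of Acc. As Allan Cameron's comment notes, the column is character, and assigning NA to the "." entries does not change that. A minimal sketch in base R, using a made-up three-element vector (acc, acc_num are illustrative names) rather than the real data:

```r
# Hypothetical three-element stand-in for the Acc column.
acc <- c("0.38895", ".", "-0.01136")

# Marking "." as missing leaves the vector as character.
acc[acc == "."] <- NA
class(acc)      # "character"

# Convert explicitly so that ggplot2 can map the values to a continuous y scale.
acc_num <- as.numeric(acc)
class(acc_num)  # "numeric"
```

If Acc stays character, ggplot2 treats each distinct string as a level of a discrete y axis, which is probably not the plot you want.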

And plot the data, with minor modifications to your code.

ggplot(dat %>% filter(!is.na(Acc)), aes(x = br_nbs, y = Acc, fill = Method))+
  geom_col(colour="black",width=1, position=position_dodge(0.7)) +
  facet_wrap(~tr, strip.position = "top", labeller = "label_value", scales = "free_x")

Giving

[Image: plot produced by the code above]
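
An alternative, if the "." entries are the only missing-value marker in dat.csv, is to declare them as missing when the file is read, so that the accuracy columns arrive as numeric and no later conversion is needed. The following is a sketch under that assumption, not the approach used above. It uses base R's read.csv() with na.strings = "." on an inline copy of a few rows from the question, and the names csv_text, raw, long and plot_dat are made up for illustration.

```r
library(ggplot2)

# Inline copy of a few rows of dat.csv so the sketch runs on its own.
csv_text <- "br,tr,obs,ee,dd
UU,RRR,228,0.38895,0.33691
BB,RRR,591,0.37254,0.40899
UU,LLL,1,.,.
UU,TTT,107,-0.01136,0.2265
BB,TTT,33,-0.34362,0.07749"

# na.strings = "." reads the "." entries as NA, so ee and dd come in as numeric.
raw <- read.csv(text = csv_text, na.strings = ".")

# Stack the two accuracy columns into long format with a Method label.
long <- rbind(
  data.frame(raw[c("br", "tr", "obs")], Acc = raw$ee, Method = "OLD"),
  data.frame(raw[c("br", "tr", "obs")], Acc = raw$dd, Method = "NEW")
)

# Build the axis label inside the data frame so it is filtered along with the rows.
long$br_nbs <- paste(long$br, long$obs, sep = "\n")

# Drop rows with a missing Acc before plotting.
plot_dat <- long[!is.na(long$Acc), ]

p <- ggplot(plot_dat, aes(x = br_nbs, y = Acc, fill = Method)) +
  geom_col(colour = "black", position = position_dodge(0.7)) +
  facet_wrap(~tr, scales = "free_x")
print(p)
```

fread() also accepts an na.strings argument, so the same idea should carry over to the data.table code in the question.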

Limey