
I have a large melt output with dimensions 4608940 x 2, built from 1000 columns of ca. 4000+ rows each. The original columns do not all have the same number of points.

Is there a way to select certain data within the melt to use with ggplot2/boxplot()? Say column 50, column 130 and column 650?

This is easily done using base R's boxplot() and the original data.

Screamh
  • One can always select columns in R using `dat[,c(50,130,650)]` or `subset(dat, select=c(50,130,650))` or `dplyr::select(dat, c(50,130,650))`. – r2evans Jul 06 '21 at 11:47
  • If you are feeding the entire data to `ggplot2` but only want the boxplot layer to use specific columns, then `geom_boxplot(data = ~ subset(., select=c(50,130,650)))` might work for you. – r2evans Jul 06 '21 at 11:47
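To illustrate the column selection suggested in the comments above, here is a minimal base R sketch on the original (unmelted) data. The data frame `dat` is a hypothetical stand-in with made-up values and default names `V1` to `V1000`; only the indexing calls come from the comments.

```r
# Hypothetical stand-in for the original wide data: 1000 columns named V1..V1000
set.seed(1)
dat <- as.data.frame(matrix(rnorm(200 * 1000), ncol = 1000))

cols <- c(50, 130, 650)

by_index  <- dat[, cols]                 # positional indexing
by_subset <- subset(dat, select = cols)  # same columns via subset()

# Base R draws one box per column of a data frame
boxplot(by_index, xlab = "Column", ylab = "Value")
```

Both forms return a three-column data frame, so either can be passed straight to `boxplot()`.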

1 Answer

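# Packages used below: data.table for melt(), ggplot2 for plotting
library(data.table)
library(ggplot2)

# Get some data (1000 columns, 4000 rows)
df<-data.table(sapply(seq(1,1000), function(x) rnorm(4000)))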

# Melt the data (result is 4,000,000 x 2)
plot_input = melt(df, id.vars =NULL, measure.vars=colnames(df), variable.name = "col_num", value.name = "value")

# boxplots of selected columns
ggplot(
    plot_input[col_num %in% c("V50", "V130", "V650")],
    aes(y=value, x=col_num, color=col_num)) + 
geom_boxplot() + 
theme(legend.position="none") + labs(x="Column", y="Value")

boxplots of selected columns from melt
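The question mentions that the columns do not all hold the same number of points. A rough sketch of one way to handle that, assuming the shorter columns are padded with `NA` in the wide table (an assumption, since the thread does not show the real data): `melt()` from data.table accepts `na.rm = TRUE`, which drops the padding while melting. The table `wide` and its values are hypothetical.

```r
library(data.table)

# Hypothetical wide table whose columns hold different numbers of points,
# padded with NA to a common length (assumed layout, not from the thread)
set.seed(1)
wide <- data.table(
  V50  = c(rnorm(5), rep(NA, 3)),
  V130 = rnorm(8),
  V650 = c(rnorm(6), rep(NA, 2))
)

# na.rm = TRUE drops the NA padding rows while melting
long <- melt(wide, measure.vars = names(wide),
             variable.name = "col_num", value.name = "value",
             na.rm = TRUE)

# Number of points kept per original column
long[, .N, by = col_num]
```

The resulting `long` table has the same two-column shape as `plot_input` above, so the same `ggplot()` call should apply to it.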

langtang
  • r2evans, langtang - thank you very much for sharing your insights and understanding. – Screamh Jul 06 '21 at 14:15
  • Hi @langtang, I am getting an unusual error. I ran your answer exactly as you gave it and it worked perfectly, but over the last few days the following error has arisen: `Error in col_num %in% c("V50", "V130", "V650") : object 'col_num' not found`. My R has not been updated and I have not made any changes to your answer. Any idea what has happened? Thx – Screamh Jul 09 '21 at 15:22
  • I have just run your answer on a Windows machine and it works perfectly, so it must be a Mac issue. Very strange. – Screamh Jul 09 '21 at 15:30
  • @Screamh you don't have to melt the entire frame either. For example, you can select the columns you want prior to the melt like this: `plot_input = melt(df[,.(V50,V130,V650)], id.vars =NULL, measure.vars=NULL, variable.name = "col_num", value.name = "value")` – langtang Jul 09 '21 at 21:38
  • Thx langtang, am still v new to r, and didn't know this. Much appreciated. – Screamh Jul 12 '21 at 06:52
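On the `object 'col_num' not found` error reported in the comments: the thread never establishes the cause, but one possibility (an assumption, not confirmed here) is that `plot_input` ended up as a plain data.frame rather than a data.table in that session, for example if data.table was not loaded. The sketch below uses a small hypothetical data.frame to show how that situation produces a "not found" error, and a subsetting form that works for either class.

```r
# Hypothetical long-format data stored as a plain data.frame
plot_input <- data.frame(
  col_num = rep(c("V50", "V130", "V650", "V7"), each = 3),
  value   = seq_len(12)
)
wanted <- c("V50", "V130", "V650")

# A data.table would also list "data.table" here
class(plot_input)

# data.table-style subsetting fails on a plain data.frame, because
# col_num is looked up as a free-standing variable rather than a column
msg <- tryCatch(
  {
    plot_input[col_num %in% wanted]
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg

# Explicit form that works for both data.frame and data.table
selected <- plot_input[plot_input$col_num %in% wanted, ]
```

Checking `class(plot_input)` on the machine where the error appears would show whether this explanation applies.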