R: Detect and discard variable with always the same value in long format data frame

Question

Say I have a data frame like the following, where one of the columns always has the same value:

> set.seed(1)
> mydf <- data.frame(name=LETTERS[1:10], treatment1=rnorm(10, 2, 1), treatment2=1.35, treatment3=rnorm(10, 5, 2))
> mydf
   name treatment1 treatment2 treatment3
1     A   1.373546       1.35  8.0235623
2     B   2.183643       1.35  5.7796865
3     C   1.164371       1.35  3.7575188
4     D   3.595281       1.35  0.5706002
5     E   2.329508       1.35  7.2498618
6     F   1.179532       1.35  4.9101328
7     G   2.487429       1.35  4.9676195
8     H   2.738325       1.35  6.8876724
9     I   2.575781       1.35  6.6424424
10    J   1.694612       1.35  6.1878026

In a wide-format data frame, I know how to detect such a column, and hence discard it, with:

> mydf[sapply(mydf, function(x) length(unique(na.omit(x)))) == 1]
   treatment2
1        1.35
2        1.35
3        1.35
4        1.35
5        1.35
6        1.35
7        1.35
8        1.35
9        1.35
10       1.35
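To discard rather than select the constant columns, the same test can be flipped so that only columns with more than one distinct non-missing value are kept. A minimal sketch that rebuilds the example data (`n_unique` and `mydf_kept` are illustrative names, not from the question):

```r
# rebuild the example data from the question
set.seed(1)
mydf <- data.frame(name = LETTERS[1:10],
                   treatment1 = rnorm(10, 2, 1),
                   treatment2 = 1.35,
                   treatment3 = rnorm(10, 5, 2))

# number of distinct non-missing values in each column
n_unique <- sapply(mydf, function(x) length(unique(na.omit(x))))

# keep only the columns that vary; treatment2 is dropped
mydf_kept <- mydf[n_unique > 1]
names(mydf_kept)  # name, treatment1, treatment3
```

Note that an identifier column such as `name` is kept here only because it varies; a constant identifier would be dropped too.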

However, my data are now in long format:

> library(reshape2)
> mymelt <- melt(mydf, id.vars="name")
> mymelt
   name   variable     value
1     A treatment1 1.3735462
2     B treatment1 2.1836433
3     C treatment1 1.1643714
4     D treatment1 3.5952808
5     E treatment1 2.3295078
6     F treatment1 1.1795316
7     G treatment1 2.4874291
8     H treatment1 2.7383247
9     I treatment1 2.5757814
10    J treatment1 1.6946116
11    A treatment2 1.3500000
12    B treatment2 1.3500000
13    C treatment2 1.3500000
14    D treatment2 1.3500000
15    E treatment2 1.3500000
16    F treatment2 1.3500000
17    G treatment2 1.3500000
18    H treatment2 1.3500000
19    I treatment2 1.3500000
20    J treatment2 1.3500000
21    A treatment3 8.0235623
22    B treatment3 5.7796865
23    C treatment3 3.7575188
24    D treatment3 0.5706002
25    E treatment3 7.2498618
26    F treatment3 4.9101328
27    G treatment3 4.9676195
28    H treatment3 6.8876724
29    I treatment3 6.6424424
30    J treatment3 6.1878026

I would rather not dcast and melt again. Is there an easy way to detect and drop treatment2 from mymelt? (Note that in my real data frame I have two variable columns that identify the treatment.) Thanks!

I don't know beforehand which "treatment" it is; all I know is that I want to discard the "treatments" for which all values are the same... and I do have several "treatments" in my actual data. (DaniCee, Apr 05 '18 at 03:49)
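One way to find every constant treatment without knowing it in advance, and without reshaping, is to count the distinct values per treatment directly on the long data with base R's `tapply()`. The sketch below uses a small hypothetical long-format data frame (`toy_long`) built by hand so that it runs without reshape2:

```r
# small hypothetical long-format data: treatment "b" is constant
toy_long <- data.frame(
  name     = rep(c("A", "B", "C"), times = 3),
  variable = rep(c("a", "b", "c"), each = 3),
  value    = c(1.2, 3.4, 0.7,  1.35, 1.35, 1.35,  5.0, NA, 6.1)
)

# distinct non-missing values per treatment, computed on the long data
n_unique <- tapply(toy_long$value, toy_long$variable,
                   function(x) length(unique(na.omit(x))))

# names of the treatments whose values are all the same
constant <- names(n_unique)[n_unique == 1]
constant  # "b"
```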

score 0 · Answer 1 · answered Apr 05 '18 at 04:04

You can try the following:

library(dplyr)

# count the distinct non-missing values within each treatment
df1 <- mymelt %>% 
    group_by(variable) %>% 
    summarise(counts = n_distinct(value, na.rm = TRUE))

# get the names of the treatments that have just one distinct value
to_remove <- df1$variable[df1$counts == 1] # treatment2

# drop all rows of those treatments; %in% also copes with zero or several matches
mymelt <- mymelt[!(mymelt$variable %in% to_remove), ]
mymelt

   name   variable     value
1     A treatment1 1.3735462
2     B treatment1 2.1836433
3     C treatment1 1.1643714
4     D treatment1 3.5952808
5     E treatment1 2.3295078
6     F treatment1 1.1795316
7     G treatment1 2.4874291
8     H treatment1 2.7383247
9     I treatment1 2.5757814
10    J treatment1 1.6946116
21    A treatment3 8.0235623
22    B treatment3 5.7796865
23    C treatment3 3.7575188
24    D treatment3 0.5706002
25    E treatment3 7.2498618
26    F treatment3 4.9101328
27    G treatment3 4.9676195
28    H treatment3 6.8876724
29    I treatment3 6.6424424
30    J treatment3 6.1878026
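If you would rather skip the intermediate summary table, a possible alternative is to broadcast the per-group count back to every row with base R's `ave()` and use it directly as a row filter. `ave()` accepts several grouping variables, which may cover the case mentioned in the question where two columns identify a treatment; `treat_a`, `treat_b` and `long2` below are hypothetical placeholders for those columns and that data:

```r
# hypothetical long data where two columns (treat_a, treat_b) jointly
# identify a treatment; the combination x/2 is constant
long2 <- data.frame(
  name    = rep(c("A", "B", "C"), times = 3),
  treat_a = rep(c("x", "x", "y"), each = 3),
  treat_b = rep(c("1", "2", "1"), each = 3),
  value   = c(1.2, 3.4, 0.7,  1.35, 1.35, 1.35,  5.0, 2.2, 6.1)
)

# for every row, the number of distinct non-missing values in its treatment
n_unique <- ave(long2$value, long2$treat_a, long2$treat_b,
                FUN = function(x) length(unique(na.omit(x))))

# keep rows whose treatment actually varies; base [ keeps the row names
long2_kept <- long2[n_unique > 1, ]
```

On the question's data the single-column form would be `mymelt[ave(mymelt$value, mymelt$variable, FUN = function(x) length(unique(na.omit(x)))) > 1, ]`. With dplyr, a comparable one-step form is `group_by(variable) %>% filter(n_distinct(value, na.rm = TRUE) > 1)`, adding the second identifying column to `group_by()` when needed.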
