I’m trying to find the variance of a subset of my data (dat), which was read from “pollutionData.csv”.

I want the variance of the PM2.5 levels when rain = 0.

var(PM2.5, data=subset(dat, RAIN == 0))

The code above isn’t working.

aggregate(dat[, 6], list(dat$RAIN==0), var, na.rm=TRUE)

The code above outputs the variance both when rain = 0 and when rain > 0, but I want to run a hypothesis test on the variances, so this isn’t helpful.

Any help would be appreciated!

Kai Whelan

1 Answer

We can subset 'PM2.5' to the elements where 'RAIN' is 0 and then take the var:

with(dat, var(PM2.5[RAIN == 0], na.rm = TRUE))
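As a quick check, here is a minimal sketch on made-up data. The data frame below is hypothetical and only stands in for pollutionData.csv; the column names 'PM2.5' and 'RAIN' are taken from the question, and `v_dry` and `v_check` are illustrative names.

```r
# Hypothetical toy data standing in for pollutionData.csv
dat <- data.frame(
  PM2.5 = c(10, 12, NA, 20, 30, 14),
  RAIN  = c(0, 0, 0, 1.5, 2, 0)
)

# Variance of PM2.5 restricted to the rows where RAIN is 0
v_dry <- with(dat, var(PM2.5[RAIN == 0], na.rm = TRUE))

# Same quantity, computed by subsetting the data frame first
v_check <- var(subset(dat, RAIN == 0)$PM2.5, na.rm = TRUE)

print(v_dry)
```

The attempt in the question fails because var does not take a data argument, so the subsetting has to happen before the vector is passed to var.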

Another option is to replace the values in 'PM2.5' where 'RAIN' is not 0 with NA and then apply var, so that only the 'RAIN' == 0 values remain:

with(dat, var(replace(PM2.5,  RAIN != 0, NA), na.rm = TRUE))

aggregate is only needed when we want a group-by operation. Here, we are only getting the var of 'PM2.5' where 'RAIN' is 0.
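Since the question mentions a hypothesis test on the variances, one possible next step is to pull out both groups the same way and pass them to var.test from the stats package. This is only a sketch under assumptions: the toy data is made up, `dry`, `wet` and `res` are illustrative names, and the F test that var.test performs assumes both groups are roughly normally distributed, which may not hold for the real data.

```r
# Hypothetical toy data standing in for pollutionData.csv
dat <- data.frame(
  PM2.5 = c(10, 12, 14, 11, 20, 30, 45, 25),
  RAIN  = c(0, 0, 0, 0, 1.5, 2, 0.3, 4)
)

dry <- with(dat, PM2.5[RAIN == 0])  # PM2.5 on days without rain
wet <- with(dat, PM2.5[RAIN > 0])   # PM2.5 on days with rain

# F test of the null hypothesis that the two variances are equal
res <- var.test(dry, wet)

res$estimate  # ratio of the two sample variances, var(dry) / var(wet)
res$p.value   # p-value of the two-sided test
```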

akrun