I have a data.table with a number of parameters (amplitude, rate, area, etc.; there are 23 in total) that belong to specific wells (each a single experiment, if you will; there are 48 in total), grouped by treatment (usually ~10 in total), and all of this at different time points (there can be many). I would like to first take each well and normalize (as in, divide) all of its parameters by the median parameter values at baseline (all time points before "zero" time), and then take that normalized data and normalize it again, this time by the Control treatment group, for each time point. I would also like to inspect the baseline and control data beforehand and flag and remove outliers, if necessary, prior to normalization (although this is not critical at the moment; I can probably figure it out once I see how to accomplish the normalizations).
As an example, here is a small data.table similar to what my raw instrument data analysis code generates:
library(data.table)

dt = data.table(
  wellID = as.factor(c("A4", "B4", "C5", "D5", "A4", "B4", "C5", "D5",
                       "A4", "B4", "C5", "D5")),
  treatment = as.factor(c("Control", "Control", "Drug", "Drug", "Control",
                          "Control", "Drug", "Drug", "Control", "Control",
                          "Drug", "Drug")),
  time_h = c(-0.2, -0.2, -0.2, -0.2, -0.1, -0.1, -0.1, -0.1, 4, 4, 4, 4),
  area = runif(12, min = 0.5, max = 0.9),
  amp = runif(12, min = 0.1, max = 0.2),
  rate = runif(12, min = 33, max = 38)
)
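Conceptually, I imagine the two normalizations might look something like the following grouped := updates, but this is only a sketch: I am not confident the grouped subsetting inside lapply behaves the way I hope, and I don't see how to fit the outlier filtering into it (params here stands in for my 23 parameter columns):

params = c("area", "amp", "rate")

# Step 1 (sketch): divide each parameter by that well's median over the
# baseline time points (this would overwrite the raw columns in place)
dt[, (params) := lapply(.SD, function(x) x / median(x[time_h < 0])),
   by = wellID, .SDcols = params]

# Step 2 (sketch): divide the result by the Control group's median at each
# time point
dt[, (params) := lapply(.SD, function(x) x / median(x[treatment == "Control"])),
   by = time_h, .SDcols = params]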
What I have actually tried is things like:
baseline = subset(dt, subset = time_h < 0)
to isolate the baseline timepoints, and then:
base_medians = by(baseline[, 4:ncol(baseline), with = FALSE], baseline$wellID,
                  function(x) apply(x, 2, median))
to get the baseline medians for each well, but then I don't really know how to normalize the data in dt with those medians so that the wells and the parameters stay matched, let alone how to do the second normalization.
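A data.table-grouped version of the medians seems more natural (assuming I am using .SD/.SDcols correctly):

base_medians = baseline[, lapply(.SD, median), by = wellID,
                        .SDcols = 4:ncol(baseline)]

but even then I presume I would have to join this back with something like dt[base_medians, on = "wellID"] and would still face the same column-matching problem when dividing.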
I don't think this is a good strategy anyhow; should I be deconstructing and reconstructing my dataset somehow?
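By deconstructing, I mean possibly melting everything to long format so that all 23 parameters become a single value column (which seems like it would make the per-well and per-time-point matching much simpler) and then dcast()-ing back at the end:

long = melt(dt, id.vars = c("wellID", "treatment", "time_h"),
            variable.name = "param")
# then normalize with by = .(wellID, param) and by = .(time_h, param)?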
Any help is appreciated!