significant differences between means

Question

Consider the picture below:

[image]

Each value X can be identified by the indices X_g_s_d_h:

g = group, g=[1:5]
s = subject number (the number of subjects varies per group)
d = day number (the number of days varies per subject)
h = hour, h=[1:24]

So X_1_3_4_12 refers to the value X at the

12th hour
of the 4th day
of the 3rd subject
of group 1

First I calculate the mean (hour by hour) over all the days of each subject. After that, the index d disappears and each subject is represented by a vector of 24 values.

X_g_s_h will be the mean over the days of a subject.

Then I calculate the mean (hour by hour) over all the subjects belonging to the same group, resulting in X_g_h. Each group is represented by one vector of 24 values.

Then I calculate the mean over the hours for each group, resulting in X_g. Each group is now represented by a single value.
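The three averaging steps can be sketched in MATLAB as follows. This is only an illustration under assumptions: the data are stored in a hypothetical NaN-padded 4-D array `X` (groups x subjects x days x hours), missing subjects or days are marked with NaN, and the `'omitnan'` option of `mean` is available (older releases would need `nanmean` from the Statistics Toolbox instead).

```matlab
% Hypothetical NaN-padded data: groups x subjects x days x hours
rng(0);
X = randn(5, 4, 3, 24);
X(1, 4, :, :) = NaN;   % assumption: group 1 has only 3 subjects
X(2, 1, 3, :) = NaN;   % assumption: subject 1 of group 2 has only 2 days

% Step 1: mean over days (dim 3)     -> X_g_s_h, size 5 x 4 x 1 x 24
X_gsh = mean(X, 3, 'omitnan');
% Step 2: mean over subjects (dim 2) -> X_g_h,   size 5 x 1 x 1 x 24
X_gh = mean(X_gsh, 2, 'omitnan');
% Step 3: mean over hours (dim 4)    -> X_g,     one value per group
X_g = mean(X_gh, 4, 'omitnan');
X_g = X_g(:);                        % 5 x 1 column vector
```

Note that each step weights every subject (and every day) equally, regardless of how many observations went into it.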

I would like to see if the means X_g are significantly different between the groups.

What is the proper way to do this?

P.S.

The number of subjects differs between groups, and the number of days differs between subjects. I have more than two groups.

Thanks

ASantosRibeiro · Accepted Answer · 2014-11-12T16:14:27.520

Ok so I am posting an answer to summarize some of the problems you may have.

Same subjects in both groups

Not averaging:

1- First, if you have a single measure repeated every hour over a number of days, and you can assume it is independent of both the day and the hour, then you can reshape your matrix into one column per subject, per group, and perform a paired (repeated-measures) t-test.

2- If you cannot assume that your measure is independent of the hour, but it is independent of the day (say, the concentration of a drug after administration that vanishes completely before the next day's measurement), then you can run a paired t-test for each hour (N hours), for a total of N tests.

3- If you cannot assume that your measure is independent of the day, but it is independent of the hour (say, a measure related to the menstrual cycle, which we assume is stable within each day but varies between days), then you can run a paired t-test for each day (M days), for a total of M tests.

4- If you cannot assume that your measure is independent of either the day or the hour, then you can run a paired t-test for each day and hour, for a total of N×M tests.
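As a rough illustration of case 2, the per-hour paired tests could look like the sketch below. The matrices `A` and `B` are hypothetical (rows are paired observations from the same subjects in the two groups, columns are hours), `ttest` requires the Statistics Toolbox, and the Bonferroni threshold is just one possible way to account for running N tests; it is an assumption here, not something this answer prescribes.

```matlab
rng(2);
nObs = 12;                 % hypothetical number of paired observations
nHours = 24;
A = randn(nObs, nHours);   % hypothetical data, group 1
B = randn(nObs, nHours);   % hypothetical data, group 2 (same subjects)

p = zeros(1, nHours);
for k = 1:nHours
    % ttest with two inputs performs a paired t-test on A(:,k) - B(:,k)
    [~, p(k)] = ttest(A(:, k), B(:, k));
end

% Assumption: Bonferroni correction for the N = 24 tests
significant = p < 0.05 / nHours;
```

Cases 3 and 4 follow the same pattern, looping over days or over every day and hour combination.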

Averaging:

In the cases where you cannot assume independence, you can average over the dependent variables. This removes that source of variance, but it also lowers your statistical power and limits interpretation.

In case 2, you can average over the hours to get a mean concentration and perform a single paired t-test. Here you lose the information on how the measure changed from hour 1 to N; you only test whether the mean concentration over the tested hours differs between groups.

In case 3, you can average over both hour and day and test, for example, whether the mean estrogen level is higher in one group than in another, again with only one test. You lose the information on how it changed between the different days.

In case 4, you can average over both hour and day, again with only one test. You lose the information on how it changed between the different hours and days.

NOT same subjects in both groups

Paired tests are not possible. Follow the same logic but perform an unpaired test.
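A minimal sketch of the fully averaged, unpaired variant is shown below, assuming a hypothetical NaN-padded array `X` (groups x subjects x days x hours), the `'omitnan'` option of `mean`, and `ttest2` from the Statistics Toolbox. It compares only two of the groups; with more groups the comparison would have to be repeated per pair and corrected for multiple comparisons.

```matlab
rng(1);
X = randn(5, 4, 3, 24);    % hypothetical g x s x d x h data
X(1, 4, :, :) = NaN;       % assumption: group 1 has one subject fewer

% One value per subject: average over hours (dim 4), then days (dim 3)
subjMean = mean(mean(X, 4, 'omitnan'), 3, 'omitnan');   % 5 x 4

% Vectors of subject means for two groups, dropping missing subjects
a = subjMean(1, :);  a = a(~isnan(a));
b = subjMean(2, :);  b = b(~isnan(b));

% Unpaired two-sample t-test; h = 1 rejects equal means at the 5% level
[h, p] = ttest2(a, b);
```

Because each subject contributes exactly one value, the observations within a group can reasonably be treated as independent.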

I hope I did not miss anything or say anything incorrect here. – ASantosRibeiro Nov 12 '14 at 16:14

Kostya · Answer 2 · 2014-11-12T13:23:28.663


You need to perform a statistical test of the null hypothesis H0 that the data in the different groups are independent random samples from distributions with equal means. It is better to avoid the sequential 'mean' operations and simply regroup the data by g. If you assume normality and independence of the observations (see the comment by @ASantosRibeiro below), then you can perform a two-sample t-test with ttest2 (http://www.mathworks.nl/help/stats/ttest2.html):

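X = randn(6,5,4,3);        % dummy data in g_s_d_h format (6 groups)
nG = size(X,1);            % number of groups
% reshape is column-major, so keep g as the first dimension and
% transpose: column i of Y then holds all observations of group i
Y = reshape(X, nG, []).';

h = zeros(nG);             % h(i,j) = 1 if groups i and j differ at the 5% level
for i = 1 : nG
    for j = i+1 : nG       % test each pair once; skip i == j
        h(i,j) = ttest2(Y(:,i), Y(:,j));
    end
end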

If you want to take into account the different weights of the observations, you need to calculate the t-value yourself (e.g., see http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_ttest_a0000000126.htm).

edited Nov 12 '14 at 13:23

answered Nov 12 '14 at 12:24

Kostya


Be aware that you don't have independent samples, and therefore ttest2 cannot be used after a simple reshape. For example, you have more than one acquisition per subject, and these will be more related to each other than acquisitions from different subjects. Also, depending on the hour, some acquisitions will be more related because both were acquired in the morning, as opposed to morning versus evening. – ASantosRibeiro Nov 12 '14 at 12:29
Thanks for the reply. This is indeed my problem... since I am computing a mean of means (several times), I don't know which statistical test to perform... – gabboshow Nov 12 '14 at 12:45
Sorry, I don't have time now. If you're struggling with the implementation I might be able to take another look tonight :) – Kostya Nov 12 '14 at 13:24
@Kostya Hi! I don't have an implementation problem (yet :)). My problem is more fundamental... which statistical test should I use for this problem? If you can have a look it would be great, because I'm really lost... – gabboshow Nov 12 '14 at 13:49
@Kostya I read your answer... I think that I cannot assume normality and independence. – gabboshow Nov 12 '14 at 13:52
@gabboshow you can make a ttest between hour 1 and day 1 for both groups (you will get a vector for each group), and this should satisfy the independence premise. You can do this for every hour and day, and then correct for multiple comparisons. Alternatively, you can average over the hours and days, so that you have a vector of subjects for each group, and again perform a t-test. The independence premise should also hold in that case. – ASantosRibeiro Nov 12 '14 at 14:44
@ASantosRibeiro you suggested to make a "ttest between hour 1 and day 1 for both groups". Do you mean between hour 1 and subject 1 (after averaging all the days for each subject)? – gabboshow Nov 12 '14 at 14:58
Either you do not average anything and just pick a vector of subjects from both groups under the same conditions (same hour, same day) for the t-test, giving N hours × M days tests, or you average over all hours and all days and again have a vector of subjects on which you perform the t-test. Final note: the subjects are different in the two groups, right? – ASantosRibeiro Nov 12 '14 at 15:04
Yes, the number of subjects differs between groups, and so does the number of days for each subject. In principle the number of hours also differs (but I've put NaN for the missing hours). – gabboshow Nov 12 '14 at 15:08
