
I have written two algorithms that perform the Jacobi iterative method to simulate heat dissipation over a surface, and I'd like to see whether there is a significant difference between their run times. I know I can use a two-tailed t-test from statsmodels to test whether the means differ, but what I actually want to know is whether one algorithm is statistically significantly faster than the other. How can I test for that?

Here is what I have so far with the two-sided two-sample test.

from statsmodels.stats.weightstats import ttest_ind

# Example run times in seconds
algorithm_1_rtimes = [5, 5.5, 4.9]
algorithm_2_rtimes = [1.2, 1.1, 0.9]

# Two-sided test: H0 is that the two mean run times are equal
_, pvalue, _ = ttest_ind(algorithm_1_rtimes, algorithm_2_rtimes)
if pvalue < 0.05:
    print("Reject H0")
else:
    print("Fail to reject H0")
  • See answer [here](https://stackoverflow.com/questions/15984221/how-to-perform-two-sample-one-tailed-t-test-with-numpy-scipy). There already exists a library function for this. – Jared Frazier Feb 15 '23 at 15:47
  • I'm not sure a t test is meaningful here. If the two functions are not identical, you know a priori that the run time must be different, and a t test will almost certainly reject the null hypothesis when the sample is large enough. Therefore a t test is essentially only telling you whether you have a large sample or not, and that's not very useful. This is a general, well-known limitation of t tests. My advice is to focus on the difference between mean run times (the so-called effect size) and consider whether the difference is large enough for you to prefer one over the other. – Robert Dodier Feb 15 '23 at 19:10
  • @RobertDodier Thanks for the answer. I think I am perhaps confused on the limitation of the t-test then here. Because the two functions are explicitly not identical, does that automatically mean I violate an assumption (i.e., iid?) for the t-test? The reason I ask is that it appears using statistical tests to compare run-time performance might have some validity (but I'm really not certain), see [here](https://www.researchgate.net/post/How_can_I_determine_whether_two_runtimes_are_significantly_different). – Jared Frazier Feb 15 '23 at 19:26
  • The test statistic has a standard deviation which decreases like 1/sqrt(n) where n is the sample size. When the actual difference in means is nonzero, the tail of the distribution of the test statistic must pull away from zero as n gets larger and larger, and no matter what significance level (i.e., tail mass) you choose, for a large enough n, the test statistic will almost always fall in the rejection region. This much is widely known and noncontroversial. – Robert Dodier Feb 15 '23 at 23:23
  • That's the behavior of the test whenever the actual difference in means is nonzero. Now, whether the test is still useful in any context is debatable -- I would say that an alternate approach via decision theory is preferable -- but in the specific case you mentioned, in which one can establish a prior that the means are different, it seems clearly inapplicable. People still apply t tests, sure, why not? One can still get a result and it doesn't look too implausible. – Robert Dodier Feb 15 '23 at 23:28
  • My advice about this specific case of comparing elapsed times is to report the mean and some indication of dispersion, such as the interquartile range, and look at the difference or ratio of means to judge how much faster one function is compared to the other. Whether the time difference is meaningful depends on non-statistical factors, such as the time and effort involved in rolling out a new version of the software, how much time is actually spent in the function versus all the boilerplate around it, etc. – Robert Dodier Feb 15 '23 at 23:33
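
To illustrate the large-sample behaviour described in the comments above, here is a toy sketch with made-up, normally distributed "timings" whose true means differ by only 10 ms (these are simulated numbers under assumed parameters, not real measurements):

import numpy as np
from statsmodels.stats.weightstats import ttest_ind

rng = np.random.default_rng(0)

# Two hypothetical "algorithms" whose true mean run times differ by 10 ms
for n in (10, 100, 1000, 10000):
    a = rng.normal(loc=1.00, scale=0.05, size=n)  # simulated timings, seconds
    b = rng.normal(loc=1.01, scale=0.05, size=n)
    _, pvalue, _ = ttest_ind(a, b)
    print(f"n={n:>6}  two-sided p-value={pvalue:.4f}")

# As n grows, the test tends to reject H0 even though a 10 ms difference
# may be practically irrelevant when choosing between the algorithms.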
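
And here is a minimal sketch of the kind of descriptive summary suggested in the last comment (mean, interquartile range, and ratio of means), reusing the example numbers from the question:

import numpy as np

algorithm_1_rtimes = [5, 5.5, 4.9]
algorithm_2_rtimes = [1.2, 1.1, 0.9]

for name, times in (("algorithm 1", algorithm_1_rtimes),
                    ("algorithm 2", algorithm_2_rtimes)):
    q1, q3 = np.percentile(times, [25, 75])
    print(f"{name}: mean = {np.mean(times):.2f} s, IQR = {q3 - q1:.2f} s")

# Ratio of mean run times as a rough effect size
speedup = np.mean(algorithm_1_rtimes) / np.mean(algorithm_2_rtimes)
print(f"algorithm 2 is about {speedup:.1f}x faster on average")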

0 Answers