0

In below post of Analytics Vidya, ANOVA test has been performed on COVID data, to check whether the difference in posotive cases of denser region is statistically significant.

I believe ANOVA test can’t be performed on this COVID time series data, atleast not in way as it has been done in this post. Sample data has been consider randomly from different groups(denser1, denser2…denser4). The data is time series so it is more likely that number of positive cases in random sample of groups will be from different point of time. There might be the case denser1 has random data from early covid time and another region has random data from another point of time. If this is the case, then F-Statistics will high certainly.

Can anyone explain if you have other opinions?

https://www.analyticsvidhya.com/blog/2020/06/introduction-anova-statistics-data-science-covid-python/

1 Answers1

0

ANOVA should not be applied to time-series data, as the independence assumption is violated. The issue with independence is that days tend to correlate very highly. For example, if you know that today you have 1400 positive cases, you would expect tomorrow to have a similar number of positive cases, regardless of any underlying trends.

It sounds like you're trying to determine causality of different treatments (ie mask mandates or other restrictions etc) and their effects on positive cases. The best way to infer causality is usually to perform A-B testing, but obviously in this case it would not be reasonable to give different populations different treatments. One method that is good for going back and retro-actively inferring causality is called "synthetic control".

https://economics.mit.edu/files/17847

Above is linked a basic paper on the methodology. The hard part of this analysis will be in constructing synthetic counterfactuals or "controls" to test your actual population against.

If this is not what you're looking for, please reply with a clarifying question, but I think this should be an appropriate method that is well-suited to studying time-series data.

Jake144
  • 11
  • 1