
I want to do a time series cluster analysis using `hclust` or `tsclust`.

My data are yearly GRDP (gross regional domestic product) figures for the areas of a region.

The data look like this:

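| area_name | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | … |
|---|---|---|---|---|---|---|---|---|
| area_A | value | value | value | value | value | value | value | … |
| area_B | value | value | value | value | value | value | value | … |
| area_C | value | value | value | value | value | value | value | … |
| … | | | | | | | | |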

However, the values span a very wide range because the areas differ greatly in level. For instance, area_A is around 50,000 to 55,000, whereas area_C is around 3,000 to 4,000.
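To illustrate why the level differences matter for clustering on raw values with Euclidean distance, here is a small Python sketch. The series `area_A`, `area_C` and `area_D` and all of their values are hypothetical, chosen only to fall in the rough ranges above: two low-level series with opposite trends come out far closer than two series that both rise every year.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical yearly series (illustrative values only, not real GRDP data).
area_A = [50000.0, 51000.0, 52000.0, 53000.0, 54000.0]  # high level, rising
area_C = [3000.0, 3200.0, 3400.0, 3600.0, 3800.0]       # low level, rising
area_D = [3800.0, 3600.0, 3400.0, 3200.0, 3000.0]       # low level, falling

d_AC = dist(area_A, area_C)  # same direction of change, very different level
d_CD = dist(area_C, area_D)  # opposite trends, similar level

# On raw values the level gap dominates: C is "closer" to D than to A,
# so with only these three series any linkage would merge C and D first.
print(f"d(A, C) = {d_AC:.0f}, d(C, D) = {d_CD:.0f}")
```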

In this case, which scaling method should I use? I can think of three possible approaches.

All three are standardizations (z-scores); they differ in which observations the mean is taken over:

1. Grand-mean method: the mean is taken over all data.

2. Year-mean method: the mean is taken separately for each year.

3. Area-mean method: the mean is taken separately for each area.
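Written out for a panel with areas $i = 1, \dots, N$ and years $t = 1, \dots, T$, where $x_{it}$ is the GRDP of area $i$ in year $t$, the three options might look like this. This assumes each method divides by the standard deviation of the same group it takes the mean over; the population form ($1/n$) is shown.

```latex
\begin{aligned}
\text{(1) grand:}\quad & z_{it} = \frac{x_{it} - \bar{x}}{s},
  & \bar{x} &= \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} x_{it},
  & s^2 &= \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it} - \bar{x})^2 \\
\text{(2) year:}\quad & z_{it} = \frac{x_{it} - \bar{x}_{\cdot t}}{s_{\cdot t}},
  & \bar{x}_{\cdot t} &= \frac{1}{N}\sum_{i=1}^{N} x_{it},
  & s_{\cdot t}^2 &= \frac{1}{N}\sum_{i=1}^{N} (x_{it} - \bar{x}_{\cdot t})^2 \\
\text{(3) area:}\quad & z_{it} = \frac{x_{it} - \bar{x}_{i\cdot}}{s_{i\cdot}},
  & \bar{x}_{i\cdot} &= \frac{1}{T}\sum_{t=1}^{T} x_{it},
  & s_{i\cdot}^2 &= \frac{1}{T}\sum_{t=1}^{T} (x_{it} - \bar{x}_{i\cdot})^2
\end{aligned}
```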

Which of these methods is appropriate and statistically valid here?
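To compare the three standardization methods listed above side by side, here is a minimal Python sketch on a made-up panel. The `data` values and the function names `grand_zscore`, `year_zscore` and `area_zscore` are hypothetical; each method is assumed to divide by the standard deviation of the same group used for the mean.

```python
from math import dist
from statistics import mean, pstdev

# Hypothetical toy panel: rows are areas, columns are years.
# Values are made up, loosely matching the ranges in the question.
data = {
    "area_A": [50000.0, 51000.0, 52500.0, 53000.0, 54500.0],
    "area_B": [12000.0, 12500.0, 12300.0, 13000.0, 13600.0],
    "area_C": [3000.0, 3300.0, 3500.0, 3700.0, 4000.0],
}


def grand_zscore(panel):
    """Method 1: one mean and SD over every cell of the table."""
    cells = [v for row in panel.values() for v in row]
    mu, sigma = mean(cells), pstdev(cells)
    return {k: [(v - mu) / sigma for v in row] for k, row in panel.items()}


def year_zscore(panel):
    """Method 2: a mean and SD per year (column), taken across areas."""
    columns = list(zip(*panel.values()))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return {
        k: [(v - mu) / sigma for v, (mu, sigma) in zip(row, stats)]
        for k, row in panel.items()
    }


def area_zscore(panel):
    """Method 3: a mean and SD per area (row), taken across years."""
    out = {}
    for k, row in panel.items():
        mu, sigma = mean(row), pstdev(row)
        out[k] = [(v - mu) / sigma for v in row]
    return out


# Euclidean distance between area_A and area_C under each method.
for name, method in [("grand", grand_zscore), ("year", year_zscore), ("area", area_zscore)]:
    z = method(data)
    print(name, "d(A, C) =", round(dist(z["area_A"], z["area_C"]), 3))
```

With Euclidean distance, method 1 is a single linear rescaling of the whole table, so every pairwise distance is divided by the same constant and `hclust` would produce the same tree as on the raw data. Method 2 removes the movement common to all areas in each year, but areas keep their within-year ranking, so area_A still sits above the others. Method 3 removes each area's own level and spread, so only the shape of each series is compared; as far as I know this per-series z-normalization is what `dtwclust`'s `zscore()` does when passed as `tsclust(..., preproc = zscore)`. The sketch uses the population SD; R's `sd()` and `scale()` use the n - 1 denominator, which only multiplies each method's results by a constant.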

Any help would be appreciated. Thank you.

  • [A word on standardization in longitudinal studies: don't](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4569815/) – zephryl Mar 21 '23 at 12:27
  • Please provide enough code so others can better understand or reproduce the problem. – Community Mar 21 '23 at 23:03

0 Answers