This is my first attempt at time series clustering and I need some help. I have read about the TSclust and dtwclust packages for time series clustering and decided to try dtwclust.

My data consist of daily temperature time series at different locations (one value per day). I would like to group the locations into spatial clusters based on their temperature series. For my first attempt I copied an example with its options and substituted my data, temp.max3:

library(dtwclust)

hc <- tsclust(temp.max3, type = "h", k = 20L,
              preproc = zscore, seed = 899,
              distance = "sbd", centroid = shape_extraction,
              control = hierarchical_control(method = "average"))

But this gave me the following error message:

Error in stats::hclust(stats::as.dist(distmat), method, members = dots$members) : NA/NaN/Inf in foreign function call (arg 11)
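
For context, this message comes from `stats::hclust()` itself and indicates that the distance matrix it received contains at least one non-finite entry. The sketch below reproduces the same kind of error in base R with a made-up 3x3 distance matrix; the values are purely hypothetical and the argument number in the message may differ between R versions.

```r
# Hypothetical 3x3 distance matrix with one NaN entry (made-up values).
distmat <- matrix(c(0,   1, NaN,
                    1,   0,   2,
                    NaN, 2,   0), nrow = 3, byrow = TRUE)

# hclust() hands the distances to compiled code, which rejects non-finite input.
msg <- tryCatch(
  {
    stats::hclust(stats::as.dist(distmat), method = "average")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

So the thing to look for is where NA/NaN/Inf values enter the distance computation, not only whether the raw data contain NA.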

I had previously removed every NA from all series, so the resulting temp.max3 data frame does not contain any NA values:
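
A quick way to double-check this, and to see how the data are oriented, might look like the following base R sketch. It uses a toy stand-in for temp.max3 built from the six rows shown by head() further down; with the real data only the last three lines would be needed.

```r
# Toy stand-in for temp.max3: the six rows printed by head() in this question.
temp.max3 <- data.frame(
  `8025`  = c(16.0, 17.8, 18.2, 17.2, 17.0, 21.0),
  `8400A` = c(14.0, 15.6, 15.2, 15.0, 13.8, 14.0),
  `8416`  = c(13.5, 17.4, 19.2, 17.6, 15.6, 18.2),
  `8455`  = c(14, 20, 18, 19, 17, 19),
  check.names = FALSE
)
rownames(temp.max3) <- 13127:13132

m <- as.matrix(temp.max3)

# TRUE if any value is NA, NaN, Inf or -Inf.
any_nonfinite <- any(!is.finite(m))
print(any_nonfinite)

# Rows are days and columns are stations in this layout.
print(dim(m))
```

If `any_nonfinite` is FALSE, the input itself is clean and any non-finite values must be produced later, during preprocessing or the distance calculation.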

summary(temp.max3)
      8025           8400A            8416            8455      
 Min.   : 6.40   Min.   : 4.60   Min.   : 6.00   Min.   : 4.00  
 1st Qu.:18.80   1st Qu.:17.40   1st Qu.:18.20   1st Qu.:19.00  
 Median :23.20   Median :22.00   Median :22.60   Median :24.00  
 Mean   :23.34   Mean   :22.23   Mean   :22.71   Mean   :23.67  
 3rd Qu.:28.20   3rd Qu.:27.40   3rd Qu.:27.40   3rd Qu.:29.00  
 Max.   :41.40   Max.   :40.60   Max.   :43.00   Max.   :42.00

The data look like this:

head(temp.max3)
      8025 8400A 8416 8455
13127 16.0  14.0 13.5   14
13128 17.8  15.6 17.4   20
13129 18.2  15.2 19.2   18
13130 17.2  15.0 17.6   19
13131 17.0  13.8 15.6   17
13132 21.0  14.0 18.2   19

where 8025, 8400A, 8416 and 8455 are the station codes (just four for now, but this will extend to 120 for the final analysis). The data can be found at this Dropbox link: https://www.dropbox.com/s/xru4qnz8grhbxuo/data.csv?dl=0

Any idea, link to information or example will be greatly appreciated. Thanks in advance.

pacomet
  • 1. What about the values of shape_extraction? – Eric Lecoutre Dec 14 '17 at 09:45
  • 2. You want geospatial clustering; not ensured at all with this clustering approach -- maybe other methods would be more suitable – Eric Lecoutre Dec 14 '17 at 09:46
  • I have a feeling you have to transpose your data: `dtwclust` considers each row to be a time series. Try the following to debug: `proxy::dist(t(temp.max3), method="sbd")`. Also, call `zscore(t(temp.max3))` directly and see if it introduces NAs. – Alexis Dec 14 '17 at 10:11
  • Hi @EricLecoutre I don't know how to approach geospatial clustering for time series; I've only done it for single values. I'm interested in grouping locations with similar temperature behaviour, and spatial location will be added later – pacomet Dec 14 '17 at 13:57
  • @Alexis You were right, transposing did the job and the analysis finished (with a long calculation time for just four series). Now I'm going to check the different parameters. Thanks – pacomet Dec 14 '17 at 13:58
  • So as for geospatial analyses, I realized that R indeed does not really offer solutions... Google those keywords ("python time series geospatial clustering") and you will find some solutions in Python. In a project, I used the `pysal` package with the max-p approach and was satisfied with it. That search also turned up `clusterPy` and `thunder` (not tested) – Eric Lecoutre Dec 15 '17 at 16:56

1 Answer

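Thanks to Alexis's comment, the error message disappeared and the script ran fine. dtwclust treats each row as a time series, so the data had to be transposed so that each station is a row.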

library(dtwclust)

# Transpose so that each row is one station's series (rows = series in dtwclust).
temp.max4 <- t(temp.max3)

hc <- tsclust(temp.max4, type = "h", k = 2L,
              preproc = zscore, seed = 899,
              distance = "sbd", centroid = shape_extraction,
              control = hierarchical_control(method = "average"))

with this output:

[image: output of the tsclust call above]

Alexis, I'm sorry I cannot accept the comment as the solution.
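
For anyone hitting the same error, the debugging steps from the comment can be sketched roughly as follows. This is only an illustration on a toy matrix built from the head() rows in the question, not the full data set; the dtwclust part is guarded so the snippet still runs when the package is not installed, and it assumes `zscore()` normalizes a matrix row by row.

```r
# Toy stand-in: the six days shown by head(), one column per station.
m <- cbind(
  `8025`  = c(16.0, 17.8, 18.2, 17.2, 17.0, 21.0),
  `8400A` = c(14.0, 15.6, 15.2, 15.0, 13.8, 14.0),
  `8416`  = c(13.5, 17.4, 19.2, 17.6, 15.6, 18.2),
  `8455`  = c(14, 20, 18, 19, 17, 19)
)

# After transposing, each row is one station's series.
series_by_row <- t(m)
n_series   <- nrow(series_by_row)  # number of stations
series_len <- ncol(series_by_row)  # number of days per station
print(c(n_series = n_series, series_len = series_len))

if (requireNamespace("dtwclust", quietly = TRUE)) {
  library(dtwclust)  # also makes the "sbd" distance available to proxy::dist

  # Does the preprocessing step introduce missing values?
  z <- zscore(series_by_row)
  print(anyNA(z))

  # Does the distance matrix contain missing values?
  d <- proxy::dist(series_by_row, method = "sbd")
  print(anyNA(d))
}
```

If both checks print FALSE on the transposed data but not on the original layout, the orientation was the problem.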

pacomet