I have a large time series (a 1-D floating-point array) that represents various events. Similar events have similar phases, but I don't know how many events occurred during that time. Is it possible to write a program (preferably in Python) to identify the similar phases that represent the same events (perhaps by coloring them)?

Finally, I want to plot the time series against its timestamps, with each phase colored differently (based on its event).

Any help is more than appreciated.

Thanks

precision
  • If you can identify the phases easily by looking at the data, it is probably possible to write a program to do so. What rules can you come up with to identify the different phases? Can you narrow down the question with some sample data and identify what steps are causing you trouble? – Justin Aug 25 '14 at 20:51
  • Is this question closely related to your other question, http://stackoverflow.com/questions/25344895/ordered-colored-plot-after-clustering-using-python? – Justin Aug 25 '14 at 20:54
  • @Justin, it's not possible to identify the phases by looking at the data, and there are some sudden rapid changes that are outliers and need to be eliminated. For instance, data=[.04 .05 .06 4.3 3.2 .01 .03 1.2 1.5 1.6]. Here 4.3 and 3.2 are outliers and should be eliminated. Yes, I tried k-means on the collected data set, and it was unable to extract the phases. Signal amplitude can be a measure of phase change. In the example data there should be two phases. Please let me know if you want more details. Thanks! – precision Aug 25 '14 at 21:12
  • This sounds more like a 1-D signal processing problem than a clustering/data-mining problem to me, although, having said that, you've included very limited information on the nature of the signal which makes it difficult to tell -- or help you. – Tom Morris Aug 26 '14 at 14:55

1 Answer

Sounds like you might need a clustering algorithm to figure out where one group ends and the next begins. K-means is dead simple, and if you have experience with Python, you can probably write an implementation yourself within a few hours.
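To give a feel for how small a hand-written version can be, here is a minimal sketch of k-means (Lloyd's algorithm) for 1-D data in plain Python. The function name `kmeans_1d`, the quantile-based starting centers, and the choice of `k=2` are assumptions made for illustration. The sample values are the ones from the comments with the two outliers (4.3 and 3.2) dropped by hand, which is also an assumption about how you would preprocess the signal.

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster a list of floats with Lloyd's algorithm; returns (labels, centers)."""
    ordered = sorted(values)
    # Deterministic start (an illustrative choice): pick k evenly spaced
    # quantiles of the sorted data as the initial centers.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return labels, centers


# Sample data from the comments, with the outliers 4.3 and 3.2 removed by hand.
data = [0.04, 0.05, 0.06, 0.01, 0.03, 1.2, 1.5, 1.6]
labels, centers = kmeans_1d(data, k=2)
print(labels)   # one phase label per sample
print(centers)  # mean amplitude of each phase
```

Note that this clusters on amplitude only and ignores the time order of the samples, and it needs `k` up front, so it is a starting point rather than a full answer when the number of events is unknown.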

Fortunately, the people behind scikit-learn have already provided some fantastic implementations of clustering algorithms. One of those will probably fit your needs. Again, k-means is the simplest, and you might want to start with that until you get a feel for it.
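As a rough sketch of that route, the snippet below runs scikit-learn's `KMeans` on the sample values from the comments. It assumes the two outliers were already removed and that two phases are expected, so `n_clusters=2` is a guess you would need to tune for real data. The variable names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

# Sample data from the comments, with the outliers 4.3 and 3.2 removed by hand.
signal = np.array([0.04, 0.05, 0.06, 0.01, 0.03, 1.2, 1.5, 1.6])

# scikit-learn expects a 2-D array of shape (n_samples, n_features).
X = signal.reshape(-1, 1)

# n_clusters=2 is an assumption; the real number of events is unknown.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
phase_labels = model.fit_predict(X)

print(phase_labels)                    # one cluster index per sample
print(model.cluster_centers_.ravel())  # mean amplitude of each cluster
```

For the colored plot, the label array can be passed as the `c=` argument of matplotlib's `scatter` together with the timestamps, so each phase is drawn in its own color.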

Patrick Collins