
My question is similar to this. I need a data structure to store and access a large amount of time series data. In my case the insert rate is very high: 10-100k inserts per second. Data items are tuples that contain a timestamp, a sensor id and a sensor value, and I have a very large number of sensors. Values older than some cutoff point in time must be erased.

I need to query the dataset by sensor id and time range. All the data must be stored in external memory; there is no way to fit it in main memory.

I already know about the TSB-tree, but it is hard to implement and there is no guarantee that it will do the job. I suspect that the TSB-tree doesn't behave very well under a high insert rate.

Is there any alternative? Maybe something like an LSM-tree, but for multidimensional data?

Evgeny Lazin
  • You need to specify whether the "external memory" is random access or sequential access. – Tyler Durden Jun 06 '13 at 21:28
  • "I need to query dataset by sensor id and time range." - This is imprecise. Which of these queries do you want to support: (1) tuples where SID=X, (2) tuples where TMIN <= T <= TMAX, (3) tuples where SID=X AND TMIN <= T <= TMAX? - Your wording could mean "(1) and (2)," or "(3)," or "(1), (2), and (3)." – Timothy Shields Jun 06 '13 at 23:38
  • I want to support (1), (3), and (4): tuples where SID IN (X1, X2, X3) AND TMIN <= T <= TMAX. – Evgeny Lazin Jun 07 '13 at 08:36
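The three query shapes settled on in the comments can be pinned down as predicates over (timestamp, sensor id, value) tuples. The following is a minimal Python sketch, assuming integer timestamps; the `Reading` type and the `q1`/`q3`/`q4` names are hypothetical. It scans linearly, so it only fixes the semantics (for example as a reference to test a real index against) and is not an external-memory structure.

```python
from typing import Iterable, List, NamedTuple


class Reading(NamedTuple):
    timestamp: int  # assumption: integer time, e.g. epoch milliseconds
    sensor_id: int
    value: float


def q1(data: Iterable[Reading], sid: int) -> List[Reading]:
    """(1) all tuples where SID = sid."""
    return [r for r in data if r.sensor_id == sid]


def q3(data: Iterable[Reading], sid: int, tmin: int, tmax: int) -> List[Reading]:
    """(3) tuples where SID = sid AND tmin <= T <= tmax."""
    return [r for r in data
            if r.sensor_id == sid and tmin <= r.timestamp <= tmax]


def q4(data: Iterable[Reading], sids: Iterable[int],
       tmin: int, tmax: int) -> List[Reading]:
    """(4) tuples where SID IN sids AND tmin <= T <= tmax."""
    wanted = set(sids)
    return [r for r in data
            if r.sensor_id in wanted and tmin <= r.timestamp <= tmax]
```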

1 Answer


Because you're using external memory, you may want to read through the chapter on B-trees in Henrik Jonsson's thesis. B-trees themselves are a very popular way to index data in external memory, and you should be able to find implementations in any language; Jonsson discusses how to adapt them to store time series data.
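One common way to point a B-tree at this workload is a composite key of (sensor id, timestamp). A B-tree keeps its leaves in key order, so query (3) becomes a single contiguous range scan, (1) is the same scan with open time bounds, and (4) is one scan per requested sensor. The sketch below is a hypothetical illustration, not Jonsson's scheme: `CompositeKeyIndex` is an invented name, and a sorted Python list with `bisect` stands in for the on-disk B-tree, so its inserts are O(n) rather than O(log n).

```python
import bisect
import math
from typing import Dict, Iterable, List, Tuple

Key = Tuple[int, float]  # (sensor_id, timestamp)


class CompositeKeyIndex:
    """In-memory stand-in for a B-tree keyed on (sensor_id, timestamp)."""

    def __init__(self) -> None:
        self._keys: List[Key] = []      # kept sorted, like B-tree leaf order
        self._values: List[float] = []  # parallel to _keys

    def insert(self, sensor_id: int, timestamp: int, value: float) -> None:
        key = (sensor_id, timestamp)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # same sensor and timestamp: overwrite
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def range(self, sensor_id: int, tmin: float = -math.inf,
              tmax: float = math.inf) -> List[Tuple[int, float]]:
        """Queries (1) and (3): one contiguous scan over a key range."""
        lo = bisect.bisect_left(self._keys, (sensor_id, tmin))
        hi = bisect.bisect_right(self._keys, (sensor_id, tmax))
        return [(int(self._keys[i][1]), self._values[i]) for i in range(lo, hi)]

    def multi_range(self, sensor_ids: Iterable[int], tmin: float,
                    tmax: float) -> Dict[int, List[Tuple[int, float]]]:
        """Query (4): one range scan per requested sensor."""
        return {sid: self.range(sid, tmin, tmax) for sid in sorted(set(sensor_ids))}

    def purge_before(self, cutoff: int) -> None:
        """Drop readings with timestamp < cutoff (linear pass over all keys)."""
        keep = [i for i, (_, ts) in enumerate(self._keys) if ts >= cutoff]
        self._keys = [self._keys[i] for i in keep]
        self._values = [self._values[i] for i in keep]
```

A trade-off of this key order is visible in `purge_before`: expired readings are spread across every sensor's key range rather than sitting in one contiguous region, so erasing old data touches the whole index instead of a single range.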

Zim-Zam O'Pootertoot