Suppose that we have the following time series:

X1 = {(0, 3), (1, 4), (3, 5)}  
X2 = {(0, 3), (1, 4), (2, 6), (3, 5), (4, 8)}  
X3 = {(0, 3), (1, 4), (2, 6), (3, 5), (4, 8), (5, 9)}

where the first element of each tuple is the time and the second element is the value taken at that time (e.g. a temperature measured at a certain point in time).

What would be an efficient way to find the timestamps that are missing from X1 compared to X2 and X3 (here: (2,_), (4,_), (5,_)), and then to fill in values for those timestamps using linear interpolation? What kind of data structure would you use, how would you look up the missing timestamps, and how would you then apply linear interpolation between two points? As output, I expect to see X1 with values for all timestamps.

Marco13
  • I'm not sure why this was downvoted. It seems like a reasonable question to me. You might want to add information about the data structures that you are already using. What are the *types* of `X1`, `X2` and `X3`? Are they arrays, or lists, maybe something like a `List` with a class `Entry` that has a `getTime()` and a `getValue()` method? Are the times and values `int` or `double` values? (This is important for the interpolation!) ... – Marco13 Jan 15 '17 at 15:39
  • @Marco13 Hello Marco! First of all, thank you very much for writing here. The problem is now defined differently. An array is given that contains all the timestamps that are supposed to be present in every time series. I have to compare each time series against this array to find its missing timestamps, and then fill in the missing values using linear interpolation between two points. Would it be OK to use linear interpolation in this context? Thanks a lot for any suggestion! –  Jan 28 '17 at 17:43
  • Whether or not linear interpolation would be OK depends on the domain and nature of the data. Beyond that, the *core* of the question is now less clear than before. If this is about an efficient *implementation*, then you have to tell us more about the data structures. If this is a conceptual question, then the question might be better suited for other stackexchange sites, and in any case, you would have to say more about the domain and what the time series actually represent. – Marco13 Jan 28 '17 at 19:11
  • @Marco13 yes, you are right. I was not clear. Actually, I have to apply the Centroid Decomposition method (a matrix decomposition technique) to Amazon data, where the time series represent the ratings (1-5) given to certain products and the times (Unix epoch time) at which those ratings were given. In fact, I have to provide a solution similar to the one in this paper: [link](https://cseweb.ucsd.edu/~jmcauley/pdfs/www16a.pdf), but using a different technique, Centroid Decomposition. –  Jan 30 '17 at 22:18
  • This may be an entry point to a discussion that will eventually go beyond the scope of these comments, but I wonder whether it makes sense: between (1,3stars) and (3,5stars), the interpolated point would be (2,4stars). How should this be justified? There are no "2.5 stars" either. (I looked at the paper, but don't have the time to read it thoroughly now.) It is still not clear to me whether this is a conceptual question or an implementation question. If it is an implementation question, you should provide more code showing your current approach. – Marco13 Jan 31 '17 at 11:24

1 Answer

I wouldn't worry about efficiency until I had something that worked at all.

Linear interpolation might not be a good idea. There are lots of interpolation schemes. You ought to look into some higher order or spline schemes.

If you have to have a common set of timestamps for each point, you'll have to start by iterating over all of them and collecting a set of timestamps. Once you have them you'll execute your interpolation scheme on each one that needs data.
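The two steps above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: it assumes each series is a list of `(time, value)` tuples sorted by time, and the helper names `union_timestamps` and `fill_missing` are made up for this example. Linear interpolation is undefined for timestamps outside a series' own range (4 and 5 for X1 here), so the sketch simply reuses the nearest known value there; that is an assumption you may well want to replace.

```python
from bisect import bisect_left


def union_timestamps(*series):
    """Collect the sorted union of timestamps across all series."""
    return sorted({t for s in series for t, _ in s})


def fill_missing(series, timestamps):
    """Resample `series` onto `timestamps`, filling gaps linearly.

    `series` must be non-empty and sorted by time. Timestamps outside its
    range get the nearest known value (an assumption, see the note above).
    """
    times = [t for t, _ in series]
    values = [v for _, v in series]
    filled = []
    for t in timestamps:
        i = bisect_left(times, t)  # binary search for t's position
        if i < len(times) and times[i] == t:
            filled.append((t, values[i]))   # timestamp already present
        elif i == 0:
            filled.append((t, values[0]))   # before the first sample
        elif i == len(times):
            filled.append((t, values[-1]))  # after the last sample
        else:
            # Interpolate between the two neighbouring samples.
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            filled.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
    return filled


x1 = [(0, 3), (1, 4), (3, 5)]
x2 = [(0, 3), (1, 4), (2, 6), (3, 5), (4, 8)]
x3 = [(0, 3), (1, 4), (2, 6), (3, 5), (4, 8), (5, 9)]

grid = union_timestamps(x1, x2, x3)
known = {t for t, _ in x1}
missing = [t for t in grid if t not in known]
x1_filled = fill_missing(x1, grid)
```

With the data from the question, `missing` is `[2, 4, 5]` and the value filled in at time 2 is 4.5, halfway between the samples at times 1 and 3. Keeping each series as a sorted list lets the lookup use a binary search; a sorted map would serve the same purpose.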

This isn't a trivial problem. You should do more research to find things like this.

duffymo
  • I don't see how this should be considered as an answer to the question...?! – Marco13 Jan 15 '17 at 15:40
  • Are you expecting code? I laid out the steps, pointed out that linear interpolation might not be the way to go, and provided another answer that dealt with the issue in greater detail. It's certainly more than you've provided - nothing. – duffymo Jan 15 '17 at 15:55
  • I cleaned up the question and asked for some clarification and further details in the comments. I consider this as necessary steps for making the question a *good* question and for providing a *good* answer. Let's see how it turns out. – Marco13 Jan 15 '17 at 16:03
  • @duffy, thanks a lot to both of you for writing. It helped me think more about the problem. I have now decided to define the problem differently (as described in the comment above), as that approach seems more efficient and more reasonable. Thanks! Best regards, Leo –  Jan 28 '17 at 17:49