
You are in a plane tracking an enemy ship across the ocean, and you have collected a series of (x, y, time) coordinates of the ship. You know that a hidden submarine travels with the ship to protect it. Their positions are correlated, but the submarine often wanders off: while it is usually near the ship, it can occasionally be on the other side of the world. You want to predict the path of the submarine, but unfortunately it is hidden from you.

But one month, in April, you notice that the submarine forgets to hide itself, so you have a series of coordinates for both the submarine and the ship across 1,000 trips. Using this data, you'd like to build a model that predicts the hidden submarine's path given only the ship's movements. The naive baseline would be "submarine position guess = ship's current position", but the April data, where the submarine was visible, show a tendency for the submarine to be a bit ahead of the ship, so "submarine position guess = ship's position in 1 minute" is an even better estimate. Furthermore, the April data show that when the ship pauses in the water for an extended period, the submarine is likely to be far away, patrolling the coastal waters. There are other patterns, of course.
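Both baselines above can be written as one function: guess the submarine at time t as the ship's position at t + lead, with lead = 0 for the naive rule and lead = 60 s for the "one minute ahead" rule. A minimal Python sketch, assuming each track is a time-sorted NumPy array of timestamps in seconds plus (x, y) coordinates; `shifted_ship_guess` and `best_lead` are hypothetical names, and the lead is tuned on a single visible-submarine trip by mean Euclidean error (for the 1,000 April trips you would average the error over all of them).

```python
import numpy as np


def shifted_ship_guess(ship_t, ship_xy, query_t, lead=60.0):
    """Guess the submarine's position at each time in query_t as the ship's
    position `lead` seconds later (lead=0 is the naive baseline).
    ship_t must be increasing; np.interp clamps outside the observed range."""
    future_t = np.asarray(query_t, dtype=float) + lead
    gx = np.interp(future_t, ship_t, ship_xy[:, 0])
    gy = np.interp(future_t, ship_t, ship_xy[:, 1])
    return np.column_stack([gx, gy])


def best_lead(ship_t, ship_xy, sub_t, sub_xy, candidates):
    """Pick the lead (seconds) with the lowest mean Euclidean error on
    training data where the submarine was visible."""
    errors = []
    for lead in candidates:
        guess = shifted_ship_guess(ship_t, ship_xy, sub_t, lead)
        errors.append(np.linalg.norm(guess - sub_xy, axis=1).mean())
    return candidates[int(np.argmin(errors))], errors
```

Because `np.interp` clamps to the last ship fix when t + lead runs past the end of the track, guesses near the end of a trip fall back to the ship's final position.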

How would you build this model, using the April data as training data, to predict the submarine's path? My current solution is an ad hoc linear regression whose factors are "trip time", "cargo ship's x coordinate", "was cargo ship idle for 1 day", etc.; I let R fit the weights and then cross-validate. But I would really love a way to generate these factors automatically from the April data. I would also like a model that uses the sequence or timing of the data, since linear regression doesn't, and I think that is relevant.
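One way to generate lag-based factors automatically is to feed a sliding window of the ship's positions around each time step (past and future, since the whole ship track is known) into a regularized regression and cross-validate by trip. A minimal NumPy sketch, assuming every trip has been resampled to a regular time grid so that row i of the ship and submarine arrays refers to the same instant; `window_features`, `fit_ridge`, `cv_error`, `half_width` and `alpha` are hypothetical names. This is still a linear model, so threshold-style factors such as "idle for 1 day" would still have to be added by hand or learned by a nonlinear model.

```python
import numpy as np


def window_features(ship_xy, half_width):
    """One row per time step: ship (x, y) at offsets -half_width..+half_width
    (edge-padded at the trip ends), followed by a constant column."""
    n = len(ship_xy)
    padded = np.pad(ship_xy, ((half_width, half_width), (0, 0)), mode="edge")
    blocks = [padded[k:k + n] for k in range(2 * half_width + 1)]
    return np.hstack(blocks + [np.ones((n, 1))])


def fit_ridge(X, Y, alpha):
    """Ridge regression weights W minimising ||XW - Y||^2 + alpha * ||W||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)


def cv_error(trips, half_width, alpha=1.0, k=5):
    """Mean Euclidean error under k-fold cross-validation, split by trip.
    trips: list of (ship_xy, sub_xy) pairs of (T, 2) arrays on the same grid."""
    folds = [trips[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        train = [trip for j, fold in enumerate(folds) if j != i for trip in fold]
        X = np.vstack([window_features(ship, half_width) for ship, _ in train])
        Y = np.vstack([sub for _, sub in train])
        W = fit_ridge(X, Y, alpha)
        for ship, sub in folds[i]:
            pred = window_features(ship, half_width) @ W
            errors.append(np.linalg.norm(pred - sub, axis=1))
    return float(np.concatenate(errors).mean())
```

Comparing `cv_error` for several `half_width` values is a crude way to let the data decide how far back and ahead the ship's track is informative.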

Edit: I've reformulated the problem with a made-up story so it's less confusing. The original problem I posted is:

I have eye-tracking data on two subjects: a teacher and a student. It's in the form (x, y, time), so there is a series of these for each subject. What the teacher looks at influences what the student looks at. What method would I use to predict what the student is looking at, using only the teacher's data? Let's say I can train some learning algorithm on a gold-standard set of student and teacher data.

I was thinking a hidden Markov model would be appropriate, given the definition on Wikipedia, but I am not sure how to put this into practice on my dataset.
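A discrete HMM can be applied by snapping gaze points to grid cells: the student's cell is the hidden state and the teacher's cell is the observation. Because the gold-standard data contain both sequences, the start, transition and emission probabilities can be estimated by counting, and the Viterbi algorithm then decodes the most likely student sequence from teacher data alone. A minimal sketch, assuming both subjects have been resampled to common time steps; the 100 px cell size, 10-column grid, add-one smoothing and all function names are hypothetical choices.

```python
import numpy as np


def to_cell(x, y, cell=100, n_cols=10):
    """Map pixel coordinates to a grid-cell index (hypothetical 100 px cells,
    10 columns, i.e. a 1000 px wide screen)."""
    col = np.asarray(x) // cell
    row = np.asarray(y) // cell
    return (row * n_cols + col).astype(int)


def _log_normalise(counts):
    return np.log(counts / counts.sum(axis=-1, keepdims=True))


def fit_hmm(pairs, n_cells, smoothing=1.0):
    """Supervised HMM estimate by counting, with add-`smoothing` smoothing.
    pairs: list of (student_cells, teacher_cells) integer arrays of equal length.
    Hidden state = student cell, observation = teacher cell."""
    start = np.full(n_cells, smoothing)
    trans = np.full((n_cells, n_cells), smoothing)
    emit = np.full((n_cells, n_cells), smoothing)
    for student, teacher in pairs:
        start[student[0]] += 1
        np.add.at(trans, (student[:-1], student[1:]), 1)
        np.add.at(emit, (student, teacher), 1)
    return _log_normalise(start), _log_normalise(trans), _log_normalise(emit)


def viterbi(teacher, log_start, log_trans, log_emit):
    """Most likely student-cell sequence given the teacher's cells."""
    T, n = len(teacher), len(log_start)
    score = log_start + log_emit[:, teacher[0]]
    back = np.zeros((T, n), dtype=int)
    for t in range(1, T):
        # cand[i, j]: score of being in cell i at t-1 and moving to cell j
        cand = score[:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[:, teacher[t]]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Mapping each decoded cell back to pixels (for example its centre) gives a coarse gaze estimate; the grid resolution trades spatial accuracy against how much data each probability needs.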

More detail: I have data about how a teacher and a student each look at a map and some readings. I have 40 of these datasets, which look like [(366,234,0), (386,234,5), ...], meaning the teacher looked at point (366,234) at time 0 and then, 5 seconds later, moved to look at coordinate (386,234). I want to learn a model of the relationship between how a teacher looks at content and how a student looks at the same content, so that I can predict the student's gaze. Maybe the student looks at the content in the same order as the teacher, but more slowly. Or perhaps the student doesn't look around as much, while the teacher scans more of the content. I have both sets of data and want to see how accurate a model I can get: would I be able to predict the student's looking behavior within 50px of the teacher's looking behavior?
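The "within 50 px" question can be turned into a score: the fraction of time steps at which the predicted gaze point lies within 50 px of the student's actual gaze point. A minimal sketch, assuming predictions and ground truth have already been aligned to the same time stamps; `within_radius` is a hypothetical helper. Scoring the naive guess "the student looks where the teacher looks" with the same function gives a baseline any model should beat.

```python
import numpy as np


def within_radius(pred_xy, true_xy, radius=50.0):
    """Fraction of time steps where the predicted gaze point lies within
    `radius` pixels (Euclidean distance) of the student's actual gaze point."""
    pred = np.asarray(pred_xy, dtype=float)
    true = np.asarray(true_xy, dtype=float)
    dist = np.linalg.norm(pred - true, axis=1)
    return float(np.mean(dist <= radius))
```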

user2077851
  • It seems that each session of "map-gazing" can be thought of as a directed graph G(V,E), where each vertex v in V is an (x,y) coordinate at which the gaze rested for longer than some threshold amount of time (a point of interest on the map). The edges E represent eye movements, and their direction encodes time ordering. So, given a teacher's graph, you need to find a student's graph. Is that correct? If so, you could simplify the training data into such graphs and learn their parameters. Or is it: given a point (x,y) the teacher is looking at, you need to guess the (x,y) the student is looking at? – Alptigin Jalayr Feb 16 '13 at 23:05
  • Alptigin, yes, that is exactly what I am trying to do, although I guess there is also a time dimension to the graph. Do you know any way to train something to generate a student graph from a teacher graph? – user2077851 Feb 17 '13 at 02:49
  • Well, I'd say first simplify your dataset into these graphs, visualize them. I can't say offhand what specific method will be successful. – Alptigin Jalayr Feb 17 '13 at 22:34
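The graph simplification suggested in the first comment can be sketched in a few lines of plain Python. This assumes a sample's dwell time is the gap until the next sample, uses a hypothetical `min_dwell` threshold of 2 seconds, and treats exactly equal coordinates as the same vertex; real eye-tracker data would first need nearby points clustered into fixations. The last sample is dropped because its dwell time is unknown.

```python
def fixation_graph(samples, min_dwell=2.0):
    """samples: list of (x, y, t) tuples sorted by t.
    Vertices: (x, y) points where the gaze stayed at least min_dwell seconds,
    taking dwell as the time until the next sample.
    Edges: (from_point, to_point, t) for consecutive fixations, in time order.
    The last sample is skipped because its dwell time is unknown."""
    fixations = []
    for (x, y, t), (_, _, t_next) in zip(samples, samples[1:]):
        if t_next - t >= min_dwell:
            fixations.append(((x, y), t))
    vertices = sorted({point for point, _ in fixations})
    edges = [(a, b, t_b)
             for (a, _), (b, t_b) in zip(fixations, fixations[1:])
             if a != b]
    return vertices, edges
```

Keeping the arrival time on each edge preserves the time dimension mentioned in the reply, so teacher and student graphs can be compared both by structure and by timing.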

1 Answer


I'd suggest looking at Kalman filters or, more generally, state-space models (SSMs), which the book recommended below defines as "just like an HMM, except the hidden states are continuous".
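As a concrete state-space example, here is a standard Kalman filter with a constant-velocity model: the hidden state is the submarine's position and velocity, and the ship's position is treated as a noisy measurement of the submarine's position (the intuition given in the comments). The time step `dt`, the noise scales `q` and `r`, and the identity measurement matrix are hypothetical choices; in practice they would be estimated from the April data.

```python
import numpy as np


def kalman_filter(zs, dt=1.0, q=0.1, r=25.0):
    """Constant-velocity Kalman filter.
    zs: (T, 2) measurements (here: ship positions, treated as noisy
    observations of the hidden submarine position).
    Returns the (T, 2) filtered estimates of the submarine position."""
    zs = np.asarray(zs, dtype=float)
    F = np.array([[1, 0, dt, 0],      # state: [x, y, vx, vy]
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    H = np.array([[1, 0, 0, 0],       # we only measure position
                  [0, 1, 0, 0]], dtype=float)
    Q = q * np.eye(4)                 # process noise (hypothetical, isotropic)
    R = r * np.eye(2)                 # measurement noise (hypothetical)
    x = np.array([zs[0, 0], zs[0, 1], 0.0, 0.0])
    P = 1e3 * np.eye(4)               # vague initial uncertainty
    out = []
    for z in zs:
        # Predict step
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step with the new measurement
        y = z - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ y
        P = (np.eye(4) - K @ H) @ P
        out.append(x[:2].copy())
    return np.array(out)
```

This filter only uses measurements up to time t; since the whole ship track is available offline, a smoother (such as Rauch-Tung-Striebel) run over the full track would usually do better.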

I can recommend a book chapter on the topic: chapter 18 of Kevin P. Murphy's "Machine Learning: A Probabilistic Perspective". There are also online resources (look up Kalman filters), but I can't recommend any specific one.

EDIT: here you can find references for using Kalman filters in R to predict time series.

Hope this helps,

etov
  • Thank you etov, I have thought about Kalman filters before but it seems like it is used for predicting the next steps in a series, rather than for predicting an entire second series. Could you point me to how I would use it for prediction on a second time-series? – user2077851 Feb 18 '13 at 00:09
  • Also, would they be making the Markov assumption, that only the most recent state affects future states? – user2077851 Feb 18 '13 at 07:33
  • The intuition is that you can treat the ship's position as a noisy measurement of the submarine's position. The noise isn't necessarily white; it can have various effects based on past events. So, basically, predicting the submarine's position is analogous to estimating the model's hidden state. I'm not sure it's as general as all the cases you refer to, but I think it can cover at least some of them. – etov Feb 18 '13 at 08:47
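When both tracks are observed, as in the April data, the parameters of a linear-Gaussian state-space model can be fitted by ordinary least squares, because the hidden state is known during training. A minimal NumPy sketch, assuming tracks resampled to a common regular grid and white Gaussian noise (which, as the comment notes, need not hold); `fit_linear_gaussian_ssm` is a hypothetical name. The fitted `A`, `Q`, `H`, `c`, `R` could then drive a Kalman filter whose measurement residual is `z - H @ x - c`.

```python
import numpy as np


def fit_linear_gaussian_ssm(sub_tracks, ship_tracks):
    """Least-squares estimates for
        s[t+1] = A s[t] + w,      w ~ N(0, Q)   (submarine dynamics)
        z[t]   = H s[t] + c + v,  v ~ N(0, R)   (ship as noisy measurement)
    from training trips where both the submarine s and the ship z were seen.
    Each track is a (T, d) array sampled on a common regular time grid."""
    S0 = np.vstack([s[:-1] for s in sub_tracks])
    S1 = np.vstack([s[1:] for s in sub_tracks])
    A_T, *_ = np.linalg.lstsq(S0, S1, rcond=None)   # solves S0 @ A.T = S1
    Q = np.cov((S1 - S0 @ A_T).T)

    S = np.vstack(sub_tracks)
    Z = np.vstack(ship_tracks)
    S_aug = np.hstack([S, np.ones((len(S), 1))])    # extra column fits offset c
    B, *_ = np.linalg.lstsq(S_aug, Z, rcond=None)
    H, c = B[:-1].T, B[-1]
    R = np.cov((Z - S_aug @ B).T)
    return A_T.T, Q, H, c, R
```

On the Markov question: these dynamics are first-order, but stacking several past positions into the state vector is a standard way to give a state-space model longer memory while keeping the same machinery.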