I have a server that receives GPS data from a mobile device (an iPhone) and then finds the distinct cities the data comes from for each distinct day. The device collects a fix every ten minutes and syncs with the server every three hours. I don't need fine granularity (I'm not interested in anything "smaller" than a city); I just want to be able to say "this person was in this city on this date". The problem is that GPS warm-up, poor accuracy, and the sheer amount of data received sometimes yield false positives or bad data. For example, I have a user who lives near the NYC/NJ border, and I keep getting locations that alternate between the two, even though he spends most of his day far from the border (so the times when he's at home shouldn't matter).
My question is: what algorithms should I consider, what papers should I read, or even what terms should I google, to find an approach that removes the noise and false positives from data that is synced every n hours, that needn't be more granular than a certain level (city, in this case), and that is significant over a certain period of time? (Think of it as counting visits to distinct cities, states, or countries on distinct dates.) I was thinking of something like "clustering" or "dissolving" the data, but I don't know anything about geo algorithms yet ;)
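To make the idea concrete, here is a rough sketch of the kind of "dissolving" I have in mind; I don't know whether it is the right approach. It assumes each fix has already been resolved to a city name and carries an accuracy estimate in meters. The function name `cities_per_day` and the thresholds `max_accuracy_m` and `min_run` are made up for illustration. It drops low-accuracy fixes, then counts a city for a day only if the device reported it for several consecutive fixes, so short back-and-forth flips near a border are ignored.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import groupby


def cities_per_day(fixes, max_accuracy_m=200.0, min_run=3):
    """Return {date: set of cities} from (timestamp, city, accuracy_m) tuples.

    Assumptions (all hypothetical): timestamps are in the user's local time,
    `city` is already reverse-geocoded, and both thresholds are guesses.
    """
    # Drop low-accuracy fixes (e.g. GPS warm-up), then order by time.
    good = sorted((f for f in fixes if f[2] <= max_accuracy_m),
                  key=lambda f: f[0])
    result = defaultdict(set)
    # Group by calendar day, then collapse consecutive same-city fixes into runs.
    for day, day_fixes in groupby(good, key=lambda f: f[0].date()):
        for city, run in groupby(day_fixes, key=lambda f: f[1]):
            # Keep a city only if it was seen min_run times in a row.
            if sum(1 for _ in run) >= min_run:
                result[day].add(city)
    return dict(result)


# Example: four steady fixes in one city, then single-fix flips across a border.
start = datetime(2024, 1, 1, 8, 0)
sample = [(start + timedelta(minutes=10 * i), "New York", 20.0) for i in range(4)]
sample += [
    (start + timedelta(minutes=40), "Jersey City", 30.0),
    (start + timedelta(minutes=50), "New York", 30.0),
    (start + timedelta(minutes=60), "Jersey City", 30.0),
    (start + timedelta(minutes=70), "Boston", 1500.0),  # very poor accuracy
]
print(cities_per_day(sample))
```

With fixes every ten minutes, `min_run=3` means roughly half an hour in the same city. Runs are counted within a single calendar day, so a stay that spans midnight is split in two, and I haven't tuned either threshold.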