I have a server that receives GPS data from a mobile device (an iPhone) and then finds the distinct cities the data comes from for each distinct day. The device collects data every ten minutes and syncs with the server every three hours. The granularity I need is not that fine: I'm not interested in anything "smaller" than cities, and I'd like to be able to say "this person was in this city on this date". The problem is that GPS warm-up, bad accuracy and the sheer amount of data received sometimes yield false positives or bad data. For example, I have a user who lives near the NYC/NJ border, and I keep getting locations that alternate between the two, even though he spends most of his actual day far from the border (so the times when he's home shouldn't matter).

My question is: what algorithms should I consider, what papers should I read, or even what terms should I google, to find an approach that will help me get rid of the noise and false positives in data that is synced every n hours, that needn't be more granular than a certain level (city, in this case), and that is significant for a certain period of time? (Think of it as counting the visits to distinct cities, states or countries on distinct dates.) I was thinking of something like "clustering" or "dissolving" the data, but I don't know anything about geo algorithms yet ;)

lfborjas

1 Answer

I use Android rather than iPhone, so I'll answer hoping the iPhone makes similar information available in its stream of fixes.

First, you should only use locations obtained from GPS, not from Wi-Fi or cell towers, for this effort. On Android, each location fix reports its source: "gps" for satellite fixes, or "network" for fixes derived from cell towers and Wi-Fi. The GPS fixes are the most accurate. With cell towers, especially near a river or on a hill, you often pick up towers quite far from you as you move around, and it sounds like that's what's happening with your NY/NJ problem.
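As a rough illustration of that first filter, here is a minimal Python sketch for the server side. It assumes each synced fix carries a source label and an estimated error in meters; the `Fix` record, its `provider` and `accuracy_m` fields, and the 100 m threshold are all made up for the example, so adjust them to whatever the app actually sends.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Fix:
    timestamp: datetime
    lat: float
    lon: float
    provider: str      # hypothetical field: "gps" or "network"
    accuracy_m: float  # hypothetical field: estimated horizontal error in meters


def keep_reliable(fixes, max_error_m=100.0):
    """Keep only GPS fixes whose reported error is at most max_error_m."""
    return [
        f for f in fixes
        if f.provider == "gps" and 0 <= f.accuracy_m <= max_error_m
    ]


fixes = [
    Fix(datetime(2012, 7, 22, 9, 0), 40.7306, -73.9866, "gps", 12.0),
    Fix(datetime(2012, 7, 22, 9, 10), 40.7357, -74.0339, "network", 1500.0),
    Fix(datetime(2012, 7, 22, 9, 20), 40.7308, -73.9870, "gps", 450.0),
]
reliable = keep_reliable(fixes)
```

Here only the first fix survives: the second comes from the network, and the third is a GPS fix with too large an error estimate (typical of warm-up). If the stream has no source label, filtering on the error estimate alone may already help.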

Second, if a person is in a certain city, he should stay there for a length of time without moving too far. You could write something that only declares a location once it receives several fixes in a row that fall within that location. That essentially filters out bad results, because it is not realistic to bounce back and forth quickly between two locations that are more than 500 m apart, or something like that.
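The "several fixes in a row" idea could be sketched like this in Python. It assumes every fix has already been mapped to a city name by some reverse-geocoding step that is not shown; the function name `confirmed_cities_by_day` and the `min_run` threshold of 3 are illustrative choices, not anything prescribed above.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import groupby


def confirmed_cities_by_day(labeled_fixes, min_run=3):
    """Map each date to the cities seen in at least min_run consecutive fixes.

    labeled_fixes: iterable of (timestamp, city) pairs. The city label is
    assumed to come from a reverse-geocoding step that is not shown here.
    """
    ordered = sorted(labeled_fixes, key=lambda pair: pair[0])
    result = defaultdict(set)
    # groupby collapses consecutive fixes with the same city into one run.
    for city, run in groupby(ordered, key=lambda pair: pair[1]):
        run = list(run)
        if len(run) >= min_run:
            for timestamp, _ in run:
                result[timestamp.date()].add(city)
    return dict(result)


# One fix every ten minutes, with two stray "New Jersey" readings.
start = datetime(2012, 7, 22, 9, 0)
cities = ["New York", "New York", "New Jersey", "New York",
          "New York", "New York", "New York", "New Jersey"]
sample = [(start + timedelta(minutes=10 * i), c) for i, c in enumerate(cities)]
visits = confirmed_cities_by_day(sample)
```

With this sample, only "New York" is reported for 2012-07-22, because neither "New Jersey" reading is part of a run of three. Note that one stray reading also splits a genuine run in two, so `min_run` should stay small relative to how long a real visit lasts.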

mwengler
  • Thanks for your input. I was thinking about something like the second approach too but, sadly, I receive the info every 3 hours, so at the beginning of the day my backend can't be sure whether a person has been in a city long enough to count. Also, I have no control over the iPhone app, but I will discuss it with the guy who wrote it ;) And check this out: http://research.google.com/pubs/pub37522.html discusses the same problem! – lfborjas Jul 22 '12 at 23:32