
I'm working on a computer vision project (OpenCV 2.4, C++). In this project I'm trying to detect certain features in order to build a map (an internal representation) of the surrounding world.

The information I have available is the camera pose (a 6D vector with 3 position and 3 angular values), the calibration values (focal length, distortion, etc.) and the features detected on the object being tracked (these features are basically the contour of the object, but that doesn't really matter).

Since the camera pose, the positions of the features and other variables are subject to errors, I want to model the object as a 3D probability density function, i.e. the probability of finding the "object" at a given 3D point in space. This matters because each contour has an associated probability of being an actual object contour rather than a noise contour (bear with me).

Example: if the object were a sphere, I would detect a circle (its contour). Since I know the camera pose but have no depth information, the internal representation of that object should be a fuzzy cylinder (or a cone, if the camera's perspective is included, but that's not relevant here). When new information becomes available (new images from a different location), a new contour is detected and its own fuzzy cylinder is merged with the previous data. Now we have a region where the probability of finding the object is higher in some areas and lower in others. As more information arrives, the model should converge to the original object's shape.

I hope the idea is clear now.

This model should be able to:

  • Grow dynamically if needed.
  • Update efficiently as new observations are made, strengthening the probability in areas observed multiple times and weakening it elsewhere. Ideally the system should be able to update in real time.
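To make these requirements concrete, one possible shape such a structure could take is a sparse voxel map backed by a hash table: it grows only where observations actually land, and each observation touches a constant number of cells. This is only a sketch of the idea; the hash-grid design, the 21-bit index packing and the deferred normalization are my own assumptions, not a settled answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Sketch of a sparse voxel map: cells are created lazily, so the map
// grows only where evidence is added. Cell values accumulate raw
// observation weights; normalization is deferred to query time via a
// running total.
class SparseVoxelMap {
public:
    explicit SparseVoxelMap(float resolution) : res_(resolution) {}

    // Add evidence weight at a world-space point (creates the cell if needed).
    void addEvidence(float x, float y, float z, float weight) {
        cells_[key(x, y, z)] += weight;
        total_ += weight;
    }

    // Relative probability of the object occupying the voxel at (x, y, z).
    float probability(float x, float y, float z) const {
        auto it = cells_.find(key(x, y, z));
        if (it == cells_.end() || total_ <= 0.0f) return 0.0f;
        return it->second / total_;
    }

    std::size_t cellCount() const { return cells_.size(); }

private:
    // Pack the three voxel indices into one 64-bit key (21 bits each;
    // indices wrap modulo 2^21, which is fine for a bounded workspace).
    std::uint64_t key(float x, float y, float z) const {
        auto q = [this](float v) {
            return static_cast<std::uint64_t>(
                static_cast<std::int64_t>(v / res_) & 0x1FFFFF);
        };
        return (q(x) << 42) | (q(y) << 21) | q(z);
    }

    float res_;
    float total_ = 0.0f;
    std::unordered_map<std::uint64_t, float> cells_;
};
```

Merging a new observation (e.g. a new fuzzy cylinder) then amounts to calling `addEvidence` for each voxel the back-projected contour passes through.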

Now the question: how can I computationally represent this kind of fuzzy information so that I can perform these tasks on it?

Any suitable algorithm, data structure, C++ library or tool would help.

ButterDog
  • It's not clear exactly what configurations we need to be capable of representing, in part because it's not clear exactly what modification operations are allowed. The simplest general-purpose approach would be to simply use a 3D array of floating-point values to represent the probabilities that each voxel is present. Some kinds of updates would be faster if you also keep a separate floating point value that stores the sum of all elements in the array, and define `p(x, y, z) = array(x, y, z) / sum`. (This would also let you store all numbers as integers instead of FP.) – j_random_hacker Mar 19 '13 at 13:07
  • Basically I'm concerned with the likelihood of finding the object in a given region of space (and maybe interacting with that object in the future). As you suggest, one option is to discretize the volume of space as a 3D grid. The problem with that is that the computational cost of updating the model grows exponentially as new data arrives. The operations to perform on it would basically be **updating** the probabilities and finding dense regions in the volume – ButterDog Mar 19 '13 at 15:01
  • The only way to reduce the size of the representation is to exclude certain possible configurations from being represented exactly, and you haven't said what kinds of things you need to include/can afford to exclude. E.g. if you are happy to approximate the PDF as an ellipsoid, you could get away with recording just a handful of numeric parameters (the 2 foci, plus the radius, plus maybe another few parameters describing how the prob density ramps up from 0 around the surface), but it would be very hard to perform useful updates. – j_random_hacker Mar 19 '13 at 15:54
  • Basically the data could represent any arbitrary shape. In the example I talked about a sphere just to keep it simple. What I really have is a set of 2D contours of an arbitrary shape in 3D space (projected along the camera's axis, since no depth info is available); each contour has a probability of being valid or not, and should map to a more or less dense zone in 3D space. Thus I can't make any assumption about the geometry of the object to use an approximation as you suggest. Multiple observations should be combinable. Hope it's clearer now. – ButterDog Mar 19 '13 at 16:11
  • "Basically the data could represent any arbitrary shape" -- that's what I thought, and in that case I don't see how you could, even in theory, do better than a 3D discretisation. I suppose you could represent the PDF as a sum of 1 or more 3-parameter functions, but then it would be very difficult to determine how to modify the representation to reflect an update. If you want to look into this route I would suggest googling "constructive solid geometry", but I honestly think it's more trouble than it's worth. – j_random_hacker Mar 19 '13 at 16:19
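j_random_hacker's dense-array suggestion from the comments above might look like this in practice. This is a minimal sketch only; the fixed grid extents and the choice to store raw weights plus a running sum (rather than renormalizing the whole array on every update) are assumptions on my part.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense voxel grid as suggested in the comments: one float per voxel plus
// a running sum, so p(x, y, z) = grid(x, y, z) / sum can be evaluated
// without rescanning the whole array after each update.
struct DenseVoxelGrid {
    int nx, ny, nz;
    std::vector<float> cells;
    float sum = 0.0f;

    DenseVoxelGrid(int nx_, int ny_, int nz_)
        : nx(nx_), ny(ny_), nz(nz_),
          cells(static_cast<std::size_t>(nx_) * ny_ * nz_, 0.0f) {}

    float& at(int x, int y, int z) {
        return cells[(static_cast<std::size_t>(x) * ny + y) * nz + z];
    }

    // Accumulate observation weight w into one voxel; O(1) per update.
    void add(int x, int y, int z, float w) {
        at(x, y, z) += w;
        sum += w;
    }

    // Normalized probability for one voxel.
    float p(int x, int y, int z) {
        return sum > 0.0f ? at(x, y, z) / sum : 0.0f;
    }
};
```

The trade-off versus a sparse structure is memory: the array is allocated up front for the whole workspace, but every update and query is a direct index.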

1 Answer


I'll answer with the computer vision equivalent of Monty Python: "SLAM, SLAM, SLAM, SLAM!" :-) I'd suggest starting with Sebastian Thrun's tome, *Probabilistic Robotics*.

However, there's older work on the Bayesian side of active computer vision that's directly relevant to your question of geometry estimation, e.g. Whaite and Ferrie's seminal IEEE paper on uncertainty modeling (Whaite, P. and Ferrie, F. (1991). From uncertainty to visual exploration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(10):1038–1049). For a more general (and perhaps mathematically neater) view of this subject, see also chapter 4 of D.J.C. MacKay's Ph.D. thesis.
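As a taste of the occupancy-grid machinery covered in that literature, here is a minimal log-odds cell update, the standard way SLAM systems fuse repeated observations into a per-voxel probability. This is a sketch of the generic technique, not code from any of the cited works, and the sensor-model probabilities you would plug in are application-specific:

```cpp
#include <cassert>
#include <cmath>

// Log-odds occupancy update as used in occupancy-grid mapping: each cell
// stores a log-odds value, so fusing one observation is a single addition,
// and repeated consistent evidence drives the probability toward 0 or 1.
inline double toLogOdds(double p) { return std::log(p / (1.0 - p)); }
inline double toProbability(double l) { return 1.0 - 1.0 / (1.0 + std::exp(l)); }

// Fuse one observation into a cell. p_hit is the inverse sensor model's
// probability that the cell is occupied given this observation
// (> 0.5 strengthens the cell, < 0.5 weakens it).
inline double fuse(double cellLogOdds, double p_hit) {
    return cellLogOdds + toLogOdds(p_hit);
}
```

Starting a cell at log-odds 0 (probability 0.5) and fusing a few `p_hit = 0.7` observations quickly pushes it above 0.9, while a contradicting observation pulls it back, which is exactly the "stronger where observed multiple times, weaker otherwise" behavior asked for in the question.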

Francesco Callari