I have a set of simple rigid 3D objects that I wish to detect and recognize from an image (let's say 5 to 10 classes). The objects are simple in sense that they are cylinders in one color or rectangles with simple patterns (stripes for example) or some similarly simple shape. The objects are significantly different from one another (there aren't for example two classes where one is a large cylinder and another one is the same but smaller cylinder). Because the textures are pretty simple (solids and/or simple patterns), bag-of-words approach fails (they do not contain significant number of unique edges).
While one possible approach is coding manually each classifier (manual feature extraction etc), is there a simple data driven approach (Haar/LBP classifier for example) that would work? If Haar or LBP are good for solving this problem, how would one solve the problem of unknown relative viewpoint (and by such perspective distortion, rotation, etc)? Would just providing positive images from all possible viewpoints for an object converge or is there something else that's usually done? The detection and recognition should run in real-time.