
I'm working on a project to detect gaze direction from a webcam feed in Python. Using dlib, OpenCV, and some masking techniques, I can get a 60 fps feed of images of my eye like the one shown below.

[image: eye]

What I'm trying to do is find the coordinates of the center of the iris in each image as it comes in and return them to the main method.

The approach I'm following is dividing each of these frames into eight equal squares, calculating the proportion of black to white pixels inside each one, multiplying the vector (direction) from the center of the frame to that square by that proportion, and summing those weighted vectors to estimate where the iris sits relative to the center.
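
Roughly, the per-frame computation looks like this (a simplified sketch rather than my exact code; the 3×3-grid-minus-centre layout and the function name are illustrative, and `eye_frame` is assumed to be a thresholded grayscale crop where iris pixels are dark):

```python
import numpy as np

def estimate_iris_center(eye_frame):
    """Sketch: weight the direction to each of the eight outer grid cells
    by that cell's proportion of dark pixels, then sum the weighted
    directions to get an offset from the frame centre."""
    h, w = eye_frame.shape[:2]
    centre = np.array([w / 2.0, h / 2.0])
    offset = np.zeros(2)

    for row in range(3):
        for col in range(3):
            if row == 1 and col == 1:
                continue  # skip the centre cell, leaving eight cells
            cell = eye_frame[row * h // 3:(row + 1) * h // 3,
                             col * w // 3:(col + 1) * w // 3]
            # proportion of dark ("black") pixels inside this cell
            black_ratio = np.count_nonzero(cell < 128) / cell.size
            # vector from the frame centre to this cell's centre
            cell_centre = np.array([(col + 0.5) * w / 3.0, (row + 0.5) * h / 3.0])
            offset += black_ratio * (cell_centre - centre)

    return centre + offset  # (x, y) estimate of the iris centre
```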

Unfortunately, unless the pupil is relatively centered in the frame, this approach tends to overshoot significantly. I'm thinking of adding an oval mask around the frame to remove all the shadows (roughly as sketched below), but I wanted to see if the community has other ideas on how I can detect the center of the eye.

[image: approach]
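
For reference, the oval mask I have in mind would be something along these lines (a minimal sketch; it assumes the shadows sit in the corners of the crop, that white counts as "not iris", and the function name is just illustrative):

```python
import cv2
import numpy as np

def apply_oval_mask(eye_frame):
    """Sketch: keep only pixels inside an ellipse inscribed in the frame
    and force everything outside it to white, so corner shadows no longer
    count as dark "iris" pixels. Assumes a grayscale frame."""
    h, w = eye_frame.shape[:2]
    mask = np.zeros((h, w), dtype=np.uint8)
    # filled ellipse inscribed in the frame
    cv2.ellipse(mask, (w // 2, h // 2), (w // 2, h // 2), 0, 0, 360, 255, -1)
    masked = eye_frame.copy()
    masked[mask == 0] = 255  # outside the ellipse -> white
    return masked
```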

Here's what I would like the output to be:

[image: what I want]

And here's what the current algorithm is outputting:

[image: what I have]

Any ideas are greatly appreciated!

  • Here's an idea for approximating the iris position along the horizontal axis: what if you reduce the image to a single row by summing down (accumulating) every column? If your image has a big black blob (the iris), the minimum accumulated values in that row should correspond to the position of the iris on the horizontal axis (a minimal sketch of this is included after these comments). – stateMachine Nov 23 '22 at 03:07
  • So the idea is that it only accumulates when the pixels are white? – ortunoa Nov 23 '22 at 03:47
  • At that "resolution" (i.e. none at all), your only bet is ML. Just throw the entire fistful of pixels into a regression (you get angles or whatever out of it). You can record training data easily if you make a target move around on the screen and follow it with your eyes. – Christoph Rackwitz Nov 23 '22 at 11:54
  • I'm not even sure I would know how to feed this to a regressor, but thanks for the tip! – ortunoa Nov 23 '22 at 17:52
  • Look for the usual hello-world, which is a neural network on the MNIST dataset, except those are classifiers (one-hot, softmax), not regressors (emitting independent analog values). If you can take a centered crop of fixed aspect ratio (and maybe vary the scale to fit), that's almost equivalent. – Christoph Rackwitz Nov 23 '22 at 19:03
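
A minimal sketch of the column-reduction idea from the comments (assuming a grayscale crop where the iris is dark; the function name is illustrative):

```python
import numpy as np

def iris_x_from_column_sums(eye_frame):
    """Sketch: sum every column of the grayscale frame into a single row.
    The smallest (darkest) accumulated value should line up with the big
    dark blob, so its index approximates the iris x-coordinate."""
    column_sums = eye_frame.astype(np.float64).sum(axis=0)
    return int(np.argmin(column_sums))  # column index of the darkest stripe
```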
