So I have decided that the OpenCV function cvMatchTemplate() is just far too slow. Everything is too jerky, and just has no awesomeness. I decided to take it from the program, and work on the general jerkiness of everything. I figured if I average everything over the last 10 or so values, then the estimates will be smoothed out.

When initialising the centre eye position, I considered 20 estimates (more would be better, but also take longer), took the standard deviation, and then threw out everything that was 2 standard deviations away from the mean. I then took the mean of the remaining estimates for the overall estimate.

For the eye estimates, I considered the last 10 estimates. I would consider more, but then we get the old eye positions influencing new ones. I didn’t use standard deviations, as if they eye does move, it might move outside of a standard deviation. I just used a simple mean. The same for gaze location.

So here is a video of it at work. Once again there is no match template, so we can’t really look at where the gaze estimate is, but more at how smoothly it moves. The estimate lags, but never mind!

I also looked a bit at the calibration process. Up till now I had just been guessing the “eye radius”… this is what I called the distance the eye moves to look from one side of the screen to the other. But guessing is never acceptable. Ever. So I devised a simple calibration process. First the person looks to the centre, then the left, then the right. Through all this the computer is finding the average distance moved, (throwing out all the points 2 standard deviations away etc). This gives us a good guess at the eye radius.

I have decided to use the OpenCV template matching function, and just using it in Square Difference mode. It is not amazingly efficient, but its not bad.

It works quite well, I have made the box around the eyes stick in place, on the face, and so now I really just need to make everything more efficient and accurate. Then I can do a bit of maths and work out some estimates for where the person is looking. Hopefully.

When it is a bit better I will post a video, but for now here are some pictures of the template matching working:

As the title clearly states, I have used the powers of OpenCV to make some eye tracking happen in real time. I then used it to make a VERY rough estimate of the gaze location.

So I open up a stream from the webcam, and then draw around the general eye area. This then remains fixed in space and does not move. The eyes are then located using my method. When the person is looking at the center of the screen, a button is pressed and this initialises the initial eye position. From then I just calculated the approximate eye displacement, and used the proportion displaced to relate to the proportion of the screen that the gaze had travelled. Please note that this is the most inaccurate thing ever, and will be fixed as soon as I have the time and motivation. Ok I lied, its not the very biggest problem…

The fact that the box around the eye area does not move is the BIGGEST problem! This means that if your head moves at all, the computer wont know that it was your head and not your eye. and it is very easy for your head to twitch by the radius of your eye. I will fix this using the power of template matching. Maybe. I still need to experiment. My plan is to take the initial box around the eye area, and use it as a template, and make it stay fixed on the face, rather than the webcam image. If that makes sense.

Once the eye box is fixed on the face, we can use that to tell us how the face moves, and the eye positions to tell us how the eyes move, and then all we need is some tasty maths!

Some few of you may recall one of my previous posts (Success! Joe’s new method.) I outlined and implemented a technique to detect circles. My idea was to consider pairs of adjacent pixels, and calculate where their gradients intersected, thus giving an approximate centre for any dark circles they lay on. Well, I was reading a paper recently, and it outlined a very similar method, but instead of considering two pixels, you considered only one (something that I had assumed was possible, but had no idea how to do).

First the paper introduced the concept of an isophote. This is simply an area of constant darkness/intensity. So if our image is described by a function f(x,y), then we have an isophote at (x,y) where f(x,y) = f0 (f0 is constant). This is essentially describing contours. It then goes on to use the curvature of each pixel, assuming it is on an isophote, to estimate the radius of the circle it is on. Curvature is the second derivative of y with respect to x, and a formula for this can be derived from the definition of an isophote using total derivatives (as f = a constant, we have y(x), so we must use the total derivative, paying attention to the chain rule). The radius of the circle that it is on is given by 1/curvature, which is very convenient. To convince yourself of this, try plugging the values x = 0, y = r into the definition of curvature, and you should get 1/r. Easy. So now we have the distance to the centre. The direction to the centre is given by the gradient at that point. All it takes is a little trigonometry, and we have an estimate!

Remember before when I commented on how slow it would be to draw a large “bump” onto an image at every estimate? Well I did even if you don’t. It seems that Gaussian blur (again discussed in an earlier post) offers a perfect solution. If I draw every estimate on as a single dot, and then blur nearby dots into eachother, then Boom! Estimates are grouped in a fast and efficient manner.

I have also altered the shape of the Gaussian function to more of an oval shape to accommodate eye shapes. This could be good, this could be bad. I have not tested it yet. The following is using an oval shape. I have also noticed that the picture gets darker with more spread out Gaussian functions…. I think I need to moderate the height so as to have a constant area under the function. Integration awaits me!

Original image:

And now after processing, the left is using the original “Joelet” method, the right is using Gaussian blur:

A little different, but I think the average number of eyes detected would be the same.

Great! This would make detecting bigger eyes much more efficient. Although I am still relying on the user inputting an approximate eye size, when I find a fairly optimum Gaussian function shape I think I could probably do away with that.

Just thought I would share a little failure with regard to Alex’s method. I just tried to implement it using FFT to solve some convolutions, which is a LOT faster – even without me trying to streamline it. It does, however, result in every pixel having a negative colour. Which is bad. Very bad.

The worst part is that I’ve seen *I MY CHAIR JUST DISINTEGRATED UNDERNEATH ME!!* Alex’s method implemented in matlab code, but I such a matlab noob, I have a lot of trouble following it. In the near future I might just sit down and try extremely hard to make it work, using the matlab code and everything.

In other news, I have been trying to do a literature review on circle finding techniques (my first ever lit review!). It seems to me there are a few main techniques.

Hough Transform. This is originally used for detecting straight lines, but has been modified to detect any number of shapes, including circles.

Wavelets. Wavelets are the basis for how jpegs works, and are very powerful tools for looking at a picture in different resolutions. In particular, there have been modifications to “circlets” which can be used to track down circles.

Other, misc. Lots of them similar to mine, in which some estimate is made of the centre of each pixel or group of pixels.

I also read n interesting paper about the relationship between head pose and eye gaze, and in particular, how to tell if someone is surprised by something they looked at, or whether they always intended to look at it. Essentially it showed that if a person looks with their eyes, and then turns their head, they are startled. But if they move their head a bit before their eyes, then they intended to look over there.

Previously (see Success!) I implemented a method of my own design, but noted a limitation: I needed to either input the approximate eye radius, or have massive computing power to deal with very large accidental lengths. Inspired by a paper that my supervisor sent to me, in which they use a Gaussian blur effect to turn lots of estimates of a point into one estimate. In fact, it is not truly “inspired” by the paper, but more taken from the paper. Exactly.

So Gaussian blur (as far as I can make out) works by convolving the Gaussian function (essentially a single hump) with the image. This smooths out the image.

I have attempted to apply this to my method. But first I need a FFT library. The following are my installation woes:

OpenCV (http://opencv.willowgarage.com/wiki/) and FFTW (http://www.fftw.org/) are C++ libraries. OpenCV is for image processing, and FFTW is used to perform Fourier Fast Transforms. They would be great, and really complement my project. However it turns out that its somewhat impossible to install and use them with any windows program. I have literally wasted days of my life trying to get them to work with either Microsoft Visual C++ 2010 or Netbeans. In the end I gave up.

In the end I gave up.

If anyone knows of any good (simple!) tutorials, could they possibly let me know? I think I will have to build OpenCV from the source code, but I really don’t know how I would go about that. FFTW requires me to use some lib.exe program from Visual C++ to add the libraries. But it seems to crash every time I run it.

In the mean time I am using a C program for FFT I found at http://www.tech.dmu.ac.uk/~eg/tensiometer/fft/. It appears to work… However their are some interesting results when I use it to perform a Guassian Blur.

Previously I described an attempted to implement Alex’s method. I tried again, to even worse effect. I thought I would include a failure picture to help everyone else feel good about themselves…

As you can see… not quite what I had in mind. Although looking closely at this, it seems to be outlining white lines quite a lot. I should investigate that….

Nope! switching the black to white makes no noticeable difference. There must be a (quite large) bug somewhere in the program. Perhaps one day I will find it. I can but hope.

Success! Although not in implementing Alex’s circle/eye detection method (see last post). I grew tired and frustrated by silly C++, so I decided to make my own, much simpler method that (hopefully) would not require convolutions and FFT’s to fun at a fair speed. It was inspired by the idea of circular wavelets, or circlets, however it is actually nothing to do with them mathematically.

Very simply each pixel considers its own gradient the gradient of the 4 pixels next to it one at a time. The intersection is found between the line made by the gradient of the current pixel, and the line made by the gradient of each adjacent pixel. So we have 4 intersection points. If the intersection points lie on the picture, then we draw a little pyramid of darkness (lets call them Joelets, because its cool!). These pyramids all add up to form some interesting blurry shapes.

How does this detect circles? Well if we are on the edge of a circle, then each gradient will tend to point towards the centre of a circle, so the intersection points will be around the centre. If we are not on the edge of a circle, then the intersection points will just be scattered randomly. Hopefully the little pyramids should all add up near the eyes and cause a massive dark bump. Boom. Eyes detected.

One more issue to consider. How big should the Joelets be? And by big I mean radius wise, as heights are all relative. If they are too big, then everything will just blur together. But if we make them too small, then all we will have is a load of scattered intersection points. I propose that it should be approximately the radius of the eye. I haven’t done any maths to check this, I am only guessing, but it seems to work! That is all well and good, but it relies on us know the eye radius. what if we don’t?? Well we could take the radius of the Joelet to be the distance from the parent pixel to the intersection point. That would be approximately the radius of the circle. It would be brilliant if it was EXTREMELY slow, as some intersection points are on the other side of the image. For now I have just guessed the eye radius, but a task for the future is finding a way to use some kind of variable eye radius. Maybe with a convolution?