Joe's blog about AI things!

Month: July, 2012

Quick Update

I have decided to use the OpenCV template matching function, and just using it in Square Difference mode. It is not amazingly efficient, but its not bad.

It works quite well, I have made the box around the eyes stick in place, on the face, and so now I really just need to make everything more efficient and accurate. Then I can do a bit of maths and work out some estimates for where the person is looking. Hopefully.

When it is a bit better I will post a video, but for now here  are some pictures of the template matching working:

I Track Eye Tracking.

As the title clearly states, I have used the powers of OpenCV to make some eye tracking happen in real time. I then used it to make a VERY rough estimate of the gaze location.

So I open up a stream from the webcam, and then draw around the general eye area. This then remains fixed in space and does not move. The eyes are then located using my method. When the person is looking at the center of the screen, a button is pressed and this initialises the initial eye position. From then I just calculated the approximate eye displacement, and used the proportion displaced to relate to the proportion of the screen that the gaze had travelled. Please note that this is the most inaccurate thing ever, and will be fixed as soon as I have the time and motivation. Ok I lied, its not the very biggest problem…

The fact that the box around the eye area does not move is the BIGGEST problem! This means that if your head moves at all, the computer wont know that it was your head and not your eye. and it is very easy for your head to twitch by the radius of your eye. I will fix this using the power of template matching. Maybe. I still need to experiment. My plan is to take the initial box around the eye area, and use it as a template, and make it stay fixed on the face, rather than the webcam image. If that makes sense.

Once the eye box is fixed on the face, we can use that to tell us how the face moves, and the eye positions to tell us how the eyes move, and then all we need is some tasty maths!

Here is a video of my work so far…

There are 3 different sensitivity levels in this.

The Master Of The UNIVERSE!

Or, at least, OpenCV. That’s right my faithful readers, I have managed to install it. It took me many days of pain and suffering, but It finally works with Microsoft Visual C++ 2010.

In other, less exciting news, I have been half heartedly working on a function to find the local maxima of an image. This would be easy (look for a change in gradient in both the x and y direction), but the image is full of disgusting noise, leading me to search for a more creative solution. Perhaps some pre-processing is needed? Or maybe some kind of… find a maximum and then ignor any nearby maxima. Could try filling in every change of gradient point, and then convolving with Gauss, and then repeating. Who knows!

A New and Exciting Method. The Isophote.

Some few of you may recall one of my previous posts (Success! Joe’s new method.) I outlined and implemented a technique to detect circles. My idea was to consider pairs of adjacent pixels, and calculate where their gradients intersected, thus giving an approximate centre for any dark circles they lay on. Well, I was reading a paper recently, and it outlined a very similar method, but instead of considering two pixels, you considered only one (something that I had assumed was possible, but had no idea how to do).

First the paper introduced the concept of an isophote. This is simply an area of constant darkness/intensity. So if our image is described by a function f(x,y), then we have an isophote at (x,y) where f(x,y) = f0 (f0 is constant). This is essentially describing contours. It then goes on to use the curvature of each pixel, assuming it is on an isophote, to estimate the radius of the circle it is on. Curvature is the second derivative of y with respect to x, and a formula for this can be derived from the definition of an isophote using total derivatives (as f = a constant, we have y(x), so we must use the total derivative, paying attention to the chain rule). The radius of the circle that it is on is given by 1/curvature, which is very convenient. To convince yourself of this, try plugging the values x = 0, y = r into the definition of curvature, and you should get 1/r. Easy. So now we have the distance to the centre. The direction to the centre is given by the gradient at that point. All it takes is a little trigonometry, and we have an estimate!

So here is a pretty picture of it at work:

The slowness of a Joelet.

Remember before when I commented on how slow it would be to draw a large “bump” onto an image at every estimate? Well I did even if you don’t. It seems that Gaussian blur (again discussed in an earlier post) offers a perfect solution. If I draw every estimate on as a single dot, and then blur nearby dots into eachother, then Boom! Estimates are grouped in a fast and efficient manner.

I have also altered the shape of the Gaussian function to more of an oval shape to accommodate eye shapes. This could be good, this could be bad. I have not tested it yet. The following is using an oval shape. I have also noticed that the picture gets darker with more spread out Gaussian functions…. I think I need to moderate the height so as to have a constant area under the function. Integration awaits me!

Original image:

And now after processing, the left is using the original “Joelet” method, the right is using Gaussian blur:

A little different, but I think the average number of eyes detected would be the same.

Great! This would make detecting bigger eyes much more efficient. Although I am still relying on the user inputting an approximate eye size, when I find a fairly optimum Gaussian function shape I think I could probably do away with that.

Alex’s method 3, and some tasty paper.

Just thought I would share a little failure with regard to Alex’s method. I just tried to implement it using FFT to solve some convolutions, which is a LOT faster – even without me trying to streamline it. It does, however, result in every pixel having a negative colour. Which is bad. Very bad.

The worst part is that I’ve seen *I MY CHAIR JUST DISINTEGRATED UNDERNEATH ME!!* Alex’s method implemented in matlab code, but I such a matlab noob, I have a lot of trouble following it. In the near future I might just sit down and try extremely hard to make it work, using the matlab code and everything.

In other news, I have been trying to do a literature review on circle finding techniques (my first ever lit review!). It seems to me there are a few main techniques.

1. Hough Transform. This is originally used for detecting straight lines, but has been modified to detect any number of shapes, including circles.
2. Wavelets. Wavelets are the basis for how jpegs works, and are very powerful tools for looking at a picture in different resolutions. In particular, there have been modifications to “circlets” which can be used to track down circles.
3. Other, misc. Lots of them similar to mine, in which some estimate is made of the centre of each pixel or group of pixels.

I also read n interesting paper about the relationship between head pose and eye gaze, and in particular, how to tell if someone is surprised by something they looked at, or whether they always intended to look at it. Essentially it showed that if a person looks with their eyes, and then turns their head, they are startled. But if they move their head a bit before their eyes, then they intended to look over there.

Fascinating stuff.

Gaussian Blur!

Hi everyone!

Previously (see Success!) I implemented a method of my own design, but noted a limitation: I needed to either input the approximate eye radius, or have massive computing power to deal with very large accidental lengths. Inspired by a paper that my supervisor sent to me, in which they use a Gaussian blur effect to turn lots of estimates of a point into one estimate. In fact, it is not truly  “inspired” by the paper, but more taken from the paper. Exactly.

So Gaussian blur (as far as I can make out) works by convolving the Gaussian function (essentially a single hump) with the image. This smooths out the image.

I have attempted to apply this to my method. But first I need a FFT library. The following are my installation woes:

OpenCV (http://opencv.willowgarage.com/wiki/) and FFTW (http://www.fftw.org/) are C++ libraries. OpenCV is for image processing, and FFTW is used to perform Fourier Fast Transforms. They would be great, and really complement my project. However it turns out that its somewhat impossible to install and use them with any windows program. I have literally wasted days of my life trying to get them to work with either Microsoft Visual C++ 2010 or Netbeans. In the end I gave up.

In the end I gave up.

If anyone knows of any good (simple!) tutorials, could they possibly let me know? I think I will have to build OpenCV from the source code, but I really don’t know how I would go about that. FFTW requires me to use some lib.exe program from Visual C++ to add the libraries. But it seems to crash every time I run it.

In the mean time I am using a C program for FFT I found at http://www.tech.dmu.ac.uk/~eg/tensiometer/fft/. It appears to work… However their are some interesting results when I use it to perform a Guassian Blur.

Revenge of Alex’s Method

Previously I described an attempted to implement Alex’s method. I tried again, to even worse effect. I thought I would include a failure picture to help everyone else feel good about themselves…

As you can see… not quite what I had in mind. Although looking closely at this, it seems to be outlining white lines quite a lot. I should investigate that….

Nope! switching the black to white makes no noticeable difference. There must be a (quite large) bug somewhere in the program. Perhaps one day I will find it. I can but hope.