David Forsyth, Jean Ponce
This book's accessible presentation gives both a general view of the entire computer vision enterprise and sufficient detail to build useful applications. Readers learn techniques that have proven useful in practice, along with a wide range of mathematical methods. Comprehensive and up-to-date, the book covers essential topics of practical significance or theoretical importance, discussed in substantial and increasing depth. Application surveys describe important areas such as image-based rendering and digital libraries. Many important algorithms are broken down and illustrated in pseudocode. The book is also appropriate for engineers as a comprehensive reference to the computer vision enterprise.
I'm an undergrad who finds computer vision to be fascinating. Where should somebody brand new to computer vision begin?
As with all other things at school... start by taking a course with a good amount of project work. Explore ideas and implement algorithms you find interesting in those projects. Wikipedia is a good beginner's resource, as usual. If you want books, the most popular ones are:
But before you jump into books, I would suggest taking a course, or going through course slides from one of the top universities or via iTunes U.
I am really interested in image processing. I downloaded OpenCV and started playing with it. But I think I lack the knowledge behind image processing. I would like to learn the basic fundamentals of image processing.
I searched for open courses from MIT and other universities but couldn't find a good tutorial. I did find some slides, but they seem useless without the actual presentation. I searched for online tutorials, but most of them are not for beginners.
Is there a good online tutorial for image processing for beginners?
I need to automatically align an image B on top of another image A so that the contents of the images match as well as possible.
The images can be shifted along x/y and rotated up to 5 degrees about z, but they won't be distorted (i.e., scaled or keystoned).
Maybe someone can recommend some good links or books on this topic, or share some thoughts on how such an alignment could be done.
If it weren't for the rotation, I could simply compare rows of pixels with a brute-force method until I find a match; then I would know the offset and could align the images.
Do I need AI for this?
I'm having a hard time finding resources on image processing that go into detail about how these alignment algorithms work.
What people often do in this case is first find points in the images that match, then compute the best transformation matrix with least squares. The point matching is not particularly simple, and often you just use human input for this task; you have to do it all the time when calibrating cameras. Anyway, if you want to fully automate the process, you can use feature extraction techniques to find matching points. There are volumes of research papers written on this topic, and any standard computer vision text will have a chapter on it. Once you have N matching points, solving for the least-squares transformation matrix is pretty straightforward and, again, can be found in any computer vision text, so I'll assume you have that covered.
If you don't want to find point correspondences, you could directly optimize the rotation and translation using steepest descent. The trouble is that this objective is non-convex, so there are no guarantees you will find the correct transformation. You could add random restarts, simulated annealing, or other global optimization tricks on top of this; that would most likely work. I can't find any references for this problem, but it's basically a digital image stabilization algorithm. I had to implement it when I took computer vision, but that was many years ago; here are the relevant slides though, look at "stabilization revisited". Yes, I know those slides are terrible; I didn't make them :) However, the method for determining the gradient is quite elegant, since finite differences are clearly intractable.
Edit: I finally found the paper that covers how to do this here; it's a really great paper and it explains the Lucas-Kanade algorithm very nicely. Also, this site has a lot of material and source code on image alignment that will probably be useful.
Hi, I have been working on this for a while and still have no good solution.
I am reading a video frame by frame, using background subtraction to identify the regions where there is movement, and using cvFindContours() to get the rectangular boundaries of the moving objects.
To keep the program simple, assume there can only be 2 humans.
These objects move in such a way that they can overlap, turn, and move apart again at certain intervals.
How can I label these 2 humans correctly?
cvFindContours() can return the boundaries in a random order across Frame1, Frame2, Frame3, ..., FrameN.
Initially, I can compare the centroids of the bounding rectangles to label each human correctly, but once the humans overlap and move apart again this approach fails.
I tried keeping track of the pixel colors of the original objects, but the humans are fairly similar, and certain areas have similar colors (hands, legs, hair), so that is not good enough.
I was considering using image statistics such as:
CountNonZero(), SumPixels(), Mean(), Mean_StdDev(), MinMaxLoc(), Norm()
to uniquely distinguish the two objects. I believe that would be a better approach.
This is a difficult problem, and any solution will be imperfect. Computer vision is jokingly known as an "AI-complete" discipline: if you solve computer vision, you have solved all of artificial intelligence.
Background subtraction can be a good way of detecting objects. If you need to improve the background subtraction results, you might consider using a Markov random field (MRF). Presumably, you can tell when there is a single object and when the two blobs have merged, based on the size of the blob. If the trajectories don't change quickly during the times the blobs are merged, you can do Kalman tracking and use some heuristics to disambiguate the blobs afterwards.
Even though the colors are similar between the two objects, you might consider trying to use a mean shift tracker. It's possible that you may need to do some particle filtering to keep track of multiple hypotheses about who is who.
There are also some even more complicated techniques called layered tracking. There is some more recent work by Jojic and Frey, by Winn, by Zhou and Tao, and by others. Most of these techniques come with very strong assumptions and/or take a lot of work to implement correctly.
If you're interested in this topic in general, I highly recommend taking a computer vision course and/or reading a textbook such as Ponce and Forsyth's.
I am looking for an introduction to image processing algorithms (face and shape recognition, etc.) and wondered if anyone had any good recommendations, whether books, whitepapers, or websites.
I am starting from knowing very little about image recognition, though I did some maths at university (a long time ago).
Any help or pointers would be greatly appreciated.
I found this blog very helpful.
There are quite a lot of topics related to CV that you might want to read up on.
Some of the topics:
The two books that are pretty good on this subject are:
I used CV: A Modern Approach for a CV class I took a semester or two ago. It is fairly concise and includes explanations of how the techniques work, though it's not for the faint of heart. Also: Forsyth is a well-known author of many CV papers.