Neural Networks for Pattern Recognition

Christopher M. Bishop

Mentioned 5

"Readers will emerge with a rigorous statistical grounding in the theory of how to construct and train neural networks in pattern recognition" (New Scientist)

Mentioned in questions and answers.

I'm really interested in Artificial Neural Networks, but I'm looking for a place to start.

What resources are out there and what is a good starting project?

If you don't mind spending money, The Handbook of Brain Theory and Neural Networks is very good. It contains 287 articles covering research in many disciplines. It starts with an introduction and theory and then highlights paths through the articles to best cover your interests.

As for a first project, Kohonen maps are interesting for categorization: find hidden relationships in your music collection, build a smart robot, or solve the Netflix prize.
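To give a feel for what a Kohonen map does, here is a minimal sketch of a one-dimensional self-organizing map in plain Python. Everything in it is an assumption made for illustration: the function names `best_unit` and `train_som`, the learning-rate and radius schedules, and the made-up two-dimensional samples standing in for real feature vectors (say, features extracted from songs).

```python
import math
import random

def best_unit(units, x):
    """Index of the unit whose weight vector is closest to sample x."""
    return min(range(len(units)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(units[i], x)))

def train_som(samples, n_units=4, epochs=200, lr0=0.5, radius0=2.0, seed=0):
    """Train a 1-D chain of units; schedules here are illustrative choices."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Start every unit at a random point in the unit square/cube.
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs            # decays from 1 toward 0
        lr = lr0 * frac                        # shrinking learning rate
        radius = max(radius0 * frac, 0.5)      # shrinking neighbourhood
        for x in rng.sample(samples, len(samples)):  # one shuffled pass
            bmu = best_unit(units, x)
            for i, w in enumerate(units):
                # Units near the winner on the chain move more than distant ones.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return units

# Made-up 2-D feature vectors forming two loose groups.
samples = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
           (0.8, 0.9), (0.9, 0.8), (0.85, 0.85)]
units = train_som(samples)
groups = [best_unit(units, s) for s in samples]  # unit index per sample
```

After training, `groups` tells you which unit each sample landed on; samples that share a unit (or land on neighbouring units) are the ones the map considers similar.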

First of all, give up any notions that artificial neural networks have anything to do with the brain but for a passing similarity to networks of biological neurons. Learning biology won't help you effectively apply neural networks; learning linear algebra, calculus, and probability theory will. You should at the very least make yourself familiar with the idea of basic differentiation of functions, the chain rule, partial derivatives (the gradient, the Jacobian and the Hessian), and understanding matrix multiplication and diagonalization.

Really what you are doing when you train a network is optimizing a large, multidimensional function (minimizing your error measure with respect to each of the weights in the network), and so an investigation of techniques for nonlinear numerical optimization may prove instructive. This is a widely studied problem with a large base of literature outside of neural networks, and there are plenty of lecture notes in numerical optimization available on the web. To start, most people use simple gradient descent, but this can be much slower and less effective than more nuanced methods like

Once you've got the basic ideas down you can start to experiment with different "squashing" functions in your hidden layer, adding various kinds of regularization, and various tweaks to make learning go faster. See this paper for a comprehensive list of "best practices".
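To make the "training is optimization" point concrete, here is a minimal sketch of plain gradient descent on the smallest possible network: a single sigmoid neuron learning logical AND with a squared-error measure. The data set, the function names, the learning rate, and the step count are all assumptions chosen for illustration, not recommendations.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy data (assumed for illustration): logical AND.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 0.0),
        ((1.0, 0.0), 0.0), ((1.0, 1.0), 1.0)]

def predict(weights, x):
    w1, w2, b = weights
    return sigmoid(w1 * x[0] + w2 * x[1] + b)

def loss(weights):
    """Squared error summed over the data set."""
    return 0.5 * sum((t - predict(weights, x)) ** 2 for x, t in DATA)

def gradient(weights):
    """Partial derivatives of the loss with respect to w1, w2 and b."""
    g = [0.0, 0.0, 0.0]
    for x, t in DATA:
        o = predict(weights, x)
        d = -(t - o) * o * (1.0 - o)   # chain rule through the sigmoid
        g[0] += d * x[0]
        g[1] += d * x[1]
        g[2] += d
    return g

def gradient_descent(weights, lr=0.5, steps=5000):
    """Repeatedly step each weight against its partial derivative."""
    w = list(weights)
    for _ in range(steps):
        g = gradient(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

initial = [0.0, 0.0, 0.0]
trained = gradient_descent(initial)
```

Comparing `loss(initial)` with `loss(trained)` shows the error measure going down. A real network has many more weights and gets its gradient from back propagation, but the update loop has the same shape.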

One of the best books on the subject is Chris Bishop's Neural Networks for Pattern Recognition. It's fairly old by this stage but is still an excellent resource, and you can often find used copies online for about $30. The neural network chapter in his newer book, Pattern Recognition and Machine Learning, is also quite comprehensive. For a particularly good implementation-centric tutorial, see this one on CodeProject.com which implements a clever sort of network called a convolutional network, which constrains connectivity in such a way as to make it very good at learning to classify visual patterns.

Support vector machines and other kernel methods have become quite popular because you can apply them without knowing what the hell you're doing and often get acceptable results. Neural networks, on the other hand, are huge optimization problems which require careful tuning, although they're still preferable for lots of problems, particularly large scale problems in domains like computer vision.

I found Fausett's Fundamentals of Neural Networks a straightforward and easy-to-get-into introductory textbook.

Two books that were used during my studies:

Introductory course: An Introduction to Neural Computing by Igor Aleksander and Helen Morton.

Advanced course: Neurocomputing by Robert Hecht-Nielsen

Raul Rojas' book is a very good start (it's also free). The third edition of Haykin's book, although a large volume, is also very well explained.

Neural networks are kind of déclassé these days. Support Vector Machines and kernel methods are better for more classes of problems than back propagation. Neural networks and genetic algorithms capture the imagination of people who don't know much about modern machine learning, but they are not state of the art.

If you want to learn more about AI/Machine learning, I recommend buying and reading Russell and Norvig's Artificial Intelligence: A Modern Approach. It's a broad survey of AI and lots of modern technology. It goes over the history and older techniques too, and will give you a more complete grounding in the basics of AI/Machine Learning.

Neural networks are pretty easy, though, especially if you use a genetic algorithm to determine the weights rather than proper back propagation.

I can recommend where not to start. I bought An Introduction to Neural Networks by Kevin Gurney which has good reviews on Amazon and claims to be a "highly accessible introduction to one of the most important topics in cognitive and computer science". Personally, I would not recommend this book as a start. I can comprehend only about 10% of it, but maybe it's just me (English is not my native language). I'm going to look into other options from this thread.

I wanted to ask Stack Overflow users for a nice idea for a project that could entertain a fellow student programmer during a semester. Computer vision might look interesting, although I couldn't say if a project in that field is achievable in 4 months. What do you think?

Can't tell without knowing more about you, your friend, and the project. My guess is "no".

I'd point you towards two sources. The first is Peter Norvig's "Artificial Intelligence"; the second is "Programming Collective Intelligence". Maybe they'll inspire you.

There is a story that, during the early days of AI research when significant progress was being made on "hard" logic problems via mechanical theorem provers, a professor assigned one of his graduate students the "easy" problem of solving how vision provided meaningful input to the brain. Obviously, things turned out to be far more difficult than the professor anticipated. So, no, not vision in the general sense.

If you are just starting in AI, there are a couple of directions. The classic AI problems - logic puzzles - are solved with a mechanical theorem prover (usually written in Lisp - see here for the classic text on solving logical puzzles). If you don't want to create your own, you can pick up a copy of Prolog (it is essentially the same thing).

You can also go with pattern recognition problems, although you'll want to keep the initial problems pretty simple to avoid getting swamped in detail. My dissertation involved the use of stochastic processes for letter recognition in free-floating space, so I'm kind of partial to this approach (don't start with stochastic processes though, unless you really like math). Right next door is the subfield of neural networks. This is popular because you almost can't learn NN without building some interesting projects. Across this entire domain (pattern processing), the cool thing is that you can solve real problems rather than toy puzzles.

A lot of people like Natural Language Processing as it is easy to get started but almost infinite in complexity. One very definable problem is to build an NLP program for processing language in a specific domain (discussing a chess game, for example). This makes it easy to see progress while still being complex enough to fill a semester.

Hope that gives you some ideas!

I have a quick question regarding back propagation. I am looking at the following:

http://www4.rgu.ac.uk/files/chapter3%20-%20bp.pdf

In this paper it says to calculate the neuron error as:

Error = Output(i) * (1 - Output(i)) * (Target(i) - Output(i))

I have put the part of the equation that I don't understand in bold. In the paper, it says that the Output(i) * (1 - Output(i)) term is needed because of the sigmoid function, but I still don't understand why this would be necessary. What would be wrong with using ...

Error = abs(Output(i) - Target(i))

... as the error function regardless of the neuron activation/transfer function?

Many thanks

The choice of the sigmoid function is by no means arbitrary. Basically you are trying to estimate the conditional probability of a class label given some sample. If you take the absolute value, you are doing something different, and you will get different results.
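One way to see where the Output(i) * (1 - Output(i)) factor comes from is to check it numerically. The sketch below assumes a squared-error measure E = 0.5 * (Target - Output)^2 and a sigmoid output unit (the linked chapter may set things up slightly differently), and compares the formula from the question with a finite-difference slope of E with respect to the neuron's net input. The function names and the sample values are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def squared_error(net, target):
    """E = 0.5 * (target - output)^2 with output = sigmoid(net)."""
    return 0.5 * (target - sigmoid(net)) ** 2

def delta(net, target):
    """The formula from the question: Output * (1 - Output) * (Target - Output)."""
    out = sigmoid(net)
    return out * (1.0 - out) * (target - out)

def numerical_slope(net, target, eps=1e-6):
    """Central-difference estimate of dE/dnet."""
    return (squared_error(net + eps, target)
            - squared_error(net - eps, target)) / (2.0 * eps)

net, target = 0.3, 1.0                     # arbitrary example values
analytic = delta(net, target)
numeric = -numerical_slope(net, target)    # delta is the negative slope
```

The two numbers agree, which shows that the "Error" in the chapter is really the negative derivative of the squared error with respect to the net input, obtained via the chain rule. The Output * (1 - Output) part is the derivative of the sigmoid itself, so it would still appear if you swapped the squared error for abs(Output - Target); only the (Target - Output) part would change.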

For a practical introduction to the topic, I would recommend the online Machine Learning course by Prof. Andrew Ng:

https://www.coursera.org/course/ml

and, for an in-depth study of the topic, the book by Prof. Christopher Bishop:

http://www.amazon.com/Neural-Networks-Pattern-Recognition-Christopher/dp/0198538642/ref=sr_1_1?ie=UTF8&qid=1343123246&sr=8-1&keywords=christopher+bishop+neural+networks

I've recently come across Intelligent Agents by reading this book : link text

I'm interested in finding a good book for beginners, so I can start to implement such a system. I've also tried reading "Multiagent Systems : A modern approach to distributed artificial intelligence" (can't find it on amazon) but it's not what I'm looking for. Thanks for the help :).

There are numerous classic books:

The first two are the easiest; the second one covers more than machine learning. However, there is little "pragmatic" or "engineering" material in them, and the math is quite demanding, but so is the whole field. I guess you will do best with O'Reilly's Programming Collective Intelligence because it focuses on programming.

I am new to neural networks and I need to determine the pattern among a given set of inputs and outputs. So how do I decide which neural network to use for training, or even which learning method to use? I have little idea about the pattern or relation between the given inputs and outputs.

Any sort of help will be appreciated. If you want me to read some stuff then it would be great if links are provided.

If any more info is needed plz say so.

Thanks.

Choosing the right neural network is something of an art form. It's difficult to give generic suggestions, as the best NN for a situation will depend on the problem at hand. As with many of these problems, neural networks may or may not be the best solution. I'd highly recommend trying out different networks and testing their performance against a test data set. When I did this I usually used the ANN tools through the R software package.

Also keep your mind open to other statistical learning techniques; decision trees and Support Vector Machines may be a better choice for some problems.
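The "try several and compare on a test set" workflow can be sketched in a few lines of Python. The two candidates below (a mean predictor and a 1-nearest-neighbour lookup) are deliberately trivial stand-ins for whatever networks or other learners you actually want to compare; the function names, the 25% split, and the synthetic data are all assumptions for illustration.

```python
import random

def split(data, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and hold back a fraction for testing."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def fit_mean(train):
    """Stand-in model 1: always predict the mean training output."""
    mean = sum(y for _, y in train) / len(train)
    return lambda x: mean

def fit_nearest(train):
    """Stand-in model 2: predict the output of the closest training input."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

def mse(model, data):
    """Mean squared error of a fitted model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

def select(candidates, data):
    """Fit every candidate on the training part, score it on the held-out part."""
    train, test = split(data)
    scores = {name: mse(fit(train), test) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores

# Synthetic input/output pairs with an unknown (to the models) relation y = x^2.
data = [(i / 10.0, (i / 10.0) ** 2) for i in range(40)]
best, scores = select({"mean": fit_mean, "nearest": fit_nearest}, data)
```

The important design choice is that the score comes from data the model never saw during fitting; comparing candidates on their training error would favour whichever one memorizes best.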

I'd suggest the following books:

http://www.amazon.com/Neural-Networks-Pattern-Recognition-Christopher/dp/0198538642

http://www.stats.ox.ac.uk/~ripley/PRbook/#Contents