Mahout in Action

Sean Owen, Robin Anil, Ted Dunning

Mentioned 3

Presents information on machine learning through the use of Apache Mahout, covering such topics as using group data to make individual recommendations, finding logical clusters, and filtering classifications.

Mentioned in questions and answers.

After installing Mahout from ( how do I run a Mahout algorithm, and where can I find a popular, easy tutorial for Mahout beginners?

Thanks in advance.

Mahout in Action is also a good book to start with, for beginners through experts.

Getting Started with the examples of Mahout in Action

Well, at last I am working on my final-year project, which is an intelligent web-based career guidance system. The core functionality of my system is:

Recommendation System

Basically, our recommendation system will carefully examine user preferences through interest tests and the user's academic record, and on the basis of this information it will give the user the best career options, i.e. a course such as BS Computer Science.

  • The input of the recommendation system will be the student's credentials and an interest test. The questions in the interest test will depend on the user's academic history and on the answers given so far, so the test will not ask everyone the same questions; it will decide in real time what to ask each user, according to rules defined by the system.

  • Its output will be a set of candidate fields, decided on the basis of the interest test.


When I was defending my scope in front of the committee, they said "this is simple if-else; this system is not intelligent."
My question is: which AI technique or algorithm could be used to make this system intelligent? I have searched a lot, but the papers related to my system are superficial; they emphasize the idea, not the methodology.
I want to do all my work in Java, so a technology-specific answer would be great.
Feel free to move my question to another Stack Exchange site if it does not meet the SO Q&A criteria.


After getting some ideas from the answers, I want to implement a rule-based expert system with an inference engine. Now I want to be clearer on the technology for implementing the rule engine. After searching, I found Drools to be the best option, but is it also compatible with web applications? I also found Tohu to be the best dynamic form generator (which my project also needs). Can I use Tohu with Drools to build my web application? Is this type of system easy to implement or not?

A program is never more intelligent than the person who wrote it. So, I would first use the collective intelligence that has been built and open sourced already.

Pass your set of known data points as input to Apache Mahout's PearsonCorrelationSimilarity and use the output to predict which course is the best match. Mahout is open source and scalable, and you can also record the outcome and feed it back into the system to improve accuracy over time. A pile of if-else conditions will struggle to match this, because it is much easier to tweak an out-of-the-box algorithm, or replace it with your own, than to maintain such conditions.
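To make the idea concrete, here is a minimal plain-Java sketch of Pearson-based matching. It does not call Mahout; the class name `PearsonSketch`, the method `bestMatch`, the score arrays, and the "BS Fine Arts" course are all hypothetical and only illustrate what a Pearson similarity computes. It assumes each student is represented by interest-test scores on the same questions, in the same order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java sketch of Pearson-based matching; it does not use Mahout. */
final class PearsonSketch {

    /** Pearson correlation of two equal-length score arrays; 0.0 when undefined. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and equal in length");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        double denom = Math.sqrt(varX * varY);
        // A constant score array has zero variance, so the correlation is undefined.
        return denom == 0.0 ? 0.0 : cov / denom;
    }

    /** Returns the course whose past student's scores correlate best with the newcomer's. */
    static String bestMatch(double[] newcomer, Map<String, double[]> pastScoresByCourse) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> entry : pastScoresByCourse.entrySet()) {
            double score = pearson(newcomer, entry.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical interest-test scores of past students, keyed by the course they chose.
        Map<String, double[]> past = new LinkedHashMap<>();
        past.put("BS Computer Science", new double[] {5, 4, 1, 2});
        past.put("BS Fine Arts", new double[] {1, 2, 5, 4});

        double[] newcomer = {4, 5, 2, 1};
        System.out.println(bestMatch(newcomer, past));
    }
}
```

With real data you would compare against many past students per course and combine the most similar ones rather than pick a single best match; that neighborhood step is the kind of thing a recommender library handles for you.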

I would suggest reading this book (Mahout in Action). It contains an example of how to use PearsonCorrelationSimilarity.

Mahout also has built-in components such as NearestNeighborClusterSimilarity, a cluster similarity measure used with its cluster-based recommenders, that can simplify your solution further.

There's good starter code in the book. You can build on it.

Student credentials and interest test questions and answers are the inputs. Career choice is the output, which you can correlate with the inputs. That is a very simplistic approach, but it might be OK to start with. Eventually, you will have to apply the classifier techniques that Amit has suggested, and Mahout can help you with that as well.

I am trying to build a classifier with Mahout. After the model is built, I have to "feed" the target documents to the model and get the classification result.

I checked the test cases in the Mahout source code; they use DenseVector, which has a fixed number of fields. However, I am using Mahout to classify text documents, so the input is a string (or an array of strings). How do I convert it to a valid "Vector" instance?

I tried StaticWordValueEncoder and RandomAccessSparseVector, but the result is not correct, and I cannot figure out why. I am a little bit desperate.

You have to parse the document into words and populate the vector from those.
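As a rough illustration of that advice, the sketch below tokenizes a document and hashes each word into a fixed-size array, which is one common way to turn variable-length text into a vector with a fixed number of fields. It is plain Java, not Mahout's API: the class `HashedTextVectorSketch`, the method `vectorize`, and the cardinality of 16 are hypothetical choices, and Mahout's own encoders and `Vector` classes differ in detail.

```java
import java.util.Locale;

/** Plain-Java sketch of hashed bag-of-words features; it does not use Mahout. */
final class HashedTextVectorSketch {

    /**
     * Counts words into a fixed-size array. Each word's slot is chosen by its
     * hash code, so the same word always lands in the same slot regardless of
     * which document it appears in.
     */
    static double[] vectorize(String text, int cardinality) {
        if (cardinality <= 0) {
            throw new IllegalArgumentException("cardinality must be positive");
        }
        double[] vector = new double[cardinality];
        // Split on runs of characters that are neither letters nor digits.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            // floorMod keeps the index non-negative even for negative hash codes.
            int index = Math.floorMod(token.hashCode(), cardinality);
            vector[index] += 1.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        double[] features = vectorize("Mahout makes text classification with Mahout easier", 16);
        StringBuilder line = new StringBuilder();
        for (double value : features) {
            line.append(value).append(' ');
        }
        System.out.println(line.toString().trim());
    }
}
```

The important constraint is that training and classification use the same tokenization and the same vector size; a mismatch between the two is a frequent cause of results that look wrong. Different words can collide in one slot, which a larger vector size makes less likely.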

I would recommend reading something like Mahout In Action to get more background before attempting this.