Machine Learning for Hackers

Drew Conway, John White


Presents algorithms that enable computers to train themselves to automate tasks, focusing on specific problems such as prediction, optimization, and classification.


Mentioned in questions and answers.

I have tags on my website, and I enter them one by one when I create a blog post. I love Gmail's new feature that asks whether you want to include X in an email when you type Y's name, if you often include both of them in the same messages.

I'd like to do something similar on my website, but I don't know how to represent the tags' "related-ness" in an object or database. Thoughts?

Look up clustering (a machine learning technique). Don't be intimidated by the math; it's a pretty straightforward algorithm. Check out Machine Learning for Hackers for simpler explanations of many machine learning algorithms and methods.
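One simple way to represent tag "related-ness" is a co-occurrence count: how often each pair of tags appears on the same post. The sketch below is plain co-occurrence counting rather than clustering itself, though the same counts could serve as the similarity input to a clustering algorithm. It assumes each post is stored as a list of tag strings; the function names and sample posts are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_cooccurrence(posts):
    """Count how often each pair of tags appears on the same post."""
    co = defaultdict(Counter)
    for tags in posts:
        for a, b in combinations(sorted(set(tags)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co

def suggest_tags(co, current_tags, top_n=3):
    """Rank tags that often co-occur with the tags already entered."""
    scores = Counter()
    for tag in current_tags:
        scores.update(co.get(tag, Counter()))
    for tag in current_tags:
        scores.pop(tag, None)  # don't suggest tags the author already chose
    return [tag for tag, _ in scores.most_common(top_n)]

# Hypothetical blog posts, each represented by its list of tags.
posts = [
    ["python", "machine-learning", "r"],
    ["python", "machine-learning"],
    ["r", "statistics"],
    ["python", "web"],
]
co = build_cooccurrence(posts)
print(suggest_tags(co, ["python"]))
```

In a database, the same information fits a table with one row per tag pair and a count column, incremented whenever a post is saved.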

Assume the following:

Category "Electronics" contains the product "Blu-rays", among other products.

What are some basic statistics I can implement to recommend more "Blu-rays" when the user browses under Electronics? Right now I just have a lame "Blu-rays were bought 3 out of 5 times under the Electronics category for this user", so the likelihood is 60% and I recommend more Blu-rays.

EDIT: What if I'm coming from a seller's perspective, where I want to auto-fill the input box? For example, if the seller usually sells Blu-rays used, I want to auto-fill the "condition" field the next time he sells under "Electronics" to enhance the user experience.
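The "3 out of 5" figure is an empirical conditional frequency: an estimate of how likely this user is to buy a Blu-ray given that they are shopping in Electronics. The seller-side auto-fill in the EDIT is the same kind of count, taking the most frequent past value. A minimal sketch of both, assuming purchases and listings are stored as simple records; the field names (`user`, `seller`, `category`, `product`, `condition`) and the `min_share` threshold are hypothetical.

```python
from collections import Counter

def product_share(purchases, user, category):
    """Fraction of a user's purchases in `category` that went to each product."""
    counts = Counter(
        p["product"] for p in purchases
        if p["user"] == user and p["category"] == category
    )
    total = sum(counts.values())
    return {product: n / total for product, n in counts.items()} if total else {}

def default_condition(listings, seller, category, min_share=0.5):
    """Most frequent past 'condition' for this seller and category, or None.

    Only suggests a value when it covers at least `min_share` of the
    seller's past listings in that category (a hypothetical threshold).
    """
    counts = Counter(
        item["condition"] for item in listings
        if item["seller"] == seller and item["category"] == category
    )
    if not counts:
        return None  # no history: leave the field empty
    value, n = counts.most_common(1)[0]
    return value if n / sum(counts.values()) >= min_share else None

# Hypothetical data reproducing the question's example.
purchases = (
    [{"user": "u1", "category": "Electronics", "product": "Blu-ray"}] * 3
    + [{"user": "u1", "category": "Electronics", "product": "Headphones"}] * 2
)
listings = [
    {"seller": "s1", "category": "Electronics", "condition": "used"},
    {"seller": "s1", "category": "Electronics", "condition": "used"},
    {"seller": "s1", "category": "Electronics", "condition": "new"},
]
print(product_share(purchases, "u1", "Electronics"))  # {'Blu-ray': 0.6, 'Headphones': 0.4}
print(default_condition(listings, "s1", "Electronics"))  # used
```

The `min_share` check is one design choice for avoiding a confident-looking auto-fill when the seller's history is evenly split.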

Your question is about recommender systems. You are interested in finding similarities that can help you make good recommendations. These similarities can be measured in several ways. The most common is to look at the past behavior of the people who have bought on your site and seek similarities among them. This can be done using simple correlation between the vectors of products they bought. If you also have data about the people who usually buy on your site (age, gender), you can use this kind of information to improve your recommender system. Furthermore, rating systems (likes and dislikes) provide a valuable source of information. Besides correlation, you can also use other simple measures, not necessarily statistical ones, such as Euclidean distance, Minkowski distance, the cosine of the angle between the vectors, and so on.
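The similarity measures named above can be compared on two users' product vectors. A minimal pure-Python sketch, assuming each user is represented by purchase counts over the same list of products; the vectors `alice` and `bob` are hypothetical. Pearson correlation and cosine are similarities (higher means more alike), while Minkowski distance is a distance (lower means more alike); with `p=2` it is the Euclidean distance.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def minkowski(x, y, p=2):
    """Minkowski distance; p=2 is Euclidean, p=1 is Manhattan."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def cosine(x, y):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0

# Hypothetical purchase counts for the same 4 products, one vector per user.
alice = [3, 0, 1, 2]
bob = [2, 0, 1, 3]
print(round(pearson(alice, bob), 3),
      round(minkowski(alice, bob), 3),
      round(minkowski(alice, bob, p=1), 3),
      round(cosine(alice, bob), 3))  # 0.8 1.414 2.0 0.929
```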

If your vectors have many dimensions, you may want to reduce their dimension by keeping only the most important components. This can be done using PCA (Principal Component Analysis) or Singular Value Decomposition (SVD).
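A minimal truncated-SVD sketch with NumPy, projecting each user's product vector onto the `k` strongest directions. The user-by-product matrix and `k=2` are hypothetical. Centering each column before the SVD would make this equivalent to PCA; here the raw matrix is used, as is common for sparse purchase data.

```python
import numpy as np

# Hypothetical user x product matrix (rows: users, columns: products).
ratings = np.array([
    [5, 4, 0, 0, 1],
    [4, 5, 0, 0, 0],
    [0, 0, 5, 4, 0],
    [0, 1, 4, 5, 0],
], dtype=float)

def reduce_dimensions(matrix, k):
    """Project each row onto the top-k right singular vectors (truncated SVD)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # U_k * S_k equals matrix @ vt[:k].T: each row's coordinates in k dimensions.
    return u[:, :k] * s[:k]

users_2d = reduce_dimensions(ratings, k=2)
print(users_2d.shape)  # (4, 2)
```

The reduced vectors can then be fed to the same similarity measures, which become cheaper to compute and less noisy when most products are rarely bought.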

However, if you really want to improve your system, you should consider using classifiers such as Nearest Neighbors, Decision Trees, or Support Vector Machines to discover the class of your buyers. For instance, this can help you tell whether a given buyer prefers cheap or expensive brands.
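As an illustration of the nearest-neighbors idea, the sketch below labels a new buyer as "cheap" or "expensive" by majority vote among the `k` most similar known buyers. The two features (average price paid, share of purchases made on sale) and the labeled training buyers are hypothetical; in practice you would also rescale features so that price does not dominate the distance.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Euclidean distance on raw features; rescale features in real use.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical buyers: ((average price paid, share bought on sale), label).
train = [
    ((15.0, 0.8), "cheap"),
    ((20.0, 0.7), "cheap"),
    ((18.0, 0.9), "cheap"),
    ((120.0, 0.1), "expensive"),
    ((95.0, 0.2), "expensive"),
    ((150.0, 0.0), "expensive"),
]
print(knn_predict(train, (25.0, 0.6)))  # cheap
```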

Finally, you can run online experiments using multi-armed bandit algorithms.
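Epsilon-greedy is one of the simplest multi-armed bandit strategies (others include UCB and Thompson sampling): most of the time it shows the option with the best observed click rate, and with probability `epsilon` it explores a random one. A minimal simulation sketch, assuming each "arm" is a hypothetical recommendation variant whose true click rate is unknown to the algorithm; the rates below stand in for real user responses.

```python
import random

def epsilon_greedy(click_rates, rounds=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy over arms whose true click rates are hidden.

    Returns how many times each arm was shown.
    """
    rng = random.Random(seed)
    n_arms = len(click_rates)
    pulls = [0] * n_arms
    rewards = [0.0] * n_arms
    for _ in range(rounds):
        untried = [i for i in range(n_arms) if pulls[i] == 0]
        if untried:
            arm = untried[0]  # show every arm at least once
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: random arm
        else:
            # exploit: arm with the best observed click-through rate so far
            arm = max(range(n_arms), key=lambda i: rewards[i] / pulls[i])
        # Simulated user response; in production this would be a real click.
        reward = 1.0 if rng.random() < click_rates[arm] else 0.0
        pulls[arm] += 1
        rewards[arm] += reward
    return pulls

# Hypothetical: three recommendation variants with true click rates 10%, 20%, 50%.
print(epsilon_greedy([0.1, 0.2, 0.5]))
```

Over time most impressions shift to the best-performing variant, while the `epsilon` share of traffic keeps checking the others in case their performance changes.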

There are some books that can help you:

1) Recommender Systems

2) Bandit algorithm

3) Machine Learning