Given a posterior p(Θ|D) over some parameters Θ, one can define the following:
The Highest Posterior Density Region is the set of most probable values of Θ that, in total, constitute 100(1-α) % of the posterior mass.
In other words, for a given α, we look for a $p^*$ that satisfies:

$$P\big(p(\Theta|D) > p^*\big) = 1 - \alpha$$

and then obtain the Highest Posterior Density Region as the set:

$$\{\Theta : p(\Theta|D) > p^*\}$$
Using the same notation as above, a Credible Region (or interval) is defined as any set $C$ satisfying:

$$\int_C p(\Theta|D)\, d\Theta = 1 - \alpha$$
Depending on the distribution, there can be many such intervals. The central credible interval is the credible interval with (1-α)/2 of the mass in each tail.
From my understanding, a "central credible region" is not computed any differently from how confidence intervals are calculated; all you need is the inverse of the cdf at alpha/2 and 1 - alpha/2; in scipy this is called ppf (percent point function); so for a Gaussian posterior distribution:
>>> from scipy.stats import norm
>>> alpha = .05
>>> l, u = norm.ppf(alpha / 2), norm.ppf(1 - alpha / 2)
to verify that [l, u] covers (1 - alpha) of the posterior density:
>>> norm.cdf(u) - norm.cdf(l)
0.94999999999999996
similarly for a Beta posterior with, say, a = 1 and b = 3:
>>> from scipy.stats import beta
>>> l, u = beta.ppf(alpha / 2, a=1, b=3), beta.ppf(1 - alpha / 2, a=1, b=3)
>>> beta.cdf(u, a=1, b=3) - beta.cdf(l, a=1, b=3)
0.94999999999999996
here you can see the parametric distributions that are included in scipy; and I guess all of them have a ppf function.
As for the highest posterior density region, it is more tricky, since the pdf is not necessarily invertible, and in general such a region may not even be connected; this happens, for example, for a Beta distribution with a = b = .5 (as can be seen here);
But, in the case of the Gaussian distribution, it is easy to see that the "Highest Posterior Density Region" coincides with the "Central Credible Region"; and I think that this is the case for all symmetric uni-modal distributions (i.e. if the pdf is symmetric around the mode of the distribution).
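As a quick sanity check of that claim, here is a minimal sketch using scipy: an HPD interval must have equal density (namely p*) at both endpoints, and for a symmetric unimodal pdf the central credible interval already satisfies this, so the two regions coincide:

```python
from scipy.stats import norm

alpha = 0.05
# central credible interval for a standard Gaussian posterior
l, u = norm.ppf(alpha / 2), norm.ppf(1 - alpha / 2)

# an HPD interval has the same density p* at both endpoints;
# by symmetry, the central interval already has this property
print(norm.pdf(l), norm.pdf(u))   # equal densities
print(norm.cdf(u) - norm.cdf(l))  # covers 1 - alpha of the mass
```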
A possible numerical approach for the general case would be binary search over the value of p*, using numerical integration of the pdf. Here is an example for a mixture of Gaussians:
[ 1 ] The first thing you need is an analytical pdf function; for a mixture of Gaussians that is easy:
import numpy as np
from scipy.stats import norm

def mix_norm_pdf(x, loc, scale, weight):
    return np.dot(weight, norm.pdf(x, loc, scale))
so, for example, with location, scale and weight values as in

loc    = np.array([-1, 3])   # mean values
scale  = np.array([.5, .8])  # standard deviations
weight = np.array([.4, .6])  # mixture probabilities
you will get two nice Gaussian distributions holding hands:
[ 2 ] now, you need an error function which, given a test value for p*, integrates the pdf above p* and returns the squared error from the desired value 1 - alpha:
def errfn(p, alpha, *args):
    from scipy import integrate
    def fn(x):
        pdf = mix_norm_pdf(x, *args)
        return pdf if pdf > p else 0
    # ideally integration limits should not
    # be hard coded but inferred
    lb, ub = -3, 6
    prob = integrate.quad(fn, lb, ub)[0]  # quad returns (value, abserr)
    return (prob + alpha - 1.0)**2
[ 3 ] now, for a given value of alpha, we can minimize the error function to obtain p*:

alpha = .05
from scipy.optimize import fmin
p = fmin(errfn, x0=0, args=(alpha, loc, scale, weight))[0]
which results in p* = 0.0450, and the HPD as shown below; the red area represents 1 - alpha of the distribution, and the horizontal dashed line is p*.
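For convenience, the three steps above can be collected into one self-contained script (same hard-coded integration limits and starting point as in the snippets above):

```python
import numpy as np
from scipy import integrate
from scipy.optimize import fmin
from scipy.stats import norm

def mix_norm_pdf(x, loc, scale, weight):
    # pdf of a mixture of Gaussians
    return np.dot(weight, norm.pdf(x, loc, scale))

def errfn(p, alpha, *args):
    # squared error between the mass above the cutoff p and 1 - alpha
    def fn(x):
        pdf = mix_norm_pdf(x, *args)
        return pdf if pdf > p else 0
    lb, ub = -3, 6  # hard-coded integration limits, as above
    prob = integrate.quad(fn, lb, ub)[0]  # quad returns (value, abserr)
    return (prob + alpha - 1.0) ** 2

loc    = np.array([-1, 3])   # mean values
scale  = np.array([.5, .8])  # standard deviations
weight = np.array([.4, .6])  # mixture probabilities
alpha  = .05

p_star = fmin(errfn, x0=0, args=(alpha, loc, scale, weight))[0]
print(p_star)  # about 0.045
```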
I am new to the domain of machine learning and I have noticed that there are a lot of algorithms/families of algorithms that can be used: SVM, decision trees, naive Bayes, perceptron, etc. That is why I wonder which algorithm one should use for which problem. In other words, which algorithm solves which class of problems?
So my question is: do you know a good website or book that focuses on this algorithm-selection problem?
Any help would be appreciated. Thanks in advance.
It is very hard to answer the question "which algorithm for which issue?"
That ability comes with a lot of experience and knowledge. So I suggest you read a few good books about machine learning. The following book would probably be a good starting point.
Once you have some knowledge of machine learning, you can work on a couple of simple machine learning problems. The Iris flower dataset is a good starting point. It consists of several features belonging to three species of Iris. Initially, develop a simple machine learning model (such as logistic regression) to classify Iris species, and gradually you can move to more advanced models such as neural networks.
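As an illustration, here is a minimal sketch using scikit-learn, which ships the Iris dataset; the split ratio and `random_state` are arbitrary choices:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 150 samples, 4 features (sepal/petal measurements), 3 species
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=200).fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on held-out data
```

Once this baseline works, swapping in a different estimator (a decision tree, an SVM, a small neural network) is a one-line change, which makes it easy to compare algorithms on the same problem.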
So in Matlab I perform PCA on handwritten digits. Essentially, I have, say, 30x30-pixel pictures, i.e. 900 pixels, and after PCA I keep the components that capture most of the variance, say the first 80 principal components (PCs), based on some threshold. Now these 80 PCs are also 900-dimensional, and when I plot them using imshow, I get images that look something like a 0, 6, 3, 5, etc. What is the interpretation of these first few PCs (out of the 80 I extracted)?
PCA extracts the most important information from the data set and compresses the size of the data set by keeping only the important information - principal components.
The first principal component is constructed in such a way that it has the largest possible variance. The second component is computed under the constraint of being orthogonal to the first component and of having the largest possible variance.
In your case the data is a set of images. Let's say you have 1000 images and you compute the first five principal components (5 images, constructed by the PCA algorithm). You may represent any image as 900 data points (30x30 pixels), or approximately as a combination of these 5 images with the corresponding multiplication coefficients.
The goal of the PCA algorithm is to construct these 5 images (principal components) in such a way that the images in your data set are represented most accurately with the combination of the given number of principal components.
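To make this concrete, here is a minimal sketch with scikit-learn's built-in digits dataset (8x8 images rather than 30x30, but the idea is the same):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data                 # 1797 images, 8x8 = 64 pixels each
pca = PCA(n_components=5).fit(X)

# each principal component has the same dimension as an image,
# so it can itself be reshaped and displayed as an 8x8 image
print(pca.components_.shape)           # (5, 64)

# any image is approximated as the mean image plus a weighted
# combination of the 5 components
codes = pca.transform(X)               # 5 coefficients per image
approx = pca.inverse_transform(codes)
print(np.mean((X - approx) ** 2))      # reconstruction error
```

Plotting a reshaped row of `pca.components_` with imshow gives exactly the kind of digit-like images you are seeing in Matlab.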
Consider the image below (from the amazing book by Kevin Murphy). The image shows how points in 2 dimensions (red dots) are represented in 1 dimension (green crosses) by projecting them to the vector (purple line). The vector is the first principal component. The purpose of PCA is to build these vectors to minimize the reconstruction error. In your case these vectors can be represented as images.
You may refer to this article for more details on using PCA for handwritten digit recognition.