Bayesian inference primer: direct sampling

The most straightforward approach is to sample the integrand at different values of θ, multiply each sampled value by the volume it represents, and add up the results. We can perform this sampling on a grid, or randomly (practitioners like to impress themselves by calling it “Monte Carlo” sampling… but it just means random), within some bounded box with the same number of dimensions as there are values in the θ parameter vector. Since we can use gradient ascent to find a local optimum (and for convex problems, which satisfy certain constraints, like linear regression, we can guarantee the local optimum is also the global one), we could in principle just find which value of θ maximizes the integrand and sample around it. How hard could that be?
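To make this concrete, here is a minimal sketch of that idea in Python. The integrand f, the bounds, and the dimension below are made up for illustration: sample uniformly inside a box, average the integrand values, and multiply by the box volume.

import numpy as np

# Direct sampling sketch: estimate the integral of an integrand over a
# bounded box by averaging random samples and multiplying by the box volume.
def f(theta):
    # Stand-in integrand for illustration only.
    return np.exp(-0.5 * np.sum(theta ** 2))

lower, upper = -5.0, 5.0   # bounds of the box along each dimension
dimension = 2              # number of values in the theta parameter vector
n_samples = 100_000

box_volume = (upper - lower) ** dimension
theta_samples = np.random.uniform(lower, upper, size=(n_samples, dimension))
values = np.array([f(theta) for theta in theta_samples])

# Each sample represents box_volume / n_samples worth of volume.
integral_estimate = box_volume * values.mean()
print(integral_estimate)   # close to 2 * pi for this particular integrand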

Our intuition in the above paragraph works very well for one-dimensional and even two- or three-dimensional integrals, but if the parameter vector has more than three numbers in it, an ugly fact emerges: the number of samples needed for a grid or constant-density random sampling is multiplied by the length of the search space for each dimension we add. The volume of a hyper-cube grows as volume = width ^ number_of_dimensions, so a grid with 100 points per dimension needs 100^10 = 10^20 points to cover just ten dimensions.

Now here’s the fun part. We may have some idea of what the width we sample should be for each dimension, and as a result, the corners of the hyper-rectangular prism are unlikely to contain much of the density of the integral. And it turns out that the volume of a hyper-sphere with unit radius actually peaks at five dimensions and then shrinks toward zero as the number of dimensions keeps growing (this is weird but true, see https://en.wikipedia.org/wiki/Volume_of_an_n-ball). So sampling inside a hyper-sphere rather than the full hyper-cube looks attractive. However, how do we sample within the hyper-sphere?
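If you want to see this for yourself, here is a quick sketch using the standard formula from the Wikipedia page above, printing the unit-ball volume by dimension:

import numpy as np
from scipy.special import gamma

# Volume of a unit-radius ball in n dimensions: pi^(n/2) / Gamma(n/2 + 1).
# The values rise up to n = 5 and then fall back toward zero.
for n in range(1, 16):
    volume = np.pi ** (n / 2) / gamma(n / 2 + 1)
    print(f"dimension {n:2d}: unit-ball volume = {volume:.4f}")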

It turns out that it is non-trivial to sample uniformly within the hyper-sphere. The most obvious solution is to sample within the enclosing hyper-cube and throw out the points that fall outside, but as the number of dimensions grows the rejection rate climbs toward 100%. There are other approaches you can use, and to learn more about the most common work-arounds, see http://extremelearning.com.au/how-to-generate-uniformly-random-points-on-n-spheres-and-n-balls/
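Here is a rough sketch of why the obvious rejection approach falls apart. The acceptance rate is just the ball-to-cube volume ratio, and it collapses as the dimension grows; the sample counts below are illustrative.

import numpy as np

# Rejection sampling into the unit ball: draw points uniformly in the
# cube [-1, 1]^d and keep those with norm <= 1. The accepted fraction
# equals the ball-to-cube volume ratio and collapses with dimension.
rng = np.random.default_rng(0)
n_samples = 200_000
for dimension in (2, 5, 10, 15):
    points = rng.uniform(-1, 1, size=(n_samples, dimension))
    accepted = np.linalg.norm(points, axis=1) <= 1.0
    print(f"dimension {dimension:2d}: acceptance rate = {accepted.mean():.2e}")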

However, there are additional points to consider:

  • We usually don’t have a good way to guess what the width parameter should be for each dimension in θ.

  • We expect the bulk of the density of the integrand to occur close to the local optimum. This suggests we should sample with an exponential density, with many more points close to the center, and fewer farther away.

  • Unfortunately, it turns out that no matter how carefully you design your sampling density, any mismatch between it and the actual density of the integrand (which is impossible to know a priori) means that we are wasting samples, and the amount of waste grows with the number of dimensions and the crudeness of your guess (see the sketch after this list).

  • After a while, you discover that you are encoding more and more domain-specific knowledge into your sampling density, making your solution less and less generalizable.
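As a rough illustration of that mismatch penalty, here is a one-dimensional sketch with a made-up standard-normal integrand (the effect only gets worse in higher dimensions): importance-sample with proposals of increasing width and watch the effective sample size shrink.

import numpy as np
from scipy import stats

# Importance sampling a standard-normal "integrand" with proposals of
# increasing width. The effective sample size, (sum w)^2 / sum(w^2),
# shrinks as the proposal drifts away from the true density, which is
# another way of saying that samples are being wasted.
rng = np.random.default_rng(0)
n_samples = 10_000
target = stats.norm(loc=0.0, scale=1.0)

for proposal_scale in (1.0, 5.0, 20.0):
    x = rng.normal(0.0, proposal_scale, size=n_samples)
    weights = target.pdf(x) / stats.norm.pdf(x, scale=proposal_scale)
    effective_sample_size = weights.sum() ** 2 / (weights ** 2).sum()
    print(f"proposal scale {proposal_scale:5.1f}: "
          f"effective sample size = {effective_sample_size:.0f} of {n_samples}")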

In other words, this isn’t an exact science, and while many methods have been devised to tame the problem and get a direct estimate, it will be up to you to try some numerical experiments and build your own intuition. But as someone who has been there before, I can tell you what you’ll find:

No matter how clever you are, direct sampling is too unstable and unreliable to work for general problems. You need to find another way, and that brings us to the next sections.

Code sample for direct sampling
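The snippet below is meant to be read in the context of the rest of the primer rather than run on its own: it assumes that n_samples, which_distro ('normal' or 'constant'), cov, radius, initialization_parameters (used only for its length), and likelihood_function (the integrand) are defined elsewhere. It draws samples from a proposal distribution, weights each one by likelihood over propensity, and uses those weights to estimate the parameters and their covariance.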

import numpy as np
import scipy as sp
import scipy.special
import scipy.stats
from numpy import random, linalg

def random_ball(num_points, dimension, radius=1):
    '''
    Generate "num_points" random points in "dimension" dimensions with uniform
    probability over the ball scaled by "radius" (the lengths of the points
    are in the range [0, "radius"]).
    From https://stackoverflow.com/questions/54544971/how-to-generate-uniform-random-points-inside-d-dimension-ball-sphere
    '''

    # Generate random directions by normalizing the length of a
    # vector of random-normal values.
    random_directions = random.normal(size=(dimension, num_points))
    random_directions /= linalg.norm(random_directions, axis=0)
    
    # Then generate a random radius with probability proportional to
    # the surface area of a ball with a given radius.
    random_radii = random.random(num_points) ** (1 / dimension)
    return radius * (random_directions * random_radii).T


samples = []
sample_propensities = []
for _ in range(n_samples):
    if which_distro == 'normal':
        propensity_model = sp.stats.multivariate_normal(cov=cov, allow_singular=True)
        sample = propensity_model.rvs()
        sample_propensity = propensity_model.pdf(sample)
    elif which_distro == 'constant':
        sample = list(random_ball(1, len(initialization_parameters), radius))[0]
        # Volume of a ball of this radius in this many dimensions:
        # pi^(d/2) / Gamma(d/2 + 1) * radius^d.
        dimension = len(initialization_parameters)
        vol = np.pi ** (dimension / 2) / sp.special.gamma(dimension / 2 + 1) * radius ** dimension
        sample_propensity = 1 / vol  # uniform density inside the ball
    samples.append(sample)
    sample_propensities.append(sample_propensity)
sample_weights = [likelihood_function(x) / y for x, y in zip(samples, sample_propensities)]
covariance_matrix = np.cov(np.array(samples).T, aweights=sample_weights)
parameters_at_maximum_likelihood = np.average(samples, axis=0, weights=sample_weights)

Outline of the Bayesian inference primer

Douglas Mason