2

I'm trying to generate a 3D distribution, where x, y represents the surface plane, and z is the magnitude of some value, distributed over a range.

I'm looking at numpy's multivariate_normal, but it only lets me get a number of samples. I'd like the ability to specify some x, y coordinate, and get back what the z value should be; so I'd be able to query gp(x, y) and get back a z value that adheres to some mean and covariance.

Perhaps a more illustrative (toy) example: assume I have some temperature distribution that can be modeled as a gaussian process. So I might have a mean temperature of 20 at (0, 0), and some covariance [[1, 0], [0, 1]]. I'd like to be able to create a model that I can then query at different x, y locations to get the temperature at that position (so, at (5, 5) I might get back something like 7 degrees).

How to best accomplish this?

Teknophilia
  • 758
  • 10
  • 23

2 Answers2

1

I assume that your data can be copied to a single np.array, which I will refer to as X in my code, with shape X.shape = (n,2), where n is the number of data points you have and you can have n = 1, if you wish to test a single point at a time. 2, of course, refers to the 2D space spanned by your coordinates (x and y) base. Then:

def estimate_gaussian(X):
  return X.mean(axis=0), np.cov(X.T)

def mva_gaussian( X, mu, sigma2 ):
  k = len(mu)
  # check if sigma2 is a vector and, if yes, use as the diagonal of the covariance matrix
  if sigma2.ndim == 1 :
    sigma2 = np.diag(sigma2)
  X = X - mu
  return (2 * np.pi)**(-k/2) * np.linalg.det(sigma2)**(-0.5) * \
    np.exp( -0.5 * np.sum( np.multiply( X.dot( np.linalg.inv(sigma2) ), X ), axis=1 ) ).reshape( ( X.shape[0], 1 ) )

will do what you want - that is, given data points you will get the value of the gaussian function at those points (or a single point). This is actually a generalized version of what you need, as this function can describe a multivariate gaussian. You seem to be interested in the k = 2 case and a diagonal covariance matrix sigma2.

Moreover, this is also a probability distribution - which you say you don't want. We don't have enough info to know what exactly it is you're trying to fit to (i.e. what you expect the three parameters of the gaussian function to be. Usually, people are interested in a normal distribution). Nevertheless, you can simply change the parameters in the return statement of the mva_gaussian function according to your needs and ignore the estimate gaussian function if you don't want a normalized distribution (although a normalized function would still give you what you seek - a real valued temperature - as long as you know the normalization process - which you do :-) ).

id5h
  • 206
  • 2
  • 9
  • 1
    I edited my answer to better fit your needs. Although it is impossible to know what you are trying to fit to - which is why I wrote a normalized distribution function. But, now you can change the parameters in the function to suit your temperature distribution. – id5h Jan 25 '17 at 21:03
0

You can create a multivariate normal using scipy.stats.multivariate_normal.

>>> import scipy.stats
>>> dist = scipy.stats.multivariate_normal(mean=[2,3], cov=[[1,0],
                                                            [0,1]])

Then to find p(x,y) you can use pdf

>>> dist.pdf([2,3])
0.15915494309189535
>>> dist.pdf([1,1])
0.013064233284684921

Which represents the probability (which you called z) given any [x,y]

Cory Kramer
  • 114,268
  • 16
  • 167
  • 218
  • So, this is probably a lack of understanding on my part: a PDF seems to be similar to what I'm looking for, but rather than the 0-1 probability of x, y, I'd want some Gaussian curve-like function/surface that I can sample to get the actual data points. – Teknophilia Jan 25 '17 at 17:03