2

The following extract is of a 500 row table that I'm trying to build a numpy lookup function for. My problem is that the values are non-linear.

The user enters a density, volume, and content. so the function will be:

def capacity_lookup(density, volume, content:

For example a typical user entry would be capacity_lookup (47, 775, 41.3). The function should interpolate between the values of 45 and 50 and densities 700 and 800, and contents 40 and 45.

The table extract is:

Volume  Density        Content 
                <30 35  40  45  50>=
45.0    <=100   0.1 1.8 0.9 2.0 0.3
45.0    200     1.5 1.6 1.4 2.4 3.0
45.0    400     0.4 2.1 0.9 1.8 2.5
45.0    600     1.3 0.8 0.2 1.7 1.9
45.0    800     0.6 0.9 0.8 0.4 0.2
45.0    1000    0.3 0.8 0.5 0.3 1.0
45.0    1200    0.6 0.0 0.6 0.2 0.2
45.0    1400    0.6 0.4 0.3 0.7 0.1
45.0    >=1600  0.3 0.0 0.6 0.1 0.3
50.0    <=100   0.1 0.0 0.5 0.9 0.2
50.0    200     1.3 0.4 0.8 0.2 2.7
50.0    400     0.4 0.1 0.7 1.3 1.7
50.0    600     0.8 0.7 0.1 1.2 1.6
50.0    800     0.5 0.3 0.4 0.2 0.0
50.0    1000    0.2 0.4 0.4 0.2 0.3
50.0    1200    0.4 0.0 0.0 0.2 0.0
50.0    1400    0.0 0.3 0.1 0.5 0.1
50.0    >=1600  0.1 0.0 0.0 0.0 0.2
55.0    <=100   0.0 0.0 0.4 0.6 0.1
55.0    200     0.8 0.3 0.7 0.1 1.2
55.0    400     0.3 0.1 0.3 1.1 0.7
55.0    600     0.4 0.3 0.0 0.6 0.1
55.0    800     0.0 0.0 0.0 0.2 0.0
55.0    1000    0.2 0.1 0.2 0.1 0.3
55.0    1200    0.1 0.0 0.0 0.1 0.0
55.0    1400    0.0 0.2 0.0 0.2 0.1
55.0    >=1600  0.0 0.0 0.0 0.0 0.1

Question

How can I store the 500 row table so I can do interpolation on its non linear data and get the correct value based on user input?

Clarifications

  1. If the user inputs the following vector (775, 47, 41.3), the program should return an interpolated value between the following four vectors: 45.0, 600, 0.2, 1.7, 45.0, 800, 0.8, 0.4, 50.0, 600, 0.1, 1.2, and 50.0, 800, 0.4, 0.2
  2. Assume data will be pulled from a DB as a numpy array of your design
Community
  • 1
  • 1
dassouki
  • 6,286
  • 7
  • 51
  • 81
  • You say that "The function should interpolate between the values of 45 and 50 and densities 700 and 800, and contents 40 and 45." but your actual shown data ranges are (45,55),(100,1600),(30,50). And presumably your real Volume range is larger. I don't understand why you only need to interpolate for a small part of your data range. – roippi Aug 11 '13 at 12:04
  • @roippi I'm not sure I understand your question; however, by user input, I probably will have 5 or 10k vectors [volume, density, content] that are within the Volume 25 to 70 range to interpolate and get proper values for. – dassouki Aug 11 '13 at 12:16
  • Oh I see, the numbers you were specifying were specific to the one vector you provided, not the whole range of the data. – roippi Aug 11 '13 at 12:19
  • 2
    I think you have it slightly wrong... Your input space is three-dimensional, so that's the natural shape for your LUT, and you will have to interpolate between the 8 vertices a sub-cube. There is no natural way of interpolating within your cube, although [trilinear interpolation](http://en.wikipedia.org/wiki/Trilinear_interpolation) is probably the easiest to wrap your head around. I may find time to roll this into a full answer later today, but the above should get you on your way. – Jaime Aug 11 '13 at 14:32
  • @Jaime Hi Jaime, I was wondering if you have had the chance to think about this problem? – dassouki Aug 14 '13 at 18:19
  • @Jaime maybe you can help with the tri-linear interpolation... please, check the algorithm below to find the 8 vertices... – Saullo G. P. Castro Aug 19 '13 at 16:16

2 Answers2

2

The first difficulty I found were the <= and >=, which I could handle duplicating the extremities for Density, and changing their values for very close dummy values 99 and 1601, which will not affect the interpolation.

Volume  Density        Content 
                <30 35  40  45  50>=
45.0     99   0.1 1.8 0.9 2.0 0.3
45.0    100   0.1 1.8 0.9 2.0 0.3
45.0    200     1.5 1.6 1.4 2.4 3.0
45.0    400     0.4 2.1 0.9 1.8 2.5
45.0    600     1.3 0.8 0.2 1.7 1.9
45.0    800     0.6 0.9 0.8 0.4 0.2
45.0    1000    0.3 0.8 0.5 0.3 1.0
45.0    1200    0.6 0.0 0.6 0.2 0.2
45.0    1400    0.6 0.4 0.3 0.7 0.1
45.0    1600  0.3 0.0 0.6 0.1 0.3
45.0    1601  0.3 0.0 0.6 0.1 0.3
50.0     99   0.1 0.0 0.5 0.9 0.2
50.0    100   0.1 0.0 0.5 0.9 0.2
50.0    200     1.3 0.4 0.8 0.2 2.7
50.0    400     0.4 0.1 0.7 1.3 1.7
50.0    600     0.8 0.7 0.1 1.2 1.6
50.0    800     0.5 0.3 0.4 0.2 0.0
50.0    1000    0.2 0.4 0.4 0.2 0.3
50.0    1200    0.4 0.0 0.0 0.2 0.0
50.0    1400    0.0 0.3 0.1 0.5 0.1
50.0    1600  0.1 0.0 0.0 0.0 0.2
50.0    1601  0.1 0.0 0.0 0.0 0.2
55.0     99   0.0 0.0 0.4 0.6 0.1
55.0    100   0.0 0.0 0.4 0.6 0.1
55.0    200     0.8 0.3 0.7 0.1 1.2
55.0    400     0.3 0.1 0.3 1.1 0.7
55.0    600     0.4 0.3 0.0 0.6 0.1
55.0    800     0.0 0.0 0.0 0.2 0.0
55.0    1000    0.2 0.1 0.2 0.1 0.3
55.0    1200    0.1 0.0 0.0 0.1 0.0
55.0    1400    0.0 0.2 0.0 0.2 0.1
55.0    1600  0.0 0.0 0.0 0.0 0.1
55.0    1601  0.0 0.0 0.0 0.0 0.1

Then, as @Jaime already pointed out, you have to find 8 vertices in order to do the tri-linear interpolation.

The following algorithm will give you the points:

import numpy as np
def get_8_points(filename, vi, di, ci):
    a = np.loadtxt(filename, skiprows=2)
    vol = a[:,0].repeat(a.shape[1]-2).reshape(-1,)
    den = a[:,1].repeat(a.shape[1]-2).reshape(-1,)
    #FIXME maybe you have to change the next line
    con = np.tile(np.array([30., 35., 40., 45., 50.]),a.shape[0]).reshape(-1,)
    #
    val = a[:,2:].reshape(a.shape[0]*5).reshape(-1,)

    u = np.unique(vol)
    diff = np.absolute(u-vi)
    vols = u[diff.argsort()][:2]

    u = np.unique(den)
    diff = np.absolute(u-di)
    dens = u[diff.argsort()][:2]

    u = np.unique(con)
    diff = np.absolute(u-ci)
    cons = u[diff.argsort()][:2]

    check = np.in1d(vol,vols) & np.in1d(den,dens) & np.in1d(con,cons)

    points = np.vstack((vol[check], den[check], con[check], val[check]))

    return points.T

Using your example:

vi, di, ci = 47, 775, 41.3
points = get_8_points(filename, vi, di, ci)
#array([[  4.50e+01,   6.00e+02,   4.00e+01,   2.00e-01],
#       [  4.50e+01,   6.00e+02,   4.50e+01,   1.70e+00],
#       [  4.50e+01,   8.00e+02,   4.00e+01,   8.00e-01],
#       [  4.50e+01,   8.00e+02,   4.50e+01,   4.00e-01],
#       [  5.00e+01,   6.00e+02,   4.00e+01,   1.00e-01],
#       [  5.00e+01,   6.00e+02,   4.50e+01,   1.20e+00],
#       [  5.00e+01,   8.00e+02,   4.00e+01,   4.00e-01],
#       [  5.00e+01,   8.00e+02,   4.50e+01,   2.00e-01]])

Now you can perform the tri-linear interpolation...

Saullo G. P. Castro
  • 56,802
  • 26
  • 179
  • 234
  • 1
    Added a quick trilinear interpolation implementation. The challenge would be to join your code and mine into a vectorized interpolation routine, but that is [left as exercise](http://abstrusegoose.com/12). – Jaime Aug 19 '13 at 17:34
2

To complement Saullo's answer, here's how to do trilinear interpolation. You basically interpolate the cube into a square, then the square into a segment, and the segment into a point. Order of the interpolations does not alter the final result. Saullo's numbering scheme is already the right one: the base vertex is number 0, increasing the last dimension adds 1 to the vertex number, the second-to-last adds 2, and the first dimension adds 4. So from his vertex returning function, you could do the following:

coords = np.array([47, 775, 41.3])
ndim = len(coords)
# You would get this with a call to:
# vertices = get_8_points(filename, *coords)
vertices = np.array([[  4.50e+01,   6.00e+02,   4.00e+01,   2.00e-01],
                     [  4.50e+01,   6.00e+02,   4.50e+01,   1.70e+00],
                     [  4.50e+01,   8.00e+02,   4.00e+01,   8.00e-01],
                     [  4.50e+01,   8.00e+02,   4.50e+01,   4.00e-01],
                     [  5.00e+01,   6.00e+02,   4.00e+01,   1.00e-01],
                     [  5.00e+01,   6.00e+02,   4.50e+01,   1.20e+00],
                     [  5.00e+01,   8.00e+02,   4.00e+01,   4.00e-01],
                     [  5.00e+01,   8.00e+02,   4.50e+01,   2.00e-01]])

for dim in xrange(ndim):
   vtx_delta = 2**(ndim - dim - 1)
    for vtx in xrange(vtx_delta):
        vertices[vtx, -1] += ((vertices[vtx + vtx_delta, -1] -
                               vertices[vtx, -1]) *
                              (coords[dim] -
                               vertices[vtx, dim]) /
                              (vertices[vtx + vtx_delta, dim] -
                               vertices[vtx, dim]))

print vertices[0, -1] # prints 0.55075

The function reuses the vertices array for the intermediate interpolations leading to the final value, stored in vertices[0, -1], you would have to do a copy of the vertices array if you will need it afterwards.

Jaime
  • 65,696
  • 17
  • 124
  • 159