
I have an algorithm I'm trying to implement and I'm struggling to find a good way of doing it. The goal is to take a list of floats and make a one-to-one mapping to a list of integers such that no two floats get mapped to the same integer, and the mapping has the smallest possible error (either in terms of total error or mean squared error).

So, for instance, let's say I have the numbers [2.1, 2.3, 2.4, 7, 7.5, 8.9, 9.3]. I'd like it to return something like:

{
    2.1: 1,
    2.3: 2, 
    2.4: 3,
    7: 7,
    7.5: 8,
    8.9: 9,
    9.3: 10
}

Notice that the numbers clustered around 2 have to get spread out to 1, 2, and 3.

It might help to mention that my motivation here is a practical musical one: to map a list of microtonal pitches (notes "in between the cracks") to the keys of the piano. So a "good enough" solution to the problem is really all I need, though a truly optimal solution would be more exciting!

Also, I'm working in Python, but of course the real question here is not language-specific.

MarcTheSpark
  • If the float list is limited, I would consider doing this manually. For the decision to be musical, I think someone should probably listen to and curate a possible solution. Unless, of course, the musical concept is tied to automation. – גלעד ברקן May 16 '20 at 13:29

2 Answers


Maybe not the most elegant or efficient code, but:

  • it's of polynomial complexity
  • it provides a global optimum

Basic idea:

  • calculate some worst-case range of candidates (we must not forget any potentially improving candidate)
    • I did not put much thought into it -> a "hack"
  • calculate the distance-matrix
  • solve the rectangular linear-assignment problem

Code:

import math
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

SQUARED_PENALTY = True

data = np.array([2.1, 2.3, 2.4, 7, 7.5, 8.9, 9.3])

# hacky safety-net -> which candidates to look at
min_ = math.floor(data.min())
max_ = math.ceil(data.max())
gap = max_ - min_

cands = np.arange(min_ - gap, max_ + gap)

cost_matrix = cdist(data[:, np.newaxis], cands[:, np.newaxis])

if SQUARED_PENALTY:
    cost_matrix = np.square(cost_matrix)

row_ind, col_ind = linear_sum_assignment(cost_matrix)

solution = cands[col_ind]

print(solution)
print('cost: ', np.round(cost_matrix[row_ind, col_ind].sum(), 3))

Output (L1 costs):

[ 2  1  3  7  8  9 10]
cost:  3.3

Output (squared costs):

[ 1  2  3  7  8  9 10]
cost:  2.41
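
The same approach can also be packaged as a small helper that returns the dict mapping asked for in the question. This is just a sketch of one way to wrap the code above; the function name and signature are arbitrary:

```python
import math

import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def snap_to_integers(pitches, squared_penalty=True):
    """Map each float to a distinct integer, minimizing total cost.

    Returns a dict {float: int}, as in the question.
    """
    data = np.asarray(pitches, dtype=float)

    # Same worst-case candidate range as above.
    min_ = math.floor(data.min())
    max_ = math.ceil(data.max())
    gap = max_ - min_
    cands = np.arange(min_ - gap, max_ + gap)

    # Pairwise distances between each pitch and each integer candidate.
    cost_matrix = cdist(data[:, np.newaxis], cands[:, np.newaxis])
    if squared_penalty:
        cost_matrix = np.square(cost_matrix)

    # Solve the rectangular linear-assignment problem.
    row_ind, col_ind = linear_sum_assignment(cost_matrix)
    return {pitches[i]: int(cands[j]) for i, j in zip(row_ind, col_ind)}


mapping = snap_to_integers([2.1, 2.3, 2.4, 7, 7.5, 8.9, 9.3])
print(mapping)  # {2.1: 1, 2.3: 2, 2.4: 3, 7: 7, 7.5: 8, 8.9: 9, 9.3: 10}
```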
sascha
  • Thank you! This works beautifully. Even with a substantial array of 50 input pitches it only takes 0.3 seconds to calculate, and that's probably as many as I would ever need for my use case. Quick question: would you mind if I included a slightly adapted version of this code in the extensions package for my python music framework? (http://scamp.marcevanstein.com/) I would credit you of course. – MarcTheSpark May 17 '20 at 16:56
  • Sure. Consider it public domain. No need for any credit, but thanks for asking. – sascha May 17 '20 at 17:01

You could try something like this. The idea is to generate all the permutations of the list of integers and, by brute force, compute a distance metric against your vector of real values.

In your example we are only interested in the first 7 values of each permutation, because your list of floats has 7 values. The list of integers I chose has 10 values, so the remaining 3 values permute among themselves, and each 7-value prefix repeats for 3! = 6 consecutive permutations; you can therefore reduce the number of iterations by sampling every 6th permutation. For brute-force solutions like this it is important to decrease the number of iterations by eliminating redundant information, minimizing computation (fast metrics), using a clever testing strategy, and so on. For display purposes here I just included a threshold on the metric so that I get a single result, but this can be improved considerably.

The problem gets more challenging with larger lists, and also if you allow "semi-integers" like 1.5 and 2.5. I mention this because the optimal result of the code below is the one you chose to start with :) Also, for any two lists and any metric, it would be no surprise to get more than one equally optimal solution. For example, the L1 norm for [1,2,3,7,8,9,10] is the same as for [2,1,3,7,8,9,10]. What the implications of this are depends on your problem.

import itertools
import numpy as np

y = [2.1, 2.3, 2.4, 7, 7.5, 8.9, 9.3]
y_hat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# All permutations of the candidate integers, in lexicographic order.
py_hat = list(itertools.permutations(y_hat))

list_p = []
for p in py_hat[::6]:  # each 7-value prefix repeats 6 times, so sample every 6th
    p = p[0:7]
    f = np.sum((np.array(y) - np.array(p)) ** 2)  # squared-error metric
    if f < 2.5:  # threshold just to keep the printed output short
        list_p.append([p, f])

print(list_p)

[[(1, 2, 3, 7, 8, 9, 10), 2.41]]
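
The "sample every 6th permutation" observation can be checked directly: with sorted input, itertools.permutations yields tuples in lexicographic order, so the last 3 positions cycle through their 3! = 6 arrangements before the 7-value prefix changes. As a side note, itertools.islice gives the same stride without materializing all 10! tuples in memory (a sketch):

```python
import itertools

# With sorted input, itertools.permutations yields tuples in lexicographic
# order: the last 3 positions run through their 3! = 6 arrangements
# before the first 7 positions change.
perms = itertools.permutations(range(1, 11))
first_six = [p[:7] for p in itertools.islice(perms, 6)]
assert all(prefix == first_six[0] for prefix in first_six)

# Sampling every 6th permutation therefore visits each 7-value prefix
# exactly once, without building the full list of 10! tuples:
prefixes = (p[:7] for p in
            itertools.islice(itertools.permutations(range(1, 11)), 0, None, 6))
print(next(prefixes))  # (1, 2, 3, 4, 5, 6, 7)
```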
CSQL