Let's say we have two signal spaces S1 and S2, each containing hundreds, perhaps thousands, of signals. S1 contains all signals that are sent or received by a given system (plane, car etc.); S2 contains all signals that are sent or received by the software modules of a subsystem inside that system. Each signal has a specific set of dozens of properties such as signal name, cycle time, voltage etc.
Now I want to check whether each signal in S1 has at least one representation in S2, meaning that all properties of the signal in S1 are equal to the corresponding properties of some signal in S2. This sounded easy at first, as one could iterate through the signals and their properties and check whether there is an equivalent signal somewhere. But it turned out that the specifications can be wrong on both sides (S1 and S2), so a pair of signals that belongs together can't always be identified as such by an exact comparison.
Example:
K1 = {Name:= CAN_1234_UHV; Voltage:= 0.8 mV; Cycle=100ms}
D1 = {Name:= CAN_1234_UH; Voltage:= 0.8mV; Cycle=100 ms}
A human being can see quite easily that these two signals may very well belong together, although there are some spelling and formatting mistakes.
So I devised an algorithm that calculates a string distance metric for each property, maps that similarity to a probability that the property is equal to the same property of the other signal, averages these probabilities over all properties, and classifies the two signals as equal if the average reaches a certain threshold.
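To make the idea concrete, here is a minimal sketch of such an unweighted comparison. It is not the actual implementation: it assumes every property is stored as a string, uses the standard library's `difflib.SequenceMatcher` ratio as a stand-in for the distance metric, and the function names and the threshold of 0.9 are made up for illustration.

```python
from difflib import SequenceMatcher


def property_similarity(a: str, b: str) -> float:
    """Similarity of two property values in [0, 1] (1.0 means identical)."""
    return SequenceMatcher(None, a, b).ratio()


def signal_similarity(s1: dict, s2: dict) -> float:
    """Unweighted average similarity over the properties both signals have."""
    keys = s1.keys() & s2.keys()
    if not keys:
        return 0.0
    return sum(property_similarity(s1[k], s2[k]) for k in keys) / len(keys)


def is_match(s1: dict, s2: dict, threshold: float = 0.9) -> bool:
    """Classify two signals as equal if the average reaches the threshold.

    The default threshold is an arbitrary, illustrative value.
    """
    return signal_similarity(s1, s2) >= threshold


# The two example signals from above, with all values kept as raw strings.
k1 = {"Name": "CAN_1234_UHV", "Voltage": "0.8 mV", "Cycle": "100ms"}
d1 = {"Name": "CAN_1234_UH", "Voltage": "0.8mV", "Cycle": "100 ms"}

print(signal_similarity(k1, d1))
print(is_match(k1, d1))
```

Every property contributes equally to the average here, regardless of how well it identifies a signal.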
This yielded terrible results: two unrelated signals could be classified as equal simply because some of their properties had values that are very common in the signal space. So the next step would be to weight the properties (the signal name is better suited than the cycle time for identifying a signal).
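One possible way to pick such weights without guessing is to derive them from the signal space itself. The sketch below is only a heuristic under stated assumptions: it weights each property by the fraction of distinct values it takes, so a property that is nearly unique per signal (like the name) counts more than one that is shared by many signals (like a common cycle time). The function names and the sample signals are hypothetical.

```python
from difflib import SequenceMatcher


def distinctness_weights(signals: list) -> dict:
    """Weight each property by (number of distinct values) / (number of values)."""
    keys = set().union(*(s.keys() for s in signals))
    weights = {}
    for k in keys:
        values = [s[k] for s in signals if k in s]
        weights[k] = len(set(values)) / len(values)
    return weights


def weighted_similarity(s1: dict, s2: dict, weights: dict) -> float:
    """Weighted average of the per-property string similarities."""
    keys = s1.keys() & s2.keys()
    total = sum(weights.get(k, 0.0) for k in keys)
    if total == 0:
        return 0.0
    score = sum(
        weights.get(k, 0.0) * SequenceMatcher(None, s1[k], s2[k]).ratio()
        for k in keys
    )
    return score / total


# Hypothetical signal space: the names differ, the cycle time is always the same.
space = [
    {"Name": "CAN_1234_UHV", "Cycle": "100ms"},
    {"Name": "LIN_0042_TMP", "Cycle": "100ms"},
    {"Name": "CAN_9876_RPM", "Cycle": "100ms"},
]
weights = distinctness_weights(space)
print(weights)
print(weighted_similarity(space[0], space[1], weights))
```

With these weights, two signals that only share a common cycle time score lower than they would with a plain average, because the identical `Cycle` values carry less weight than the differing `Name` values.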
This whole process seems quite arbitrary to me, because I don't really know which probabilities and weights would yield a good result. So I have a feeling that this could be tackled by a machine learning algorithm, because it could derive the probabilities and weights from training data.
So, in conclusion: would it be feasible to use a machine learning algorithm to identify signals as "similar enough" so that they can be classified as equal? I'm aware that this question can't be answered in general; I'm more interested in "gut feelings" and "nudges in the right direction".
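To illustrate what "deriving the weights from training data" could look like, here is a small sketch using logistic regression trained by plain gradient descent on the per-property similarities. Everything in it is an assumption made for illustration: the choice of logistic regression, the hand-labelled training pairs, the extra signal names, and the hyperparameters. A real attempt would need far more labelled pairs and a proper library.

```python
import math
from difflib import SequenceMatcher

PROPERTIES = ["Name", "Voltage", "Cycle"]


def features(s1: dict, s2: dict) -> list:
    """One string-similarity value per property, in a fixed order."""
    return [SequenceMatcher(None, s1[p], s2[p]).ratio() for p in PROPERTIES]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(pairs: list, labels: list, epochs: int = 2000, lr: float = 0.5):
    """Fit logistic regression weights with stochastic gradient descent."""
    xs = [features(a, b) for a, b in pairs]
    w = [0.0] * len(PROPERTIES)
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, labels):
            p = sigmoid(bias + sum(wi * xi for wi, xi in zip(w, x)))
            err = p - y  # gradient of the log loss w.r.t. the linear score
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            bias -= lr * err
    return w, bias


def match_probability(s1: dict, s2: dict, w: list, bias: float) -> float:
    """Learned probability that the two signals describe the same signal."""
    return sigmoid(bias + sum(wi * xi for wi, xi in zip(w, features(s1, s2))))


# Hypothetical, hand-labelled training pairs (1 = same signal, 0 = different).
k1 = {"Name": "CAN_1234_UHV", "Voltage": "0.8 mV", "Cycle": "100ms"}
d1 = {"Name": "CAN_1234_UH", "Voltage": "0.8mV", "Cycle": "100 ms"}
t1 = {"Name": "LIN_0042_TMP", "Voltage": "12 V", "Cycle": "20ms"}
t2 = {"Name": "LIN_0042_TMP", "Voltage": "12V", "Cycle": "20 ms"}
other = {"Name": "LIN_0042_TMP", "Voltage": "0.8 mV", "Cycle": "100ms"}
rpm = {"Name": "CAN_9876_RPM", "Voltage": "12 V", "Cycle": "20ms"}

pairs = [(k1, d1), (t1, t2), (k1, other), (rpm, t1)]
labels = [1, 1, 0, 0]

w, bias = train(pairs, labels)
print(dict(zip(PROPERTIES, w)), bias)
print(match_probability(k1, d1, w, bias))
print(match_probability(k1, other, w, bias))
```

In this toy data the non-matching pairs share voltage and cycle time but differ in name, so the fitted model ends up relying mainly on the name similarity, which is the kind of weighting I would otherwise have to guess by hand.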
Thanks in advance