How do I determine a sufficient sample size to test an algorithm that cannot be unit tested (similar to pattern recognition)?
I have a relatively simple algorithm that uses vehicle position data and bridge position data to determine whether a vehicle has crossed a bridge (true/false). The algorithm is allowed to give false positives but must never give a false negative.
I have tested the algorithm manually 400 times (200 instances where the vehicle is known to have crossed, and 200 where it is known not to have crossed). It has performed very well, with no false negatives.
My concern is that I cannot feasibly test the many thousands of bridges for every conceivable GPS approach, so I must rely on a sample of tested bridges to be confident in my algorithm. I have read the Wikipedia page on sample size, and I do not see how it applies to my situation.
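One way to frame this is as a zero-failure binomial problem. Only the 200 known crossings can produce a false negative, so those are the trials that bound the false-negative rate. The sketch below is a minimal illustration, assuming each test case is an independent, representative draw from the population of bridge/approach combinations (a strong assumption, since cases from the same bridge are likely correlated). It computes the exact one-sided (Clopper-Pearson) upper bound on the false-negative rate when zero failures are observed, which is close to the "rule of three" approximation 3/n. The function names are hypothetical.

```python
import math


def fn_rate_upper_bound(n_positive_trials: int, confidence: float = 0.95) -> float:
    """One-sided exact upper bound on the false-negative rate, given that
    zero false negatives were observed in n_positive_trials independent trials.

    Solves (1 - p) ** n = alpha for p, where alpha = 1 - confidence.
    """
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_positive_trials)


def trials_needed(max_fn_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of known-positive trials that, with zero observed
    false negatives, bounds the false-negative rate at or below max_fn_rate."""
    alpha = 1.0 - confidence
    return math.ceil(math.log(alpha) / math.log(1.0 - max_fn_rate))


if __name__ == "__main__":
    # 200 known crossings, 0 false negatives: 95% upper bound on the FN rate.
    print(f"{fn_rate_upper_bound(200):.4f}")  # 0.0149, roughly 3/200
    # Known crossings needed to claim FN rate <= 0.1% at 95% confidence.
    print(trials_needed(0.001))  # 2995
```

Under these assumptions, 200 clean positive trials only support a claim that the false-negative rate is below about 1.5% at 95% confidence. Demonstrating "never" is not possible by sampling; you can only push the bound lower with more known-crossing cases.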