
I apologize if the title doesn't make much sense, but I didn't know how to word it without it being far too long.

A little context: I'm working on a mobile app that is supposed to help coffee farmers track pest infestation on their farms. A feature we are working on lets farmers dump the contents of their trap into a sieve and compare it to a picture on their phone. The app then asks them whether the picture appears to contain more or fewer pests than the sample in front of them (we have a dataset of thousands of trap-catch pictures, each with a corresponding, scientifically accurate count). After a few iterations of comparing pictures and choosing "more" or "less", the app converges on its best guess at the rough count for the farmer's trap.

At first, I used a simple min/max algorithm for this, but that was when the dataset of picture-counts was only about 30 pictures. We now have datasets of thousands of pictures and I needed to tweak my algorithm to compensate for that. At first glance, what seemed to make the most sense was to split the data up into "buckets" of picture/counts within certain ranges, for example:

{
   "1000-2499": {
      "pictureUrl": "count",
      "pictureUrl": "count",
      ...
   },
   "2500-4999": {
      "pictureUrl": "count",
      "pictureUrl": "count",
      ...
   },
   "5000-7499":{
      "pictureUrl": "count",
      "pictureUrl": "count",
      ...
   },
   ...
}
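A minimal sketch of how such buckets could be built from a flat list of pictures, in Python for illustration. The range boundaries, the `build_buckets` name, and the `(picture_url, count)` input shape are assumptions for this example, not part of the real dataset:

```python
# Hypothetical inclusive count ranges, mirroring the example keys above.
RANGES = [(1000, 2499), (2500, 4999), (5000, 7499)]


def build_buckets(pictures, ranges=RANGES):
    """Group (picture_url, count) pairs into {"lo-hi": {picture_url: count}}.

    Pictures whose count falls outside every range are skipped in this sketch.
    """
    buckets = {f"{lo}-{hi}": {} for lo, hi in ranges}
    for url, count in pictures:
        for lo, hi in ranges:
            if lo <= count <= hi:
                buckets[f"{lo}-{hi}"][url] = count
                break  # ranges do not overlap, so the first match is the only one
    return buckets
```

Each picture lands in exactly one bucket because the ranges do not overlap.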

Using this, I want to be able to grab a picture from a bucket, narrow the estimated count down to a specific bucket/range, and then converge on a decently accurate estimate (without making the user go through the more/less process too many times). However, I am having a hard time developing an algorithm for this. I may be overthinking it, but any help or insight would be greatly appreciated. The goal isn't exactly to produce a good estimate (that is actually what we are testing with this method), but for the algorithm itself to converge as efficiently as possible, under the assumption that the estimate will be accurate.
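To make the kind of convergence I have in mind concrete, here is a rough sketch of a plain bisection over all pictures sorted by count, ignoring buckets entirely. The `estimate_count` name, the `is_trap_more` callback (standing in for the farmer's "more"/"less" answer), and the `max_questions` cap are all hypothetical:

```python
def estimate_count(pictures, is_trap_more, max_questions=10):
    """Bisect over pictures sorted by count.

    pictures: list of (picture_url, count) pairs.
    is_trap_more(picture_url): True if the user says their trap holds
        more pests than the picture shown (hypothetical callback).
    Returns the count of the picture the search settles on.
    """
    if not pictures:
        raise ValueError("need at least one picture")
    ordered = sorted(pictures, key=lambda p: p[1])
    lo, hi = 0, len(ordered) - 1
    questions = 0
    while lo < hi and questions < max_questions:
        mid = (lo + hi) // 2
        if is_trap_more(ordered[mid][0]):
            lo = mid + 1  # trap has more: discard the shown picture and everything below
        else:
            hi = mid      # trap has fewer (or about the same): keep the shown picture
        questions += 1
    return ordered[(lo + hi) // 2][1]
```

With n pictures this asks about log2(n) questions, so roughly 10 to 13 for a dataset of a few thousand. It assumes every answer is correct; a single wrong answer permanently discards the half that holds the true count.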

Edit- If I find a viable solution before anyone else posts one I will post it here for critiquing.

  • First thought: make the keys "up to" instead of a range: `{1500:..., 2500:..., 5000:...}` - this way, keep going until next is too much. 1500 will hold 0-1500, 2500 will hold 1501-2500, 5000 will hold 2501-5000, and so on. Say you show first picture from `5000`, user says: "too much" - you show first picture from `2500`, user says: "too little" - you show middle picture of `2500`, etc. Something like a binary search... – iAmOren Jul 29 '20 at 00:45
  • @iAmOren Ah yes that would make more sense. And I was thinking of some kind of tree search. So what you are saying is kind of like a bisection starting at the high or low end of a bucket and based on the user input, skip to the middle of the bucket and continue? – ʻUlu Maika Jul 29 '20 at 00:53
  • Yes! Perhaps even show 6(?) images from evenly spaced counts to help zoom in? Perhaps have user choose more than one to help zoom in even more? – iAmOren Jul 29 '20 at 00:59
  • Totally different avenue: weigh the flies, divide by average weight = (approximate) count! But that will kill your app... :) – iAmOren Jul 29 '20 at 01:01
  • @iAmOren haha yeah the purpose of this method is so that the farmers don't have to weigh and do the math of finding the count from that. Once we implement and test it and see the margin of error we will know how effective this method really is. The only downside to showing multiple pictures at once (if that's what you were suggesting) is that this app will be run on a phone and anything more than showing one picture makes it very hard to see the pictures themselves, and making it so they scroll through them is kind of a UI/UX nightmare haha – ʻUlu Maika Jul 29 '20 at 01:05
  • Good luck! I'd like to see the working app! I'm in Israel - are you in Hawaii/Hawaiian? – iAmOren Jul 29 '20 at 01:52
  • @iAmOren Yeah I am native Hawaiian and live on the Big Island of Hawaii. Lots of coffee farms out here, so that's why the USDA hired me to make this. Thank you for the support! – ʻUlu Maika Jul 29 '20 at 01:55
  • Glad to help!!! – iAmOren Jul 29 '20 at 02:22
  • Why buckets, can't you use bisection? E.g. first show the average, then a picture between the average and the best (or the worst, depending on the choice). That way the interval is halved with every choice (O(log n)) and you should have a good guess pretty fast. One drawback is that one wrong choice will lead to only showing pictures outside the farmer's interval. – maraca Jul 29 '20 at 08:29
  • I would suggest the method that @maraca stated. But with a twist. Show the midpoint, and then narrow it down to 60% of the pictures. Still `O(log(n))` but if they got it wrong then they're not stuck in the wrong bucket. – btilly Jul 29 '20 at 15:59
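One possible reading of the last suggestion above (show the midpoint, then keep 60% of the pictures instead of 50%) can be sketched as follows, in Python for illustration. The function name, the `is_trap_more` callback, and the `keep_percent` and `stop_size` parameters are assumptions made for this example, not something specified in the comments:

```python
def estimate_count_tolerant(pictures, is_trap_more, keep_percent=60, stop_size=3):
    """Bisection variant that keeps `keep_percent` of the window each step.

    pictures: list of (picture_url, count) pairs.
    is_trap_more(picture_url): True if the user says their trap holds
        more pests than the picture shown (hypothetical callback).
    Returns the count of the middle picture of the final window.
    """
    if not pictures:
        raise ValueError("need at least one picture")
    ordered = sorted(pictures, key=lambda p: p[1])
    stop_size = max(1, stop_size)
    lo, hi = 0, len(ordered)  # half-open window [lo, hi)
    while hi - lo > stop_size:
        size = hi - lo
        mid = lo + size // 2
        # Ceiling of size * keep_percent / 100, capped so the window always shrinks.
        kept = min(size - 1, (size * keep_percent + 99) // 100)
        if is_trap_more(ordered[mid][0]):
            lo = hi - kept  # keep the upper part, plus a margin below the midpoint
        else:
            hi = lo + kept  # keep the lower part, plus a margin above the midpoint
    return ordered[lo + (hi - lo) // 2][1]
```

Because the kept window extends past the shown picture, a wrong answer can still be corrected later, but only when the true count lies close to the picture that was misjudged. The cost is a few extra questions: the window shrinks by a factor of 0.6 per step rather than 0.5.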
