
In my program to simulate many-particle evolution, I have a map that takes a key value pop (the population size) and returns a slice containing the sites that have this population: myMap map[int][]int, so myMap[pop] is a []int of sites. These slices are generally quite large.

At each evolution step I choose a random population size RandomPop. I would then like to randomly choose a site that has a population of at least RandomPop. The chosen site, sitechosen, is used to update my population structures, and I use a second map to efficiently update the keys of myMap. My current (slow) implementation looks like this:

func Evolve(..., myMap map[int][]int, ...) {

    RandomPop = rand.Intn(rangeofpopulation) + 1

    for i := RandomPop; i < rangeofpopulation; i++ {
        preallocatedslice = append(preallocatedslice, myMap[i]...)
    }

    randomindex := rand.Intn(len(preallocatedslice))
    sitechosen = preallocatedslice[randomindex]

    UpdateFunction(sitechosen)

    // reset the preallocated slice
    preallocatedslice = preallocatedslice[0:0]

}

This code (obviously) hits a huge bottleneck when copying values from the map to preallocatedslice, with runtime.memmove eating 87% of my CPU usage. I'm wondering if there is an O(1) way to randomly choose an entry contained in the union of the slices in myMap with key values of at least RandomPop? I am open to packages that allow you to manipulate custom hashtables if anyone is aware of them. Suggestions don't need to be safe for concurrency.

Other things tried: I previously had my maps record all sites with a population of at least pop, but that took up >10 GB of memory and was stupid. I tried stashing pointers to the relevant slices to build a look-up slice, but Go forbids this. I could sum up the lengths of each slice, generate a random number based on the total, and then iterate through the slices in myMap by length, but this is going to be much slower than just keeping an updated CDF of my population and doing a binary search on it. The binary search is fast, but updating the CDF, even if done manually, is O(n). I was really hoping to abuse hashtables to speed up random selection and update if possible.
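For reference, a minimal sketch of the "sum the lengths, then walk the slices" idea just mentioned; it is still O(number of keys) per draw, but it weights every site equally and avoids building preallocatedslice (reusing the names from the code above, and assuming "math/rand" is imported and total > 0):

    total := 0
    for i := RandomPop; i < rangeofpopulation; i++ {
        total += len(myMap[i])
    }
    r := rand.Intn(total)
    for i := RandomPop; i < rangeofpopulation; i++ {
        if r < len(myMap[i]) {
            sitechosen = myMap[i][r] // the r-th eligible site overall
            break
        }
        r -= len(myMap[i])
    }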

A vague thought I have is concocting some sort of nested structure of maps pointing to their contents and also to the map with a key one less than theirs or something.

kapaw
  • Why don't you keep your sites ordered by population so you can randomly choose one in O(log n)? – juvian Oct 02 '18 at 15:18
  • Can you elaborate on how you're imagining this? – kapaw Oct 02 '18 at 15:36
  • Well, it's unclear what you do in UpdateFunction, but you need a structure that supports both update and query by i-th item in O(log n). You can then search the index of the first site with population at least x and then randomly pick one between that and n. A structure that probably supports this is an [order statistic tree](https://en.wikipedia.org/wiki/Order_statistic_tree); a sketch of one such structure follows these comments. – juvian Oct 02 '18 at 15:42
  • Oh yes, I think I wanted to try something like this, but I found the bottleneck here to be rearrangement of my sorted slice. When I update, the population at the chosen site and perhaps nearby sites changes. In order to update the sorted slice, I need to move an element to a new location. The solutions to this in Go are copying parts of the slice onto itself (slow, will reproduce the memmove bottleneck) or figuring out a nice swapping algorithm (I haven't given much thought to this, maybe I'll give it a shot, but my worry is that it will quickly accumulate processes/complexity). – kapaw Oct 02 '18 at 15:53
  • Instead of posting a new question after being told that your original question was too vague, please **edit** the initial question here https://stackoverflow.com/q/52599876/1230836 – Elias Van Ootegem Oct 02 '18 at 16:06
  • This has nothing to do with my previous question. I changed my method and rewrote my code. Thanks – kapaw Oct 02 '18 at 16:06
  • It does have something to do with this one, too: https://stackoverflow.com/q/52442838/1230836 Also either answering your own question, or accepting an answer to mark it as "done" is recommended – Elias Van Ootegem Oct 02 '18 at 16:08
  • This is a specific question about accessing the elements of a map efficiently. I haven't found a great algorithm for my first post yet. I now have specific questions about utilizing built-in hash tables, and if I can work them out I'll push my project to git and answer my own question. – kapaw Oct 02 '18 at 16:10
  • Also, sorry didn't see your other response until just now. – kapaw Oct 02 '18 at 16:18
  • @kapaw I don't understand what your slices are, but updating a population number on an order statistic tree is O(log n). – juvian Oct 02 '18 at 16:34
  • Could you add the UpdateFunction code? – juvian Oct 03 '18 at 14:15
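One way to realize the order-statistic suggestion from these comments is a Fenwick (binary indexed) tree over population values that keeps a count of sites per population. Below is a minimal sketch, not taken from the thread: it assumes populations are bounded by some maxPop, that "math/rand" is imported, and the names fenwick and pickAtLeast are made up for illustration.

    // fenwick stores partial counts of sites per population value (1-based, len = maxPop+1).
    type fenwick []int

    // add records that delta sites gained (or, if negative, lost) population value pop.
    func (f fenwick) add(pop, delta int) {
        for i := pop; i < len(f); i += i & (-i) {
            f[i] += delta
        }
    }

    // prefix returns how many sites currently have population <= pop.
    func (f fenwick) prefix(pop int) int {
        s := 0
        for i := pop; i > 0; i -= i & (-i) {
            s += f[i]
        }
        return s
    }

    // pickAtLeast returns a site chosen uniformly at random among all sites
    // with population >= minPop (assumed non-empty). O(log^2 maxPop) per call.
    func pickAtLeast(f fenwick, myMap map[int][]int, minPop, maxPop int) int {
        below := f.prefix(minPop - 1)
        total := f.prefix(maxPop) - below
        r := below + rand.Intn(total) // rank of the chosen site, counting sites in population order
        // binary search for the smallest population whose running count exceeds r
        lo, hi := minPop, maxPop
        for lo < hi {
            mid := (lo + hi) / 2
            if f.prefix(mid) > r {
                hi = mid
            } else {
                lo = mid + 1
            }
        }
        sites := myMap[lo]
        return sites[rand.Intn(len(sites))]
    }

When a site moves from population a to population b, f.add(a, -1) and f.add(b, +1) keep the counts consistent in O(log maxPop), and the site itself can be moved between myMap[a] and myMap[b] with a swap-remove via the second index map mentioned in the question, so the whole update stays logarithmic rather than the O(n) CDF rebuild described above.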

1 Answer


I was looking at your code and I have a question: why do you have to copy values from the map to the slice? I mean, I think I am following the logic behind it... but I wonder if there is a way to skip that step.

So we have:

func Evolve(..., myMap map[int][]int, ...) {

    RandomPop = rand.Intn(rangeofpopulation) + 1

    for i := RandomPop; i < rangeofpopulation; i++ {
        // slice of preselected `sites`; one of these will be `sitechosen`
        // we expect to have `n` sites on `preallocatedslice`,
        // where `n` is the number of iterations,
        // i.e. n = rangeofpopulation - RandomPop
        preallocatedslice = append(preallocatedslice, myMap[i]...)
    }

    // Once we have a list of sites, we select one:
    // under a uniform distribution every site has a chance of 1/n to be selected.
    randomindex := rand.Intn(len(preallocatedslice))
    sitechosen = preallocatedslice[randomindex]

    UpdateFunction(sitechosen)
    ...

}

But what if we change that to:

func Evolve(..., myMap map[int][]int, ...) {

    if len(myMap) == 0 {
        // Nothing to do, print a log!
        return
    }

    // This variable will hold our chosen candidate (the slice of sites for one population value)
    var siteChosen []int

    // Our random population size is a value from 1 to rangeOfPopulation
    randPopSize := rand.Intn(rangeOfPopulation) + 1

    for i := randPopSize; i < rangeOfPopulation; i++ {
        // We are going to pretend that the current candidate is the siteChosen
        siteChosen = myMap[i]

        // Now, instead of copying `myMap[i]` to preallocatedslice,
        // we test whether the current candidate is actually the `siteChosen` here:

        // We know that the chance for a specific site to be the chosen one is 1/n,
        // where n = rangeOfPopulation - randPopSize
        n := float64(rangeOfPopulation - randPopSize)
        // we roll the dice...
        isTheChosenOne := rand.Float64() < 1/n

        if isTheChosenOne {
            // If the candidate is the chosen site,
            // then we don't need to iterate over all the other elements.
            break
        }

    }

    // Here we know that `siteChosen` is a.- a selected candidate, or
    // b.- the last element assigned in the loop
    // (in case `isTheChosenOne` was always false, which is a probable scenario)
    UpdateFunction(siteChosen)
    ...
}

Also, if you want to, you can calculate n (or 1/n) outside the loop. So the idea is to test inside the loop whether the candidate is the siteChosen, and avoid copying the candidates into a preselection pool.

mayo
  • Some comments. Correct me if I'm wrong, but I don't believe this properly assigns weights to the slices indicated by `myMap[i]`. After determining `randPopSize`, the chance of a slice with population i being chosen should be ~ len(myMap[i])/(sum over j = randPopSize..rangeOfPopulation of len(myMap[j])) = (# of sites in slice)/(# of sites with population >= randPopSize). Here it seems like all slices are being weighted based solely on their population size? In an extreme case, say `myMap[1000]={site1}` and `myMap[999]`={sites 2 through 2002}, both would have a nearly identical chance of being chosen (a corrected one-pass version is sketched after these comments). – kapaw Oct 03 '18 at 13:29
  • In the end, I think changing what you've written to properly weight the slices becomes equivalent to rejection sampling, which due to the nature of my data averages something slightly less than O(n) speed before accounting for random number generation time (which unfortunately matters as I'd like this to run >10^8 steps). – kapaw Oct 03 '18 at 13:32
  • Oh, you are completely right, my bad. Yes, the right chance to select an element can be fixed; I missed the part that you have more than one site per population value. And yes, this will be an O(n) algorithm :( but at least you can avoid the append operation :) which is making the previous code O(n^2). I will try to think about something else! If you find a solution please post it, it would be interesting to see a good approach :) – mayo Oct 03 '18 at 17:01
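The per-slice weighting described in these comments can be done in a single pass with weighted reservoir sampling, so each site ends up with probability 1/(number of sites with population >= randPopSize) and nothing is appended or pre-summed. A minimal sketch, reusing the hypothetical names from the answer (randPopSize, rangeOfPopulation, UpdateFunction) and assuming "math/rand" is imported:

    var chosen []int // the slice that wins the reservoir draw
    total := 0
    for i := randPopSize; i < rangeOfPopulation; i++ {
        s := myMap[i]
        if len(s) == 0 {
            continue
        }
        total += len(s)
        // keep this population's slice with probability len(s)/total,
        // which leaves every site seen so far equally likely overall
        if rand.Intn(total) < len(s) {
            chosen = s
        }
    }
    if total > 0 {
        siteChosen := chosen[rand.Intn(len(chosen))]
        UpdateFunction(siteChosen)
    }

This is still O(number of keys) per step, as noted in the comments, but it removes the append and needs only one walk over the keys.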