-1

I have computed a vector of the frequency of different events, represented as fractions and sorted in descending order. I need to interface to a tool that requires positive integer percentages that must sum up to exactly 100. I would like to generate the percentages in a fashion that best represents the input distribution. That is, I would like relationship (ratios) among the percentages to best match the one in the input fractions, despite any non-linearities resulting in cutting a long tail.

I have a function that generates these percentages, but I don't think it is optimal or elegant. In particular, I would like to do more of the work in numeric space before resorting to "stupid integer tricks".

Here is an example frequency vector:

fractionals <- 1 / (2 ^ c(2, 5:6, 8, rep(9,358)))

And here is my function:

# Convert vector of fractions to integer percents summing to 100
percentize <- function(fractionals) {
  # fractionals is sorted descending and adds up to 1
  # drop elements that wouldn't round up to 1% vs. running total
  pctOfCum <- fractionals / cumsum(fractionals)
  fractionals <- fractionals[pctOfCum > 0.005]

  # calculate initial percentages
  percentages <- round((fractionals / sum(fractionals)) * 100)

  # if sum of percentages exceeds 100, remove proportionally
  i <- 1
  while (sum(percentages) > 100) {
    excess <- sum(percentages) - 100
    if (i > length(percentages)) {
      i <- 1
    }
    partialExcess <- max(1, round((excess * percentages[i]) / 100))
    percentages[i] <- percentages[i] - min(partialExcess,
                                           percentages[i] - 1)
    i <- i + 1
  }

  # if sum of percentages shorts 100, add proportionally
  i <- 1
  while (sum(percentages) < 100) {
    shortage <- 100 - sum(percentages)
    if (i > length(percentages)) {
      i <- 1
    }
    partialShortage <- max(1, round((shortage * percentages[i]) / 100))
    percentages[i] <- percentages[i] + partialShortage
    i <- i + 1
  }

  return(percentages)
}

Any ideas?

verbamour
  • 945
  • 9
  • 16
  • 4
    This is probably more appropriate for [Code Review](http://codereview.stackexchange.com/). – Thomas Jun 26 '14 at 20:02

1 Answers1

0

How about this? It rescales the variables so that it should add to 100, but if due to rounding it comes to 99 it adds 1 to the largest frequency.

fractionals <- 1 / (2 ^ c(2, 5:6, 8, rep(9,358)))
pctOfCum <- fractionals / cumsum(fractionals)
fractionals <- fractionals[pctOfCum > 0.005]

bunnies <- as.integer(fractionals / sum(fractionals) * 100) + 1
    bunnies[bunnies > 1] <- round(bunnies[bunnies > 1] * (100 -  
    sum(bunnies[bunnies == 1])) / sum(bunnies[bunnies > 1]))
if((sum(bunnies) < 100) == TRUE) bunnies[1] <- bunnies[1] + 1

> bunnies
[1] 45  6  3  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  
1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
kng229
  • 473
  • 5
  • 13