I have a big table in which I have calculated, for every category (id), the count per subcategory in countsperc (subcategory names not shown), the total number of observations per category in sumofcounts, and the proportion of each subcategory relative to that total (countsperc/sumofcounts) in apppropor (approximate proportions), which has to be rounded to 3 decimals.
The problem is that the sum of the approximate proportions (old_sum) for each category (id) has to be 1.000 rather than 0.999, 1.001, etc.
So I would like a method to add or subtract 0.001 on any one sub-item of column apppropor so that the sum is always 1.000. For example, in row 1 the number could be 0.334 instead of 0.333.
EDIT: The goal of the task is not merely to produce an exact sum of 1, which has no utility in itself, but to produce input for another program, which reads the column apppropor as is and requires it to sum to 1.000 per id (see the error message below).

text1<-"
id    countsperc sumofcounts   apppropor     
item1          1           3       0.333     
item1          1           3       0.333     
item1          1           3       0.333     
item2          1         121       0.008     
item2        119         121       0.983     
item2          1         121       0.008     
item3          1          44       0.023    
item3          1          44       0.023     
item3         41          44       0.932     
item3          1          44       0.023     
item4          1          29       0.034     
item4          3          29       0.103      
item4          1          29       0.034   
item4         24          29       0.828"
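table1<-read.table(text=text1,header=TRUE)
library(data.table)
# sum of the rounded proportions per id, stored as old_sum
sums<-as.data.frame(setDT(table1)[, .(old_sum = sum(apppropor)), by = .(id)])
# attach old_sum to every row of the matching id
table1<-merge(table1,sums)
table1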

chromEvol Version: 2.0. Last updated December 2013

The count probabilities for taxa Ad_mic not sum to 1.0
chromEvol: errorMsg.cpp:41: static void errorMsg::reportError(const string&, int): Assertion `0' failed.
Aborted (core dumped)

Ferroao

2 Answers


If you need sum_of_prop to be identically equal to 1 in every row, you're calculating it the wrong way. You don't add 0.333 + 0.333 + 0.333 and then force that sum to be 1. You add (1/3) + (1/3) + (1/3) and then the sum actually is 1.

Assuming that no other column can change, try calculating sum_of_prop like this:

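n <- nrow(table1)
new_sum_of_prop <- numeric(n)
for (i in seq_len(n)) {
  tempitem <- table1$id[i]
  # recompute the group total from the raw counts, not from the rounded proportions
  tempsum <- sum(table1$countsperc[table1$id == tempitem])
  # ratio of the stored total to the recomputed total; 1 when they agree
  new_sum_of_prop[i] <- table1$sumofcounts[i] / tempsum
}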

table2 <- as.data.frame(cbind(table1, new_sum_of_prop))
table2
      id countsperc sumofcounts apppropor sum_of_prop new_sum_of_prop
1  item1          1           3     0.333       0.999               1
2  item1          1           3     0.333       0.999               1
3  item1          1           3     0.333       0.999               1
4  item2          1         121     0.008       0.999               1
5  item2        119         121     0.983       0.999               1
6  item2          1         121     0.008       0.999               1
7  item3          1          44     0.023       1.001               1
8  item3          1          44     0.023       1.001               1
9  item3         41          44     0.932       1.001               1
10 item3          1          44     0.023       1.001               1
11 item4          1          29     0.034       0.999               1
12 item4          3          29     0.103       0.999               1
13 item4          1          29     0.034       0.999               1
14 item4         24          29     0.828       0.999               1

I understand that this isn't exactly what you asked for, but in the long run, your results are always healthier if you don't cut mathematical corners along the way.
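
The same idea can be checked without an explicit loop. This is only a minimal sketch, assuming the column names from the question; the small `demo` data frame (item1 and item4 only) and the column names `exactpropor` and `exact_sum` are made up for illustration. It uses base R's `ave()` to sum the unrounded proportions within each id.

```r
# Illustrative subset of the question's data (item1 and item4 only)
demo <- data.frame(
  id          = c("item1", "item1", "item1", "item4", "item4", "item4", "item4"),
  countsperc  = c(1, 1, 1, 1, 3, 1, 24),
  sumofcounts = c(3, 3, 3, 29, 29, 29, 29)
)

# Exact (unrounded) proportion of each subcategory within its id
demo$exactpropor <- demo$countsperc / demo$sumofcounts

# Per-id sum of the exact proportions, repeated on every row of the group
demo$exact_sum <- ave(demo$exactpropor, demo$id, FUN = sum)

# Compare with a tolerance rather than ==, since these are floating-point sums
isTRUE(all.equal(demo$exact_sum, rep(1, nrow(demo))))
```

The unrounded proportions sum to 1 within floating-point tolerance, whereas the 3-decimal values in apppropor generally do not.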

mmyoung77

I found a way: add the difference between 1 and the old sum to the last row of each id.

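# difference between the target (1) and the current per-id sum
table1$dif<-1-table1$old_sum
# sort by id so that each id forms one contiguous run
table1<-table1[order(table1$id),]
# run lengths per id; cumsum(len) is the index of the last row of each id
len<-rle(as.vector(table1$id))[[1]]
last<-cumsum(len)
# add the difference to the last row of each id, rounded to keep 3 decimals
table1$apppropor[last]<-round(table1$apppropor[last]+table1$dif[last],3)
#verify
library(data.table)
sums<-as.data.frame(setDT(table1)[, sum(`apppropor`), by = .(id)][,.(id, new_sum = V1)])
table1<-merge(table1,sums)
table1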
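
The adjustment above can also be wrapped in a function that does not depend on row order. This is a sketch only: `fix_last()` is a hypothetical helper name, and the `demo` data frame is a subset of the question's data (item1 and item3). It works in integer thousandths, so the per-id total is exactly 1000 before converting back to 3-decimal proportions.

```r
# Hypothetical helper: push the rounding residual of each id onto its last row.
fix_last <- function(prop, id, digits = 3) {
  scale <- 10^digits
  units <- round(prop * scale)                 # e.g. 0.333 -> 333
  adj <- ave(units, id, FUN = function(u) {
    k <- length(u)
    u[k] <- u[k] + (scale - sum(u))            # residual, e.g. +1 or -1
    u
  })
  adj / scale
}

# Illustrative subset of the question's data
demo <- data.frame(
  id        = c("item1", "item1", "item1", "item3", "item3", "item3", "item3"),
  apppropor = c(0.333, 0.333, 0.333, 0.023, 0.023, 0.932, 0.023)
)
demo$fixed <- fix_last(demo$apppropor, demo$id)
demo
```

Here the last row of item1 becomes 0.334 and the last row of item3 becomes 0.022. Putting the residual on the last row is an arbitrary choice; if the last value in a group is very small, applying the residual to the largest value of the group instead may be safer.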
Ferroao