0

I have the following dataset:

adv_id<-c("abc","abc","abc","pqr","pqr","pqr","xyz","xyz","xyz")
seg_id<-c("1","2","3","4","5","6","7","8","9")
value<-c(120,450,108,1004,567,768,111,222,3334)
data<-data.frame(adv_id,seg_id,value)

I want to calculate parent total percent on adv_id level. i.e. like when we use the excel pivot. So my output should look like

adv_id    seg_id    value    percentage
  abc       1         120       17.7%
  abc       2         450       66.37%
  abc       3         108       15.93%
  pqr       4         1004      42.92%
  pqr       5          567      24.24%
  pqr       6          768      32.83%
  xyz       7          111      3.03%
  xyz       8          222      6.05%
  xyz       9         3334      90.92%

I used the following code

totaluser=0
percent=c()

for (i in 2:nrow(data))
{
   if data$adv_id[i]==data$adv_id[i-1]
  {
  totaluser = totaluser+rawdata$value[i]
  percentage[i-1] = (data$value[i-1]/totaluser)*100
  }
  else {totaluser=0}
 }

and using sqldf package

percentage<-sqldf("select adv_id,seg_id,value,sum(value)/sum(sum(value)) over(partition by adv_id) as 'percentage' from data").

But both times i cannot get the desired output. Can we use the summarise function in dplyr package somehow?

Jaap
  • 81,064
  • 34
  • 182
  • 193
Bitanshu Das
  • 627
  • 2
  • 8
  • 21

0 Answers0