Calculate percentages of range of values within multiple columns in multiple simulated dataframes

Question

I have five dataframes, each with 8 columns and 10000 rows. The data for each dataframe were drawn from random t-distributions with varying mean and sd specifications. Each column corresponds to one of these specifications, and each row is one t-value drawn from that column's t-distribution. I built the dataframes by hand, without an automated process. (Any suggestions on how to do this more easily?)

For each column in a dataframe, I'd like to calculate the percentage of observations that fall within specific ranges: >0 and <=0.6; >0.6 and <=0.7; >0.7 and <=0.8; and so on, up to >1.4.

I tried a for loop, but I still find it difficult to understand how it works, so that attempt failed. This is what I have so far:

isim20$ival_05 <- cut(isim20[,1], c(0,0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.3, 1.4, max(isim20[,1])))
isim20$ival_08 <- cut(isim20[,2], c(0,0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.3, 1.4, max(isim20[,2])))
...

where isim20 is my dataframe, and ival_05 and ival_08 are two variables (columns) for which I'd like to calculate the percentages for the ranges given in the cut command.

I got stuck at this point because I don't understand how to calculate the percentage in each value range for all columns at once (to avoid doing this by hand). On top of that, I have to repeat everything for all five data frames.

Thank you for all your suggestions!

in the meantime, i found this option ```prop.table(table(isim20$ival_05))``` which works fine, but i'll have to repeat it quite a bit. — Stanciu Adrian, Jul 15 '20 at 14:54

score 1 · Accepted Answer · answered Jul 15 '20 at 16:02

I believe this can help (the Freq variable in the final result holds the proportions you need; multiply by 100 for percentages). I created dummy data, and no packages are needed:

#Dummy data
set.seed(123)
DF <- data.frame(v1=runif(10000,0,2),
                 v2=runif(10000,0,2),
                 v3=runif(10000,0,2),
                 v4=runif(10000,0,2),
                 v5=runif(10000,0,2),
                 v6=runif(10000,0,2),
                 v7=runif(10000,0,2),
                 v8=runif(10000,0,2))
#Create function
compute_breaks <- function(x)
{
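  # Bin x into the requested ranges; the last break is the column maximum
  y <- cut(x, breaks=c(0,0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.3, 1.4, max(x)),
           include.lowest = TRUE, right = TRUE, dig.lab = 10)
  # Proportion of observations in each range, as a data frame
  z <- as.data.frame(prop.table(table(y)))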
  return(z)
}
#Apply and obtain a list
List <- apply(DF,2,compute_breaks)
#Bind all
DF2 <- do.call(rbind,List)
#Format to obtain variable
DF2$var <- gsub("\\..*",'',rownames(DF2))
rownames(DF2) <- NULL

You will get something like this (I include only head() and tail()):

         y   Freq var
1   [0,0.6] 0.3012  v1
2 (0.6,0.7] 0.0485  v1
3 (0.7,0.8] 0.0477  v1
4 (0.8,0.9] 0.0567  v1
5   (0.9,1] 0.0516  v1
6   (1,1.1] 0.0481  v1
----------------------
                  y   Freq var
75           (0.9,1] 0.0476  v8
76           (1,1.1] 0.0549  v8
77         (1.1,1.2] 0.0480  v8
78         (1.2,1.3] 0.0476  v8
79         (1.3,1.4] 0.0478  v8
80 (1.4,1.999860199] 0.2999  v8
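
The question also asks how to build the data frames without doing it by hand and how to repeat the calculation for all five of them. A minimal sketch of one way to do both follows. Everything named here is made up for illustration: the helpers simulate_df and range_shares, the list names sim1 to sim5, the mean/scale values in specs, and the degrees of freedom. Two choices differ from the code above and are assumptions on my side: the last break is Inf rather than max(x), so the function does not fail when a column never exceeds 1.4, and the shares are divided by the full column length, so values at or below 0 (which real t-draws can produce) count toward the total but fall into no range.

```r
# Hypothetical specifications: one location/scale pair per column (8 columns).
specs <- data.frame(mean  = c(0.5, 0.8, 1.0, 1.2, 0.5, 0.8, 1.0, 1.2),
                    scale = c(0.2, 0.2, 0.2, 0.2, 0.4, 0.4, 0.4, 0.4))

# Simulate one data frame: each column holds n shifted and scaled t-draws.
simulate_df <- function(n, df_t, specs) {
  cols <- Map(function(m, s) m + s * rt(n, df = df_t), specs$mean, specs$scale)
  names(cols) <- paste0("v", seq_along(cols))
  as.data.frame(cols)
}

# Five data frames kept in one named list; the degrees of freedom are assumed.
set.seed(123)
sims <- lapply(c(sim1 = 5, sim2 = 10, sim3 = 20, sim4 = 40, sim5 = 80),
               function(d) simulate_df(10000, d, specs))

# Ranges from the question: (0,0.6], (0.6,0.7], ..., (1.3,1.4], (1.4,Inf].
breaks <- c(0, 0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.3, 1.4, Inf)

# Percentage of ALL observations in x that fall into each range.
# Values <= 0 become NA in cut() and are dropped by table(), so the
# percentages can sum to less than 100.
range_shares <- function(x) {
  tab <- table(cut(x, breaks = breaks, right = TRUE))
  setNames(100 * as.vector(tab) / length(x), names(tab))
}

# One matrix per data frame: rows are ranges, columns are variables.
results <- lapply(sims, function(d) sapply(d, range_shares))

round(results$sim1, 2)
```

Keeping the data frames in a list is what makes the last step a single lapply call; with five separate objects the same call would have to be repeated for each one.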

@StanciuAdrian Great ! If you feel comfortable with this answer just accept it :) — Duck, Jul 16 '20 at 11:16
I'm not sure I understand, what do you mean "just accept it" - is there an option, like a button to do so? Or do you mean that I should use the code as is? — Stanciu Adrian, Jul 16 '20 at 13:31
@StanciuAdrian Yeah in the left side of the answer there is a tick you should accept the answer by ticking or clicking that element so that it becomes green :) — Duck, Jul 16 '20 at 13:32
