Consider the following data set:

df <- data.frame(id = 1:10,
                 v1 = c(2.35456185, 1.44501001, 2.98712312, 0.12345123, 0.96781234,
                        1.23934551, 5.00212233, 4.34120000, 1.23443213, 0.00112233),
                 v2 = c(0.22222222, 0.00123456, 2.19024869, 0.00012000, 0.00029848,
                        0.12348888, 0.46236577, 0.85757000, 0.05479729, 0.00001202))

My intention is to round the values in v1 and v2 randomly: to one decimal place for 10% of the observations, to two decimal places for 40%, and to three decimal places for the remaining 50%. I can use the round() function to round all numbers to the same number of decimal places, but in my case the number of decimals is not uniform. Thank you in advance!

Example of the desired output (mine, of course, is not random):

id   v1    v2
 1   2.4   0.2
 2   1.45  0
 3   2.99  2.19
 4   0.12  0
 5   0.97  0
 6   1.239 0.123
 7   5.002 0.462
 8   4.341 0.858
 9   1.234 0.055
10   0.001 0
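The example output above is consistent with one fixed assignment of decimals: 1 for the first row, 2 for rows 2 to 5 and 3 for rows 6 to 10, i.e. exactly 10%, 40% and 50% of the rows. A minimal base-R sketch that reproduces it, assuming that fixed assignment (the `digits` and `out` names are illustrative only), relies on `round()` accepting one `digits` value per element. Note that rounding 2.35456185 to one decimal gives 2.4.

```r
df <- data.frame(id = 1:10,
                 v1 = c(2.35456185, 1.44501001, 2.98712312, 0.12345123, 0.96781234,
                        1.23934551, 5.00212233, 4.34120000, 1.23443213, 0.00112233),
                 v2 = c(0.22222222, 0.00123456, 2.19024869, 0.00012000, 0.00029848,
                        0.12348888, 0.46236577, 0.85757000, 0.05479729, 0.00001202))

# hypothetical fixed assignment read off the example: 1 x 1, 4 x 2, 5 x 3 decimals
digits <- rep(1:3, times = c(1, 4, 5))

# round() pairs each value with its own element of `digits`
out <- data.frame(id = df$id,
                  v1 = round(df$v1, digits),
                  v2 = round(df$v2, digits))
print(out)
```

Replacing the fixed `digits` with a random draw is exactly what the answers below do.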
iGada

2 Answers

We may create a grouping with `sample` based on the probabilities, and then round the `v1` column using the group value as the number of decimal places

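library(dplyr)
df %>%
  # assign each row to group 1, 2 or 3 with probabilities 10%, 40%, 50%
  group_by(grp = sample(1:3, size = n(), replace = TRUE,
                        prob = c(0.10, 0.40, 0.50))) %>%
  # within each group, round to as many decimals as the group number
  mutate(v1 = round(v1, first(grp))) %>%
  ungroup() %>%
  select(-grp)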

-output

# A tibble: 10 × 2
      id    v1
   <int> <dbl>
 1     1 2.36 
 2     2 1.44 
 3     3 2.99 
 4     4 0.123
 5     5 0.97 
 6     6 1.24 
 7     7 5.00 
 8     8 4.3  
 9     9 1.23 
10    10 0    
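Note that `sample(..., prob = ...)` only hits 10%, 40% and 50% on average; with 10 rows a single draw can easily contain no row with one decimal at all. If the shares must be exact, one option (a swapped-in technique, not the code above) is to shuffle a vector that already holds the right number of 1s, 2s and 3s and then round group by group. A base-R sketch, assuming `nrow(df) * prob` gives whole numbers (true for 10 rows); `prob`, `sizes`, `grp` and `v1_rounded` are illustrative names:

```r
df <- data.frame(id = 1:10,
                 v1 = c(2.35456185, 1.44501001, 2.98712312, 0.12345123, 0.96781234,
                        1.23934551, 5.00212233, 4.34120000, 1.23443213, 0.00112233))

set.seed(1)
prob <- c(0.10, 0.40, 0.50)

# exact group sizes; assumes nrow(df) * prob are whole numbers (1, 4, 5 here)
sizes <- round(nrow(df) * prob)
# shuffle a vector holding exactly `sizes` copies of 1, 2 and 3
df$grp <- sample(rep(1:3, times = sizes))

# round each group to as many decimals as its group number
df$v1_rounded <- df$v1
for (g in 1:3) {
  in_g <- df$grp == g
  df$v1_rounded[in_g] <- round(df$v1[in_g], digits = g)
}
print(df)
```

For row counts where `nrow(df) * prob` is not whole, the rounded `sizes` may not add up to `nrow(df)` and would need adjusting.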

For multiple columns, use `across` to loop over them

df %>%
   # each column gets its own independently sampled digits for every row
   mutate(across(v1:v2, ~ round(.x, sample(1:3, size = n(),
     replace = TRUE, prob = c(0.10, 0.40, 0.50)))))
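Without dplyr, the same per-column loop can be sketched with `lapply()` over the selected columns. This assumes the `df` from the question; `orig`, `prob` and `cols` are illustrative names:

```r
df <- data.frame(id = 1:10,
                 v1 = c(2.35456185, 1.44501001, 2.98712312, 0.12345123, 0.96781234,
                        1.23934551, 5.00212233, 4.34120000, 1.23443213, 0.00112233),
                 v2 = c(0.22222222, 0.00123456, 2.19024869, 0.00012000, 0.00029848,
                        0.12348888, 0.46236577, 0.85757000, 0.05479729, 0.00001202))
orig <- df  # unrounded copy, kept only for comparison

set.seed(42)
prob <- c(0.10, 0.40, 0.50)
cols <- c("v1", "v2")

# for each column, draw a fresh number of digits per row and round with it
df[cols] <- lapply(df[cols], function(x) {
  round(x, digits = sample(1:3, size = length(x), replace = TRUE, prob = prob))
})
print(df)
```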

Or, in base R, pass the sampled values directly to the `digits` argument of `round`

df$v1 <- with(df, round(v1, sample(1:3, size = nrow(df), 
    replace = TRUE, prob = c(0.10, 0.4, 0.5))))
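This works because `round()` pairs each value with the matching element of `digits`. A small sketch with fixed digits (the values are taken from `v1`; the names are illustrative) also shows why the answer samples `size = nrow(df)` values: a shorter `digits` vector is recycled rather than drawn per row.

```r
x <- c(2.35456185, 1.44501001, 1.23934551, 4.34120000)

# one digits value per element: element i is rounded to digits[i]
per_element <- round(x, digits = c(1, 2, 3, 3))

# a shorter digits vector is recycled (1, 3, 1, 3 here)
recycled <- round(x, digits = c(1, 3))

print(per_element)
print(recycled)
```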

Update

To check the rounded values, keep the sampled number of digits for each column next to the rounded result

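library(stringr)
df %>%
   # store the sampled number of digits for each column
   mutate(across(v1:v2, ~ sample(1:3, size = n(),
     replace = TRUE, prob = c(0.10, 0.40, 0.50)),
     .names = "{.col}_sample_ind"),
   # round each column with its matching *_sample_ind column
   # (cur_data() is deprecated since dplyr 1.1.0 in favor of pick())
     across(v1:v2, ~ round(.x, digits = cur_data()[[str_c(cur_column(),
       "_sample_ind")]]),
     .names = "{.col}_rounded")) %>%
   as_tibble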

-output

  # A tibble: 10 × 7
      id      v1        v2 v1_sample_ind v2_sample_ind v1_rounded v2_rounded
   <int>   <dbl>     <dbl>         <int>         <int>      <dbl>      <dbl>
 1     1 2.35    0.222                 3             2      2.36       0.22 
 2     2 1.45    0.00123               3             3      1.44       0.001
 3     3 2.99    2.19                  1             2      3          2.19 
 4     4 0.123   0.00012               3             2      0.123      0    
 5     5 0.968   0.000298              3             1      0.968      0    
 6     6 1.24    0.123                 3             3      1.24       0.123
 7     7 5.00    0.462                 2             3      5          0.462
 8     8 4.34    0.858                 2             1      4.34       0.9  
 9     9 1.23    0.0548                2             2      1.23       0.05 
10    10 0.00112 0.0000120             2             3      0          0    
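A base-R version of the same check, assuming only `v1`, could keep the sampled digits next to each value and confirm that every rounded value has no more decimals than it was assigned. Scaled by `10^d`, it should be a whole number up to floating-point noise. `d`, `r` and `check` are illustrative names, and the seed is arbitrary.

```r
x <- c(2.35456185, 1.44501001, 2.98712312, 0.12345123, 0.96781234,
       1.23934551, 5.00212233, 4.34120000, 1.23443213, 0.00112233)

set.seed(7)
d <- sample(1:3, size = length(x), replace = TRUE, prob = c(0.10, 0.40, 0.50))
r <- round(x, digits = d)

# r * 10^d should be (numerically) an integer if r has at most d decimals
check <- data.frame(x, d, r,
                    ok = abs(r * 10^d - round(r * 10^d)) < 1e-8)
print(check)
```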
akrun
  • Thank you for the reply. One more question: is it possible to do the same exercise for two columns (I will add `v2`)? – iGada Feb 04 '23 at 18:54
  • All the values of `v1` and `v2` are rounded to three decimal places when using the `across()` function. Could you check it once again? Tnx! – iGada Feb 06 '23 at 17:54
  • Hi @akrun! I got an error message which says `Error in select(., id) : unused argument (id)`. – iGada Feb 06 '23 at 20:55
  • I think your latest answer (the big table you get) is correct if the trailing zeros are dropped. I mean instead of `1.230` for the nearest two decimal places it's nice if we manage to produce `1.23` only. Txs! – iGada Feb 06 '23 at 21:06
  • @iGada just add `as_tibble` as in the update – akrun Feb 06 '23 at 21:08
  • The original problem comes back when you add `as_tibble`. Sorry for bothering you a lot. – iGada Feb 06 '23 at 21:10
  • @iGada please do know that `as_tibble` does some print formatting, but if you extract the column values, it will be your expected output. If your issue is just to show in printing, then it may need `format` to a character – akrun Feb 06 '23 at 21:12
  • Thank you so much. I will try removing the trailing zeros as printing numbers exactly as they are is very important in my case. – iGada Feb 06 '23 at 21:15
  • @iGada perhaps you know how R prints a numeric vector: `c(0, 0.12, 5.863)` prints as `[1] 0.000 0.120 5.863`, which is basically a print format. It has nothing to do with the rounding issue you mentioned – akrun Feb 06 '23 at 21:16
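To illustrate the last comment in base R: the stored values are already rounded, and only the printed form shares a common number of decimals. Converting with `as.character()` is one way to show each value without trailing zeros, at the cost of the result no longer being numeric. The input values below are taken from the question; `x` and `shown` are illustrative names.

```r
x <- round(c(0.00112233, 0.12345123, 5.86300000), digits = c(2, 2, 3))
print(x)  # printed with a common number of decimals

# as.character() formats each value on its own, so trailing zeros are dropped
shown <- as.character(x)
print(shown)
```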

Update: Addressing the probabilities:

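library(dplyr)

df %>%
 rowwise() %>%
 # sample one digits value per row, with probabilities 10%, 40%, 50%
 # (note: this writes the rounded v1 into v2, overwriting any existing v2)
 mutate(v2 = round(v1, sample(1:3, 1, prob = c(0.1, 0.4, 0.5))))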
      id      v1    v2
   <int>   <dbl> <dbl>
 1     1 2.35     2.35
 2     2 1.45     1.44
 3     3 2.99     2.99
 4     4 0.123    0.12
 5     5 0.968    1   
 6     6 1.24     1.24
 7     7 5.00     5.00
 8     8 4.34     4.34
 9     9 1.23     1.2 
10    10 0.00112  0   
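The `rowwise()` version draws one number of decimals per row. A base-R sketch of the same per-element idea uses `vapply()` (`p` and `r` are illustrative names, the seed is arbitrary). Each element still gets an independent draw with the 10/40/50 probabilities, as when a whole sampled vector is passed to `digits` in one call. It is typically slower on large data because it calls an R function once per element.

```r
x <- c(2.35456185, 1.44501001, 2.98712312, 0.12345123, 0.96781234,
       1.23934551, 5.00212233, 4.34120000, 1.23443213, 0.00112233)
p <- c(0.1, 0.4, 0.5)

set.seed(3)
# one sample(1:3, 1, prob = p) draw per element, like rowwise() + mutate()
r <- vapply(x, function(v) round(v, sample(1:3, 1, prob = p)), numeric(1))
print(r)
```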

Here we round row-wise to a random number of decimals between 1 and 3 (all equally likely, without the probabilities):

library(dplyr)

df %>% 
  rowwise() %>% 
  # round v1 to 1, 2 or 3 decimals (equally likely) and store it as V2
  mutate(V2 = round(v1, sample(1:3, 1)))
      id      v1    V2
   <int>   <dbl> <dbl>
 1     1 2.35    2.36 
 2     2 1.45    1.44 
 3     3 2.99    2.99 
 4     4 0.123   0.123
 5     5 0.968   0.968
 6     6 1.24    1.24 
 7     7 5.00    5.00 
 8     8 4.34    4.34 
 9     9 1.23    1.23 
10    10 0.00112 0.001
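Without `prob`, `sample(1:3, 1)` picks each number of decimals with equal probability (about a third each), not the 10/40/50 split from the question. A quick base-R comparison with many draws (the seed and `n` are arbitrary choices):

```r
set.seed(123)
n <- 1e5

# equal probabilities (as in sample(1:3, 1)) vs. the question's 10/40/50 split
uniform  <- sample(1:3, n, replace = TRUE)
weighted <- sample(1:3, n, replace = TRUE, prob = c(0.10, 0.40, 0.50))

prop_uniform  <- as.vector(table(uniform)) / n
prop_weighted <- as.vector(table(weighted)) / n
print(rbind(prop_uniform, prop_weighted))
```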
TarJae