
I want to generate a large data frame (100000 rows) with three columns, A, B and C.

The data frame must satisfy two conditions:

  1. in each row, A + B + C = 1;
  2. A follows a triangular distribution (min = 0.2, mode = 0.3, max = 0.4), B follows a triangular distribution (min = 0.3, mode = 0.4, max = 0.5), and C follows a triangular distribution (min = 0.1, mode = 0.3, max = 0.5).
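A quick consistency check of these two conditions can be done with the standard closed-form mean, (a + b + c) / 3, and variance, (a² + b² + c² - ab - ac - bc) / 18, of a triangular distribution. The following is a minimal base-R sketch; the helper names `tri_mean()` and `tri_var()` are hypothetical and only used for illustration.

```r
# Closed-form mean and variance of a triangular distribution with
# minimum a, maximum b and mode c (same argument order as triangle::rtriangle)
tri_mean <- function(a, b, c) (a + b + c) / 3
tri_var  <- function(a, b, c) (a^2 + b^2 + c^2 - a * b - a * c - b * c) / 18

# Parameters from the question, as (min, max, mode)
pars <- list(A = c(0.2, 0.4, 0.3),
             B = c(0.3, 0.5, 0.4),
             C = c(0.1, 0.5, 0.3))

means <- sapply(pars, function(p) tri_mean(p[1], p[2], p[3]))
vars  <- sapply(pars, function(p) tri_var(p[1], p[2], p[3]))

# Condition 1 implies E[A] + E[B] + E[C] = 1
sum(means)

# C = 1 - A - B implies Var(C) = Var(A) + Var(B) + 2 Cov(A, B),
# which fixes the covariance, and hence the correlation, of A and B
cov_AB <- (vars[["C"]] - vars[["A"]] - vars[["B"]]) / 2
rho_AB <- cov_AB / sqrt(vars[["A"]] * vars[["B"]])
rho_AB
```

Both printed values are 1 (up to rounding). The means are compatible, but a correlation of 1 between A and B (which have equal variances) means B = A + 0.1 in every row. Therefore, no construction with independent A and B can satisfy both conditions.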

I could not figure out how to generate this kind of data set.

Thanks in advance for your suggestions.

Songchao

Emmanuel-Lin

2 Answers

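N <- 100000

library(triangle)

# rtriangle(n, a = min, b = max, c = mode)
A <- rtriangle(N, 0.2, 0.4, 0.3)
B <- rtriangle(N, 0.3, 0.5, 0.4)
C <- 1 - A - B  # rows sum to 1 and C lies in [0.1, 0.5], but C is not triangular

d <- data.frame(A, B, C)
summary(d)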
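One way to see why `C <- 1 - A - B` does not give C the requested distribution: with A and B independent, Var(C) = Var(A) + Var(B). This is only half the variance of a triangular distribution with min 0.1, mode 0.3 and max 0.5. Below is a minimal base-R sketch. `qtri()` is a hypothetical helper that mirrors the inverse CDF computed by `triangle::qtriangle()`, so the block runs without the package.

```r
# Hypothetical stand-in for triangle::qtriangle(p, a, b, c):
# inverse CDF of a triangular distribution (min a, max b, mode c)
qtri <- function(p, a, b, c) {
  fc <- (c - a) / (b - a)
  ifelse(p < fc,
         a + sqrt(p * (b - a) * (c - a)),
         b - sqrt((1 - p) * (b - a) * (b - c)))
}

set.seed(1)
N <- 100000
A <- qtri(runif(N), 0.2, 0.4, 0.3)
B <- qtri(runif(N), 0.3, 0.5, 0.4)
C <- 1 - A - B  # rows sum to 1 by construction

# Variance C would have if it were triangular(min 0.1, max 0.5, mode 0.3)
target_var_C <- (0.1^2 + 0.5^2 + 0.3^2 - 0.1 * 0.5 - 0.1 * 0.3 - 0.5 * 0.3) / 18

var(C)        # close to Var(A) + Var(B), roughly half of target_var_C
target_var_C
```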

Modified later:

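library(triangle)

nr <- 100000

u1 <- runif(nr)
u2 <- runif(nr)
u3 <- (2 - u1 - u2) / 2  # u3 = 1 - mean(u1, u2): triangular on [0, 1], not uniform

U <- cbind(u1, u2, u3)

# shuffle each row, because I am not sure about the tails of u3
U <- t(apply(U, 1, sample))

# map the shuffled uniforms through the triangular quantile functions
t1 <- qtriangle(U[, 1], 0.2, 0.4, 0.3)
t2 <- qtriangle(U[, 2], 0.3, 0.5, 0.4)
t3 <- qtriangle(U[, 3], 0.1, 0.5, 0.3)

d <- cbind(t1, t2, t3)
summary(d)
summary(rowSums(d))  # rows are not guaranteed to sum to 1
cor(d)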
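The quantile-transform idea above can be taken one step further to meet both conditions exactly. This is a swapped-in comonotonic construction, not the answerer's code. It drives every column from a single uniform u per row, passing u to A and B and 1 - u to C. Because B's triangle is A's shifted by 0.1, qtri(u) for B equals A + 0.1. Because C's triangle is symmetric around 0.3 with twice A's width, qtri(1 - u) for C equals 0.9 - 2A. Each row therefore sums to 1, while each column keeps its triangular marginal. The price is that A, B and C are perfectly dependent. Matching the three variances under C = 1 - A - B forces corr(A, B) = 1, so every solution has this property. As before, `qtri()` is a hypothetical base-R stand-in for `triangle::qtriangle()`.

```r
# Hypothetical stand-in for triangle::qtriangle(p, a, b, c):
# inverse CDF of a triangular distribution (min a, max b, mode c)
qtri <- function(p, a, b, c) {
  fc <- (c - a) / (b - a)
  ifelse(p < fc,
         a + sqrt(p * (b - a) * (c - a)),
         b - sqrt((1 - p) * (b - a) * (b - c)))
}

set.seed(1)
nr <- 100000
u  <- runif(nr)  # one uniform per row drives all three columns

d <- data.frame(
  A = qtri(u,     0.2, 0.4, 0.3),
  B = qtri(u,     0.3, 0.5, 0.4),  # equals A + 0.1
  C = qtri(1 - u, 0.1, 0.5, 0.3)   # equals 0.9 - 2 * A
)

summary(rowSums(d))  # every row sums to 1, up to floating-point error
summary(d)
cor(d)               # A and B correlate at 1; C correlates at -1 with both
```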
r.user.05apr

I'm not sure this works, because I'm not certain whether the transformation preserves the distributions, but try this:

install.packages("triangle")  # if not already installed
library(triangle)

# c (the mode) is omitted, so rtriangle() uses its default c = (a + b) / 2,
# i.e. modes 0.3, 0.4 and 0.3 as requested
a <- rtriangle(10, a = .2, b = .4)
b <- rtriangle(10, a = .3, b = .5)
c <- rtriangle(10, a = .1, b = .5)

m <- cbind(a, b, c)
test <- sweep(m, 1, rowSums(m), FUN = "/")  # divide each row by its row sum

> test
              a         b         c
 [1,] 0.3237202 0.4034106 0.2728692
 [2,] 0.2419613 0.3821729 0.3758658
 [3,] 0.2476927 0.3721925 0.3801149
 [4,] 0.2983462 0.4254064 0.2762474
 [5,] 0.3427140 0.4830743 0.1742117
 [6,] 0.2456610 0.3306648 0.4236742
 [7,] 0.3189454 0.4148087 0.2662459
 [8,] 0.3400111 0.3770924 0.2828965
 [9,] 0.3142197 0.3807667 0.3050136
[10,] 0.3221066 0.4222530 0.2556405
> rowSums(test)
 [1] 1 1 1 1 1 1 1 1 1 1
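The doubt about the transformation can be made concrete. Dividing by the row sum lets values leave the original support. For example, a = 0.4, b = 0.3, c = 0.1 gives a / (a + b + c) = 0.5, which is above a's maximum of 0.4. Below is a minimal base-R sketch on a larger sample; `qtri()` is a hypothetical stand-in for `triangle::qtriangle()`, used here to draw the triangular samples by inverse transform.

```r
# Hypothetical stand-in for triangle::qtriangle(p, a, b, c):
# inverse CDF of a triangular distribution (min a, max b, mode c)
qtri <- function(p, a, b, c) {
  fc <- (c - a) / (b - a)
  ifelse(p < fc,
         a + sqrt(p * (b - a) * (c - a)),
         b - sqrt((1 - p) * (b - a) * (b - c)))
}

set.seed(1)
n <- 100000
m <- cbind(a = qtri(runif(n), 0.2, 0.4, 0.3),
           b = qtri(runif(n), 0.3, 0.5, 0.4),
           c = qtri(runif(n), 0.1, 0.5, 0.3))
test <- sweep(m, 1, rowSums(m), FUN = "/")  # same row normalization as above

range(m[, "a"])          # inside [0.2, 0.4], as required
range(test[, "a"])       # after normalization, values above 0.4 appear
mean(test[, "a"] > 0.4)  # share of rows pushed above the original maximum
```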
LAP
  • Thanks for your suggestion. I tried this before (also specifying the mode in rtriangle()), but this kind of transformation changed the distributions. – Songchao Chen Dec 11 '17 at 10:27
  • That is unfortunate. It may be fruitful to at least outline the attempts you've already made. This way, answers to your question are more likely to contain new approaches. – LAP Dec 11 '17 at 11:42
  • Yes, you are right. I should add my previous attempts. Anyway, many thanks for all of your suggestions. – Songchao Chen Dec 11 '17 at 12:42