How can I generate a matrix of 1s and 0s in R for 1000 items (rows), where each item is 1 for exactly one of six traits (columns A, B, C, D, E and F) and 0 for the rest? For example:

item A  B   C   D   E   F
1    1  0   0   0   0   0
2    0  1   0   0   0   0
3    1  0   0   0   0   0
4    0  0   0   0   1   0
5    0  0   0   0   1   0
6    0  0   1   0   0   0
7    0  0   0   1   0   0
8    0  1   0   0   0   0
9    1  0   0   0   0   0
10   0  0   0   0   1   0

The matrix should be built so that, when the 6 traits are placed on the x axis (A=0, B=0.2, C=0.4, D=0.6, E=0.8, F=1), the share of items per trait follows the probability density of a beta (3,7) distribution.

My objective is to generate a set of similar matrices, each representing a different beta distribution, e.g. (7,3), (2,8), (8,2), (3,3), so that together they cover a wide range of shapes, including if possible a bimodal distribution other than (0.5, 0.5).

Ben Bolker
Sergio Henriques
  • `sample(c(rep(0, 5), 1), 6)` – bouncyball Jul 26 '18 at 13:35
  • How can I edit this to ensure that, when generating 1000 items, the traits follow a beta (8,2) distribution? – Sergio Henriques Jul 26 '18 at 13:53
  • The beta is a continuous distribution but you seem to be generating a discrete distribution. How exactly are you translating between the two? What value exactly should follow a beta distribution? – MrFlick Jul 26 '18 at 14:18
  • I should have said broadly beta distributed. Following this online calculator: https://keisan.casio.com/exec/system/1180573226 for Beta (7, 3): A is (0,0), B (0.2,0.01032192), C (0.4, 0.37158912), D (0.6, 1.88116992), E(0.8,2.64241152), F(0,0); for Beta (1,1): A is (0,1), B (0.2,1), C (0.4, 1), D (0.6, 1), E(0.8,1), F(0,1); for Beta (0.5,0.5): A is (0,100), B (0.2,0.795774716), C (0.4, 0.649747334), D (0.6, 0.649747334), E(0.8,0.7957747155), F(0,100); for Beta (3, 3): A is (0,0), B (0.2,0.768), C (0.4, 1.728), D (0.6, 1.728), E(0.8,0.768), F(0,1); – Sergio Henriques Jul 26 '18 at 14:50

1 Answer

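The simulation below uses `sample()` with the sampling weights passed through its `prob` argument; the weights are the beta density evaluated at the six trait positions. For B(0.5, 0.5) the density is infinite at 0 and 1, so you could shift those two entries of the `x` vector slightly inward, to values close to 0 and 1, to exclude the infinities: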

set.seed(123)

x <- c(0.0, 0.2, 0.4, 0.6, 0.8, 1)
# for beta w/7 & 3 shapes
y <- dbeta(x, 7, 3)

# sample with probabilities y
samp <- data.frame(id = sample(1:6, 1000, replace = TRUE, prob = y))

# prepare a diagonal matrix
m <- data.frame(diag(6), id = 1:6)

# join each sampled id to its one-hot row, so every row has exactly one '1'
# (merge() sorts the result by id, so the rows come out grouped by trait)
u <- merge(samp, m)

# drop the id column and label the trait columns A to F
u <- u[, -1]
names(u) <- LETTERS[1:6]

# validation 
# the result by simulation
colSums(u) / 1000
# A     B     C     D     E     F 
# 0.000 0.001 0.070 0.385 0.544 0.000 

# expected proportions: beta density at x, normalized to sum to 1
print(setNames(dbeta(x, 7, 3) / sum(dbeta(x, 7, 3)), LETTERS[1:6]), digits = 1)
# A     B     C     D     E     F 
# 0.000 0.002 0.076 0.383 0.539 0.000 
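
To cover several beta distributions at once, the same idea can be wrapped in a function. The following is only a sketch: `beta_onehot`, its `eps` argument and the `mats` list are names made up here for illustration. It indexes `diag(6)` by the sampled ids instead of merging, which keeps the items in the order they were drawn, and it moves the endpoints inward by `eps` only when the density is infinite there (as with B(0.5, 0.5)). The default of 0.01 is an arbitrary assumption, and a different value changes how much weight A and F receive.

```r
# Hypothetical helper (an illustration only): builds an n x 6 one-hot matrix
# whose column frequencies approximate the Beta(shape1, shape2) density
# evaluated at the six trait positions.
beta_onehot <- function(n, shape1, shape2, eps = 0.01) {
  x <- seq(0, 1, by = 0.2)                      # positions of traits A..F
  w <- dbeta(x, shape1, shape2)                 # density used as sampling weight
  if (any(is.infinite(w))) {
    # assumption: nudge 0 and 1 inward so that every weight is finite
    w <- dbeta(pmin(pmax(x, eps), 1 - eps), shape1, shape2)
  }
  ids <- sample(6, n, replace = TRUE, prob = w) # sample() rescales prob itself
  m <- diag(6)[ids, , drop = FALSE]             # row i is the one-hot row of item i
  colnames(m) <- LETTERS[1:6]
  m
}

set.seed(123)
shapes <- list(c(3, 7), c(7, 3), c(2, 8), c(8, 2), c(3, 3), c(0.5, 0.5))
mats <- lapply(shapes, function(s) beta_onehot(1000, s[1], s[2]))
names(mats) <- vapply(shapes, function(s) paste0("beta_", s[1], "_", s[2]),
                      character(1))

# observed share of items per trait, one row per distribution
round(t(sapply(mats, colMeans)), 3)
```

Each element of `mats` is a 1000 x 6 matrix with exactly one 1 per row, so `mats[["beta_8_2"]]` would be the matrix for the beta (8,2) case asked about in the comments.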
Artem