Randomly assign objects to `K` clusters, according to a Dirichlet Multinomial Distribution

Asked Apr 15 '20 at 11:40

Active Apr 15 '20 at 13:08


I'm trying to cluster short documents such as the following:

sentences<-c("The color blue neutralizes orange yellow reflections.", 
             "Zod stabbed me with blue Kryptonite.", 
             "Because blue is your favourite colour.",
             "Red is wrong, blue is right.",
             "You and I are going to yellowstone.",
             "Van Gogh looked for some yellow at sunset.",
             "You ruined my beautiful green dress.",
             "You do not agree.",
             "There's nothing wrong with green.")

In the initialization step of my code, I need to randomly assign the documents to `K` clusters according to a Dirichlet-multinomial distribution.

How could I perform this task?
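Below is a minimal sketch in R of one common way to do this. It assumes a symmetric Dirichlet prior with concentration `alpha = 1` for every cluster (an illustrative choice, which makes all cluster proportions equally likely a priori). The sketch first draws cluster proportions theta ~ Dirichlet(alpha), then gives each document its own label z_i ~ Categorical(theta). Integrated over theta, the resulting cluster counts follow a Dirichlet-multinomial distribution. Base R has no built-in Dirichlet sampler, so the sketch uses the standard construction of normalized independent Gamma(alpha_j, 1) draws. The helper names `rdirichlet_gamma` and `init_assign` are hypothetical.

```r
# Hypothetical helper: one draw from Dirichlet(alpha), built from
# independent Gamma(alpha_j, 1) variates normalized to sum to 1
# (base R has no built-in Dirichlet sampler).
rdirichlet_gamma <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  g / sum(g)
}

# Hypothetical helper: draw cluster proportions theta ~ Dirichlet(alpha),
# then give each document a label z_i ~ Categorical(theta).
# Marginally over theta, the cluster counts are Dirichlet-multinomial.
init_assign <- function(n_docs, K, alpha = rep(1, K)) {
  theta <- rdirichlet_gamma(alpha)
  sample.int(K, size = n_docs, replace = TRUE, prob = theta)
}

set.seed(42)
n_docs <- 9                    # the 9 example sentences
z <- init_assign(n_docs, K = 2)
print(z)                       # one cluster label per document
print(tabulate(z, nbins = 2))  # number of documents per cluster
```

With `sentences` defined as above, `init_assign(length(sentences), K)` returns the initial cluster of each sentence, in the same order as the vector.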

Edit: Thanks to @ags29's comment, I found the following in "Sampling from Dirichlet-Multinomial":

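D <- 9                         # number of documents in the corpus; I have 9 sentences in my example
k <- 2                         # number of clusters (e.g. 2)
alpha <- runif(k)              # Dirichlet parameters, one per cluster, here chosen at random
p <- rgamma(k, shape = alpha)  # pre-simulation of the Dirichlet: independent Gamma(alpha_j, 1) draws
p <- p / sum(p)                # normalizing gives a Dirichlet(alpha) draw of cluster proportions
x <- rmultinom(1, D, p)        # how many of the D documents fall in each of the k clusters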
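For a clustering initialization, the Dirichlet needs one component per cluster (length `k`), and `rmultinom` has to be called with `size` equal to the number of documents, so that it returns a vector of `k` cluster counts summing to `D`. Counts alone do not say which document goes where. One way to get per-document labels is to repeat each cluster index by its count and then shuffle. Given the proportions, this produces exactly the same label distribution as drawing each document's label independently. The sketch below makes some assumptions: `alpha = 1` for every cluster (instead of `runif`, which can occasionally yield shapes so small that every gamma draw underflows to 0), and the variable names `cl` and `labels` are hypothetical.

```r
set.seed(1)
D <- 9                                 # number of documents (the 9 example sentences)
k <- 2                                 # number of clusters
alpha <- rep(1, k)                     # assumed symmetric Dirichlet parameters
g <- rgamma(k, shape = alpha)          # independent Gamma(alpha_j, 1) draws
p <- g / sum(g)                        # normalized: a Dirichlet(alpha) draw
x <- rmultinom(1, size = D, prob = p)  # k x 1 matrix of cluster counts summing to D

# Expand the counts into one label per document, then shuffle so that
# which documents share a cluster is random. Indexing with sample.int()
# avoids sample()'s special case for length-one numeric input.
cl <- rep(seq_len(k), times = x[, 1])
labels <- cl[sample.int(length(cl))]
print(labels)
```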

What do you think?

edited Apr 15 '20 at 13:08


Mark


https://stats.stackexchange.com/questions/145530/sampling-from-dirichlet-multinomial – ags29 Apr 15 '20 at 12:41
@ags29 thank you! I edited my post – Mark Apr 15 '20 at 13:08


0 Answers