I am writing some code to generate balanced experimental designs for market research, specifically for use in conjoint analysis and maximum difference scaling.
The first step is to generate a Partially Balanced Incompleted Block (PBIB) design. This is straight-forward using the R package AlgDesign
.
For most types of research such a design would be sufficient. However, in market research one wants to control for order effects in each block. This is where I would appreciate some help.
Create test data
# The following code is not essential in understanding the problem,
# but I provide it in case you are curious about the origin of the data itself.
#library(AlgDesign)
#set.seed(12345)
#choices <- 4
#nAttributes <- 7
#blocksize <- 7
#bsize <- rep(choices, blocksize)
#PBIB <- optBlock(~., withinData=factor(1:nAttributes), blocksizes=bsize)
#df <- data.frame(t(array(PBIB$rows, dim=c(choices, blocksize))))
#colnames(df) <- paste("Item", 1:choices, sep="")
#rownames(df) <- paste("Set", 1:nAttributes, sep="")
df <- structure(list(
Item1 = c(1, 2, 1, 3, 1, 1, 2),
Item2 = c(4, 4, 2, 5, 3, 2, 3),
Item3 = c(5, 6, 5, 6, 4, 3, 4),
Item4 = c(7, 7, 6, 7, 6, 7, 5)),
.Names = c("Item1", "Item2", "Item3", "Item4"),
row.names = c("Set1", "Set2", "Set3", "Set4", "Set5", "Set6", "Set7"),
class = "data.frame")
** Define two helper functions
balanceMatrix
calculates the balance of the matrix:
balanceMatrix <- function(x){
t(sapply(unique(unlist(x)), function(i)colSums(x==i)))
}
balanceScore
calculates a metric of 'fit' - lower scores are better, with zero perfect:
balanceScore <- function(x){
sum((1-x)^2)
}
Define a function that resamples the rows at random
findBalance <- function(x, nrepeat=100){
df <- x
minw <- Inf
for (n in 1:nrepeat){
for (i in 1:nrow(x)){df[i,] <- sample(df[i, ])}
w <- balanceMatrix(df)
sumw <- balanceScore(w)
if(sumw < minw){
dfbest <- df
minw <- sumw
}
}
dfbest
}
Main code
The dataframe df
is a balanced design of 7 sets. Each set will display 4 items to the respondent. The numeric values in df
refers to 7 different attributes. For example, in Set1 respondent will be asked to choose his/her preferred option from attributes 1, 3, 4 and 7.
The ordering of items in each set is conceptually not important. Thus an ordering of (1,4,5,7) is identical to (7,5,4,1).
However, to get a fully balanced design, each attribute will appear an equal number of times in each column. This design is there imbalanced, since attribute 1 appears 4 times in column 1:
df
Item1 Item2 Item3 Item4
Set1 1 4 5 7
Set2 2 4 6 7
Set3 1 2 5 6
Set4 3 5 6 7
Set5 1 3 4 6
Set6 1 2 3 7
Set7 2 3 4 5
To try and find a more balanced design, I wrote the function findBalance
. This conducts a random search for better solutions, by randomly sampling across the rows of df
. With 100 repeats it finds the following best solution:
set.seed(12345)
dfbest <- findBalance(df, nrepeat=100)
dfbest
Item1 Item2 Item3 Item4
Set1 7 5 1 4
Set2 6 7 4 2
Set3 2 1 5 6
Set4 5 6 7 3
Set5 3 1 6 4
Set6 7 2 3 1
Set7 4 3 2 5
This appears more balanced, and the calculated balance matrix contains lots of ones. The balance matrix counts the number of times each attribute appears in each column. For example, the following table indicates (in the top left hand cell) that attribute 1 appears twice not at all in column 1, and twice in column 2:
balanceMatrix(dfbest)
Item1 Item2 Item3 Item4
[1,] 0 2 1 1
[2,] 1 1 1 1
[3,] 1 1 1 1
[4,] 1 0 1 2
[5,] 1 1 1 1
[6,] 1 1 1 1
[7,] 2 1 1 0
The balance score for this solution is 6, indicating there are at least six cells unequal to 1:
balanceScore(balanceMatrix(dfbest))
[1] 6
My question
Thank you for following this detailed example. My question is how can I rewrite this search function to be more systematic? I'd like to tell R to:
- Minimize
balanceScore(df)
- By changing row order of
df
- Subject to: already fully constrained