I am looking to run a multinomial logit model on discrete choice survey data. Before doing so, I want to keep the data in wide format in a CSV file (edited in Excel) and convert it to long format in R using the following code:
dummychoicedataset2 <- read.csv("data55.csv", header = TRUE)
dcedata3 <- mlogit.data(dummychoicedataset2, varying = 5:15, choice = "Item", shape = "long")
head(dcedata3)
Error in reshapeLong(data, idvar = idvar, timevar = timevar, varying = varying, :
'varying' arguments must be the same length
When I run varying = 13:15 (i.e. Source.Abalone to Source.SeaCucumber), no error message comes up, i.e. when only 3 of the 4 columns of a variable are selected. This suggests the issue arises when all 4 columns of a variable are selected, but I cannot work out why, as every column contains the same number of rows.
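One way to check which columns a numeric varying range actually picks up is to tabulate the selected column names by their prefix. The sketch below uses base R only and assumes the CSV columns are in the same order as in the dput output in this post; `alts`, `cols` and `count_by_prefix` are hypothetical names introduced for illustration and are not part of mlogit.

```r
# Column names in the order shown by the dput output in this post
# (assumption: the CSV file has the same column order)
alts <- c("FishMaw", "Abalone", "Grouper", "SeaCucumber")
cols <- c("Item", "Price", "Country", "Source",
          paste0("d.", alts),
          paste0("Price.", alts),
          paste0("Country.", alts),
          paste0("Source.", alts))

# Hypothetical helper: count how many of the selected columns share each prefix
count_by_prefix <- function(col_names, idx) {
  selected <- col_names[idx]
  prefix <- sub("\\..*$", "", selected)  # text before the first "."
  table(prefix)
}

cols[c(5, 15)]                 # first and last column selected by 5:15
count_by_prefix(cols, 5:15)    # group sizes for varying = 5:15
count_by_prefix(cols, 13:15)   # group sizes for varying = 13:15
```

Under that assumed column order, 5:15 covers four d. columns, four Price. columns and only three Country. columns, which would be consistent with the "same length" error, while 13:15 covers a single group of three columns. If the real file is ordered differently, `names(dummychoicedataset2)[5:15]` gives the same check on the actual data.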
Below are the first rows of the data frame, showing how it is formatted:
> head(dummychoicedataset2)
         Item Price     Country Source d.FishMaw d.Abalone d.Grouper d.SeaCucumber
1 SeaCucumber   700       Japan Farmed         0         0         0             1
2     FishMaw   300   Australia Farmed         1         0         0             0
3     Grouper  1200       Japan   Wild         0         0         1             0
4     Abalone  1500 SouthAfrica Farmed         0         1         0             0
5     Abalone  1500 SouthAfrica   Wild         0         1         0             0
6     Grouper  1500      Mexico   Wild         0         0         1             0
  Price.FishMaw Price.Abalone Price.Grouper Price.SeaCucumber Country.FishMaw
1           300          1200          1500               300     SouthAfrica
2           300           700          1200              1500       Australia
3           700          1500          1200               300          Mexico
4          1200          1500           700               300           Japan
5           700          1500           300              1200       Australia
6          1200           700          1500               300     SouthAfrica
  Country.Abalone Country.Grouper Country.SeaCucumber Source.FishMaw Source.Abalone
1       Australia          Mexico               Japan         Farmed           Wild
2          Mexico           Japan         SouthAfrica         Farmed           Wild
3       Australia           Japan         SouthAfrica           Wild         Farmed
4     SouthAfrica          Mexico           Australia           Wild         Farmed
5     SouthAfrica           Japan              Mexico         Farmed           Wild
6           Japan          Mexico           Australia         Farmed         Farmed
  Source.Grouper Source.SeaCucumber
1           Wild             Farmed
2         Farmed               Wild
3           Wild             Farmed
4           Wild             Farmed
5           Wild             Farmed
6           Wild               Wild
Any ideas where the problem could lie? Is it likely to be a formatting mistake or a data entry error, or is something more fundamentally wrong with this approach?
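On the data entry side, the dput output further down in this post shows the value "Wild " (with a trailing space) in rows 5 and 10 of Source.Abalone, which R treats as a different string from "Wild". A minimal sketch for spotting and stripping such padding with base R's `trimws`, using the ten values copied from that dput output; `strip_padding` is a hypothetical helper name, and whether this padding has any bearing on the reshape error is not established here.

```r
# Values copied from the Source.Abalone column in the dput output
source_abalone <- c("Wild", "Wild", "Farmed", "Farmed", "Wild ",
                    "Farmed", "Wild", "Wild", "Wild", "Wild ")

# "Wild " appears as a separate value next to "Wild"
unique(source_abalone)

# Rows whose value changes once surrounding whitespace is removed
padded_rows <- which(source_abalone != trimws(source_abalone))
padded_rows   # 5 10

# Hypothetical helper: trim every character column of a data frame
strip_padding <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], trimws)
  df
}

demo <- data.frame(Source.Abalone = source_abalone, Price = 1:10,
                   stringsAsFactors = FALSE)
cleaned <- strip_padding(demo)
unique(cleaned$Source.Abalone)   # "Wild" "Farmed"
```

The same helper could be applied to the full data frame right after `read.csv`, before any reshaping.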
The dput output for the first 10 rows of the data is below:
> dput(dummychoicedataset2[1:10, ])
structure(list(Item = c("SeaCucumber", "FishMaw", "Grouper",
"Abalone", "Abalone", "Grouper", "FishMaw", "SeaCucumber", "FishMaw",
"Abalone"), Price = c(700L, 300L, 1200L, 1500L, 1500L, 1500L,
300L, 700L, 300L, 1500L), Country = c("Japan", "Australia", "Japan",
"SouthAfrica", "SouthAfrica", "Mexico", "Mexico", "Australia",
"Japan", "SouthAfrica"), Source = c("Farmed", "Farmed", "Wild",
"Farmed", "Wild", "Wild", "Farmed", "Farmed", "Farmed", "Wild"
), d.FishMaw = c(0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L), d.Abalone = c(0L,
0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L), d.Grouper = c(0L, 0L, 1L,
0L, 0L, 1L, 0L, 0L, 0L, 0L), d.SeaCucumber = c(1L, 0L, 0L, 0L,
0L, 0L, 0L, 1L, 0L, 0L), Price.FishMaw = c(300L, 300L, 700L,
1200L, 700L, 1200L, 300L, 1500L, 300L, 1200L), Price.Abalone = c(1200L,
700L, 1500L, 1500L, 1500L, 700L, 1200L, 300L, 700L, 1500L), Price.Grouper = c(1500L,
1200L, 1200L, 700L, 300L, 1500L, 1500L, 1200L, 1200L, 700L),
Price.SeaCucumber = c(300L, 1500L, 300L, 300L, 1200L, 300L,
700L, 700L, 1500L, 300L), Country.FishMaw = c("SouthAfrica",
"Australia", "Mexico", "Japan", "Australia", "SouthAfrica",
"Mexico", "SouthAfrica", "Japan", "Australia"), Country.Abalone = c("Australia",
"Mexico", "Australia", "SouthAfrica", "SouthAfrica", "Japan",
"Australia", "Mexico", "Mexico", "SouthAfrica"), Country.Grouper = c("Mexico",
"Japan", "Japan", "Mexico", "Japan", "Mexico", "Japan", "Japan",
"SouthAfrica", "Japan"), Country.SeaCucumber = c("Japan",
"SouthAfrica", "SouthAfrica", "Australia", "Mexico", "Australia",
"SouthAfrica", "Australia", "Australia", "Mexico"), Source.FishMaw = c("Farmed",
"Farmed", "Wild", "Wild", "Farmed", "Farmed", "Farmed", "Farmed",
"Farmed", "Wild"), Source.Abalone = c("Wild", "Wild", "Farmed",
"Farmed", "Wild ", "Farmed", "Wild", "Wild", "Wild", "Wild "
), Source.Grouper = c("Wild", "Farmed", "Wild", "Wild", "Wild",
"Wild", "Farmed", "Wild", "Farmed", "Farmed"), Source.SeaCucumber = c("Farmed",
"Wild", "Farmed", "Farmed", "Farmed", "Wild", "Wild", "Farmed",
"Farmed", "Farmed")), row.names = c(NA, 10L), class = "data.frame")