
I want to run a multinomial logit model on discrete choice survey data. Before doing so, I have the data in wide format in a CSV file (edited in Excel) and want to convert it to long format in R using the following code:

dummychoicedataset2 <- read.csv("data55.csv", header = TRUE)
dcedata3 <- mlogit.data(dummychoicedataset2, varying = 5:15, choice = "Item", shape = "long")
head(dcedata3)   
Error in reshapeLong(data, idvar = idvar, timevar = timevar, varying = varying,  : 
  'varying' arguments must be the same length

When I run varying = 13:15 (i.e. Source.Abalone to Source.SeaCucumber), no error message comes up, i.e. when only 3 of the 4 columns of a variable are selected. This suggests the problem arises when all 4 columns of a variable are selected, but I cannot work out why, as every column contains exactly the same number of values.
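One way to sanity-check which columns an index range such as 5:15 actually selects is to list the names and count how many fall under each variable prefix. The sketch below is only a diagnostic under assumptions: it hard-codes the column order from the head() output shown in this question, and the helper name count_by_prefix is made up for illustration. It uses base R only and does not call mlogit.

```r
# Column names in the order printed by head(dummychoicedataset2) in this question
# (assumption: the real data frame has the same 20 columns in the same order).
cols <- c("Item", "Price", "Country", "Source",
          "d.FishMaw", "d.Abalone", "d.Grouper", "d.SeaCucumber",
          "Price.FishMaw", "Price.Abalone", "Price.Grouper", "Price.SeaCucumber",
          "Country.FishMaw", "Country.Abalone", "Country.Grouper", "Country.SeaCucumber",
          "Source.FishMaw", "Source.Abalone", "Source.Grouper", "Source.SeaCucumber")

# Hypothetical helper: count the selected columns per variable prefix
# (the part of each name before the first ".").
count_by_prefix <- function(col_names, idx) {
  selected <- col_names[idx]
  prefix <- sub("\\..*$", "", selected)
  table(prefix)
}

counts_5_15 <- count_by_prefix(cols, 5:15)
counts_5_20 <- count_by_prefix(cols, 5:20)

print(cols[5:15])    # the names that 5:15 picks out
print(counts_5_15)   # unequal counts mean the range cuts a variable short
print(counts_5_20)
```

With the column order above, 5:15 stops at Country.Grouper, so Country contributes three columns while d and Price contribute four; 5:20 gives four of each. Unequal group sizes like this may be what the "'varying' arguments must be the same length" message refers to, though that is not confirmed here.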

Below is what the data frame looks like:

> head(dummychoicedataset2)
         Item Price     Country Source d.FishMaw d.Abalone d.Grouper d.SeaCucumber
1 SeaCucumber   700       Japan Farmed         0         0         0             1
2     FishMaw   300   Australia Farmed         1         0         0             0
3     Grouper  1200       Japan   Wild         0         0         1             0
4     Abalone  1500 SouthAfrica Farmed         0         1         0             0
5     Abalone  1500 SouthAfrica   Wild         0         1         0             0
6     Grouper  1500      Mexico   Wild         0         0         1             0
  Price.FishMaw Price.Abalone Price.Grouper Price.SeaCucumber Country.FishMaw
1           300          1200          1500               300     SouthAfrica
2           300           700          1200              1500       Australia
3           700          1500          1200               300          Mexico
4          1200          1500           700               300           Japan
5           700          1500           300              1200       Australia
6          1200           700          1500               300     SouthAfrica
  Country.Abalone Country.Grouper Country.SeaCucumber Source.FishMaw Source.Abalone
1       Australia          Mexico               Japan         Farmed           Wild
2          Mexico           Japan         SouthAfrica         Farmed           Wild
3       Australia           Japan         SouthAfrica           Wild         Farmed
4     SouthAfrica          Mexico           Australia           Wild         Farmed
5     SouthAfrica           Japan              Mexico         Farmed         Wild  
6           Japan          Mexico           Australia         Farmed         Farmed
  Source.Grouper Source.SeaCucumber
1           Wild             Farmed
2         Farmed               Wild
3           Wild             Farmed
4           Wild             Farmed
5           Wild             Farmed
6           Wild               Wild

Any ideas where the problem could lie? Is it likely to be a formatting mistake or a data entry error, or is something more fundamental wrong with this approach?

The dput output for the first 10 rows of the data is below:

> dput(dummychoicedataset2[1:10, ])
structure(list(Item = c("SeaCucumber", "FishMaw", "Grouper", 
"Abalone", "Abalone", "Grouper", "FishMaw", "SeaCucumber", "FishMaw", 
"Abalone"), Price = c(700L, 300L, 1200L, 1500L, 1500L, 1500L, 
300L, 700L, 300L, 1500L), Country = c("Japan", "Australia", "Japan", 
"SouthAfrica", "SouthAfrica", "Mexico", "Mexico", "Australia", 
"Japan", "SouthAfrica"), Source = c("Farmed", "Farmed", "Wild", 
"Farmed", "Wild", "Wild", "Farmed", "Farmed", "Farmed", "Wild"
), d.FishMaw = c(0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L), d.Abalone = c(0L, 
0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L), d.Grouper = c(0L, 0L, 1L, 
0L, 0L, 1L, 0L, 0L, 0L, 0L), d.SeaCucumber = c(1L, 0L, 0L, 0L, 
0L, 0L, 0L, 1L, 0L, 0L), Price.FishMaw = c(300L, 300L, 700L, 
1200L, 700L, 1200L, 300L, 1500L, 300L, 1200L), Price.Abalone = c(1200L, 
700L, 1500L, 1500L, 1500L, 700L, 1200L, 300L, 700L, 1500L), Price.Grouper = c(1500L, 
1200L, 1200L, 700L, 300L, 1500L, 1500L, 1200L, 1200L, 700L), 
    Price.SeaCucumber = c(300L, 1500L, 300L, 300L, 1200L, 300L, 
    700L, 700L, 1500L, 300L), Country.FishMaw = c("SouthAfrica", 
    "Australia", "Mexico", "Japan", "Australia", "SouthAfrica", 
    "Mexico", "SouthAfrica", "Japan", "Australia"), Country.Abalone = c("Australia", 
    "Mexico", "Australia", "SouthAfrica", "SouthAfrica", "Japan", 
    "Australia", "Mexico", "Mexico", "SouthAfrica"), Country.Grouper = c("Mexico", 
    "Japan", "Japan", "Mexico", "Japan", "Mexico", "Japan", "Japan", 
    "SouthAfrica", "Japan"), Country.SeaCucumber = c("Japan", 
    "SouthAfrica", "SouthAfrica", "Australia", "Mexico", "Australia", 
    "SouthAfrica", "Australia", "Australia", "Mexico"), Source.FishMaw = c("Farmed", 
    "Farmed", "Wild", "Wild", "Farmed", "Farmed", "Farmed", "Farmed", 
    "Farmed", "Wild"), Source.Abalone = c("Wild", "Wild", "Farmed", 
    "Farmed", "Wild  ", "Farmed", "Wild", "Wild", "Wild", "Wild  "
    ), Source.Grouper = c("Wild", "Farmed", "Wild", "Wild", "Wild", 
    "Wild", "Farmed", "Wild", "Farmed", "Farmed"), Source.SeaCucumber = c("Farmed", 
    "Wild", "Farmed", "Farmed", "Farmed", "Wild", "Wild", "Farmed", 
    "Farmed", "Farmed")), row.names = c(NA, 10L), class = "data.frame")
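The dput output above contains the value "Wild  " (with two trailing spaces) in Source.Abalone, so one quick data-entry check is to compare the distinct values before and after trimming whitespace. This is a minimal sketch using base R's trimws() on a vector copied from the dput; whether the stray spaces have anything to do with the reshape error is not established here.

```r
# Source.Abalone values copied from the dput output in this question
source_abalone <- c("Wild", "Wild", "Farmed", "Farmed", "Wild  ",
                    "Farmed", "Wild", "Wild", "Wild", "Wild  ")

# Distinct values as entered: "Wild" and "Wild  " count as different values
raw_levels <- unique(source_abalone)

# Distinct values after stripping leading/trailing whitespace
clean_levels <- unique(trimws(source_abalone))

print(raw_levels)
print(clean_levels)

# Positions of entries that change when trimmed
changed <- which(source_abalone != trimws(source_abalone))
print(changed)
```

On the full data frame the same idea could be applied to every character column, or whitespace could be stripped at import with read.csv(..., strip.white = TRUE).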
chris1
  • chris1, it's going to be easier for us to "play" with the data if it is in a more-easily consumed format. While simple frames can often be "scraped" in a sense, wrapped representations like this (and frames with embedded spaces, and frames with ambiguous data types) are much harder (and we are never certain we have everything perfectly right). An unambiguous and really-easy-to-use method is to post the output from `dput(x)`, where `x` is big enough to get the point across and be reproducible while small enough to not flood the screen with too much stuff. Can you please add that? – r2evans Apr 09 '21 at 14:14
  • dput added - is this what you meant? – chris1 Apr 09 '21 at 15:05
  • yes, that's much clearer (and easier), thank you. – r2evans Apr 09 '21 at 15:22

0 Answers