0

I wish to separate a column of numeric data in R into 3 individual columns based on an additional variable in the data frame.

Here is some code to produce an example of the kind of data I am working with.

bus_reg_no <- c('G01', 'G01', 'G01', 'G02', 'G02', 'G02', 'G03', 'G03' , 'G03')
entity_name <- c('A','A','A','B','B','B','C','C','C')
TimePeriod <- c('2020','2019','2018','2020','2019','2018','2020','2019','2018')
market_test <- c(NA, 0.9330331232, 0.9969181046, 0.9429920482, 0.9689617356, 0.9764825438, NA, 0.2302289569, 0.7762837775)

dat <- cbind(bus_reg_no, entity_name, TimePeriod, market_test)

What I want to do with this data is convert the market_test variable into 3 columns for each year 2018, 2019, 2020. I then want a value of market_test for each year to be displayed once in every row which corresponds to entity_name. So in my desired output, each row will have one value of entity name and the corresponding market_test values for 2018, 2019 and 2020 in separate columns. Therefore, my final dataset will also have a shorter length than the original one.

I have tried using the dcast function. It is working on this smaller sample data set but when I apply it to my full data set it is constantly returning the warning 'Aggregate function missing, defaulting to 'length'' and is showing an output of 0's and 1's not the actual values of the market test in the columns for the 3 years.

I have also tried the pivot_wider function:

pivot_wider(names_from = c(TimePeriod, entity_name), values_from = market_test)

but I am getting 'Error in rval[, idvar] : subscript out of bounds' with any combination.

I have looked at other related posts but none have helped me sorting out any errors. Any help is much appreciated. Thanks

MrFlick
  • 195,160
  • 17
  • 277
  • 295
daisy
  • 61
  • 1
  • 6
  • Is your data really in a character array or is it in a data.frame? Should it be `dat <- data.frame(bus_reg_no, entity_name, TimePeriod, market_test)`? I just want to be clear what the input is. – MrFlick Dec 05 '22 at 15:37
  • Isnt this returning what you want? `pivot_wider(dat %>% data.frame(),names_from = "TimePeriod", values_from = "market_test")` – nhaus Dec 05 '22 at 15:38
  • Can you also please show exactly what the desired output for this test input is so we can verify possible solutions. – MrFlick Dec 05 '22 at 15:39

0 Answers0