1

I have the following sample data:

df

val_str
fruit=apple,machine=crane
machine=crane
machine=roboter
fruit=apple
machine=roboter,food=samosa

df2

fruit   machine   food
apple   crane     NA
NA      crane     NA
NA      roboter   NA
apple   NA        NA
NA      roboter   samosa

How do I get from df to df2? Each unique value before the "=" should create a column and then the respective values belonging to this should be spread across the rows.

Code:

df <- data.frame(val_str = c("fruit=apple,machine=crane","machine=crane","machine=roboter", "fruit=apple", "machine=roboter,food=samosa"))

df2 <- data.frame(fruit = c("apple",NA,NA,"apple","NA"),
                 machine = c("crane","crane","roboter",NA,"roboter"),
                 food = c(NA,NA,NA,NA,"samosa"))
Joshua Zecha
  • 141
  • 1
  • 8

1 Answers1

1

We can do an strsplit on the 'val_str' column, create data.frame from the alternate elements (using logical index for subseting via recycling) by looping through the list elements via map

library(dplyr)
library(purrr)
strsplit(as.character(df$val_str), "[=,]") %>% 
       map_df(~  setNames(as.data.frame.list(.[c(FALSE, TRUE)]), .[c(TRUE, FALSE)])) 
#  fruit machine   food
#1 apple   crane   <NA>
#2  <NA>   crane   <NA>
#3  <NA> roboter   <NA>
#4 apple    <NA>   <NA>
#5  <NA> roboter samosa
akrun
  • 874,273
  • 37
  • 540
  • 662
  • While trying this on my real data (which basically looks the same as this one here), I got the following error message: Error in setNames(as.data.frame.list(.[c(FALSE, TRUE)]), .[c(TRUE, FALSE)]) : 'names' attribute [11] must be the same length as the vector [10] ... is there an easy solution to this or would I need to dig deeper into the problem? – Joshua Zecha Oct 26 '17 at 09:51
  • @JoshuaZecha This could happen if you have some strings where there are no words either before the `=` or after the `=` – akrun Oct 26 '17 at 10:03
  • Thank you so much! Is there an easy way to identify these strings and then set the blank space after or before the "=" to NA? – Joshua Zecha Oct 26 '17 at 10:07
  • 1
    @JoshuaZecha You could try something like `gsub("=,|=$", "=NA,", df$val_str) %>% strsplit(., "[=,]") %>% map_df(~ setNames(as.data.frame.list(.[c(FALSE, TRUE)]), .[c(TRUE, FALSE)]))%>% replace(., .=="NA", NA)` (Not tested though) ` – akrun Oct 26 '17 at 10:12