
I have the following code:

library(tibble)
data <- tibble(job_id = c("114124", "114188", "114206"), project_skills = c("WordPress,XTCommerce,Magento,Prestashop,VirtueMart,osCommerce", "HTML,SEO,WordPress,SEO Texte", "Illustrator,Graphic Design,Photoshop"))

which creates the following data frame:

job_id    project_skills
114124    WordPress,XTCommerce,Magento,Prestashop,VirtueMart,osCommerce
114188    HTML,SEO,WordPress,SEO Texte
114206    Illustrator,Graphic Design,Photoshop

I need to split the strings in the project_skills column at each comma, as follows:

job_id    project_skills
114124    [WordPress] [XTCommerce] [Magento] [Prestashop] [VirtueMart] [osCommerce]
114188    [HTML] [SEO] [WordPress] [SEO Texte]
114206    [Illustrator] [Graphic Design] [Photoshop]

As a result, I'd like a data frame in which each row holds its split phrases as a vector, so that I can iterate through them. Does anyone have an idea how I can achieve this? Thanks in advance!

Sotos
Tobias Mini
  • Do you simply need this `data$project_skills <- strsplit(data$project_skills, ',')`? – Sotos Jan 04 '19 at 14:12
  • If you want to iterate through them, why not convert the data into long format instead? `library(tidyverse); data %>% separate_rows(project_skills, sep = ",")` It would be much easier and more convenient to deal with it that way. – Ronak Shah Jan 04 '19 at 14:13
  • I need to keep the data frame so that I can address each element per row separately, e.g. data$project_skills[1][[1]] (or so) should return "WordPress", and so on. – Tobias Mini Jan 04 '19 at 14:21
  • So what I suggested will do it, yes? (Only asking so I can close it as a duplicate if that is what you need.) – Sotos Jan 04 '19 at 14:27
  • How can I access the specific elements in your solution (e.g., WordPress from job_id 114124)? – Tobias Mini Jan 04 '19 at 14:33
  • It depends on what you want to do, but that is a different question. – Sotos Jan 04 '19 at 14:43
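The list-column approach from the first comment, together with the element access asked about later in the thread, might look like the sketch below. It assumes a base-R `data.frame` created with `stringsAsFactors = FALSE` (instead of the tibble from the question) so it runs without extra packages; `first_skill` and `wp` are illustrative names only.

```r
# Assumption: a base-R data.frame stands in for the question's tibble
data <- data.frame(
  job_id = c("114124", "114188", "114206"),
  project_skills = c("WordPress,XTCommerce,Magento,Prestashop,VirtueMart,osCommerce",
                     "HTML,SEO,WordPress,SEO Texte",
                     "Illustrator,Graphic Design,Photoshop"),
  stringsAsFactors = FALSE
)

# Replace each comma-separated string with a character vector (a list column)
data$project_skills <- strsplit(data$project_skills, ",")

# [[1]] extracts the vector for row 1; [1] then takes its first element
first_skill <- data$project_skills[[1]][1]
print(first_skill)  # [1] "WordPress"

# Same lookup, but selecting the row by job_id instead of by position
wp <- data$project_skills[[which(data$job_id == "114124")]][1]

# Iterate over the rows, one character vector at a time
for (i in seq_len(nrow(data))) {
  cat(data$job_id[i], "has", length(data$project_skills[[i]]), "skills\n")
}
```

Note that single brackets, `data$project_skills[1]`, return a list of length one, so `[1][[1]]` yields the whole vector for that row rather than just `"WordPress"`; `[[1]][1]` is what picks out the single skill.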

1 Answer


Like this?

l <- strsplit( data$project_skills, ",")
names(l) <- data$job_id
l
# $`114124`
# [1] "WordPress"  "XTCommerce" "Magento"    "Prestashop" "VirtueMart" "osCommerce"
# 
# $`114188`
# [1] "HTML"      "SEO"       "WordPress" "SEO Texte"
# 
# $`114206`
# [1] "Illustrator"    "Graphic Design" "Photoshop"  
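Because `l` is named by `job_id`, a skill can be looked up by ID, and the same list can be flattened into the long format suggested in the comments. The sketch below uses plain base R (`rep()`, `lengths()`, `unlist()`) in place of `tidyr::separate_rows()`; the object `long` and its column name `skill` are illustrative choices.

```r
job_id <- c("114124", "114188", "114206")
project_skills <- c("WordPress,XTCommerce,Magento,Prestashop,VirtueMart,osCommerce",
                    "HTML,SEO,WordPress,SEO Texte",
                    "Illustrator,Graphic Design,Photoshop")

# Named list: one character vector of skills per job_id
l <- strsplit(project_skills, ",")
names(l) <- job_id

# Look up a job by its ID (the names are character strings)
l[["114124"]][1]  # "WordPress"

# Long format: repeat each job_id once per skill, then flatten the skills
long <- data.frame(
  job_id = rep(names(l), lengths(l)),
  skill  = unlist(l, use.names = FALSE),
  stringsAsFactors = FALSE
)
head(long, 4)
```

With one row per skill, filtering becomes a plain comparison, e.g. `long$job_id[long$skill == "WordPress"]` gives the jobs that list WordPress.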

A different angle, using data.table:

library(data.table)
dt <- as.data.table(data)
# determine the maximum number of skills in any row
skillmax <- max(lengths(strsplit(dt$project_skills, ",")))
# add one column per skill position by reference; shorter rows are padded with NA
dt[, paste0("skill", seq_len(skillmax)) := tstrsplit(project_skills, ",", fill = NA)][]

#    job_id                                                project_skills      skill1         skill2    skill3
# 1: 114124 WordPress,XTCommerce,Magento,Prestashop,VirtueMart,osCommerce   WordPress     XTCommerce   Magento
# 2: 114188                                  HTML,SEO,WordPress,SEO Texte        HTML            SEO WordPress
# 3: 114206                          Illustrator,Graphic Design,Photoshop Illustrator Graphic Design Photoshop

#        skill4     skill5     skill6
# 1: Prestashop VirtueMart osCommerce
# 2:  SEO Texte       <NA>       <NA>
# 3:       <NA>       <NA>       <NA>
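For readers without data.table, roughly the same wide layout can be built in base R, as in the sketch below. This is a swapped-in technique, not part of the original answer: each split vector is padded with `NA` up to the maximum length via `length<-`, and the rows are stacked with `rbind()`. The names `padded`, `wide`, and `result` are hypothetical.

```r
job_id <- c("114124", "114188", "114206")
project_skills <- c("WordPress,XTCommerce,Magento,Prestashop,VirtueMart,osCommerce",
                    "HTML,SEO,WordPress,SEO Texte",
                    "Illustrator,Graphic Design,Photoshop")

l <- strsplit(project_skills, ",")
skillmax <- max(lengths(l))  # 6 for this data

# Setting length() beyond a vector's size pads it with NA
padded <- lapply(l, function(x) {
  length(x) <- skillmax
  x
})

# One row per job, one column per skill position
wide <- do.call(rbind, padded)
colnames(wide) <- paste0("skill", seq_len(skillmax))

result <- data.frame(job_id = job_id, project_skills = project_skills, wide,
                     stringsAsFactors = FALSE)
result[, c("job_id", "skill4", "skill5", "skill6")]
```

As with the data.table version, a single skill is then a column lookup, e.g. `result$skill1[result$job_id == "114124"]`.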
Wimpel