0

I have some 13 column and 20,000 rows of data. In which one column is having description details. Description column has one or many values separated using delimiters like ",", "-", "/", "&". Need to split the Description column into multiple columns based on different delimiters used. Some values could be just one word and doesn't need a split. Please help me on this.

Here's my piece of code that I tried. But its not working.

df=as.data.frame(openxlsx::read.xlsx("Sample data.xlsx",sheet=1,colNames=TRUE,skipEmptyRows=TRUE))

newdf <- data.frame()
for (i in 1:nrow(df)) 
  {
  out <- strsplit(as.character(df$SHORT_DESC[i]),',')
  newdf <- rbind(newdf, data.frame(t(sapply(out, `[`))))
}

After this I planned to merge the original dataframe (excluding description) and new dataframe (with description separated in multiple columns), column wise. But my above code fails, as well I wanted to perform this operation in few lines of codes.

INPUT & OUTPUT:

NO   SHORT_DESC                     
1    SEAT, HELICAL, COMPRESSION SPRING          
2    RIVET, SOLID, 100 DEGREE COUNTERSUNK HEA   
3    WIRE,NICKEL-COPPER,BULK                    
4    INDUCTOR/ EMI SUPPRESSION                  
5    CHOKE                                      
6    NOM PLATE & LENS                           


No   SHORT_DESC1   SHORT_DESC2    SHORT_DESC3                SHORT_DESC4    
1    SEAT          HELICAL        COMPRESSION SPRING
2    RIVET         SOLID          100 DEGREE COUNTERSUNK HEA
3    WIRE          NICKEL         COPPER                     BULK
4    INDUCTOR      EMI SUPPRESSION
5    CHOKE
6    NOM PLATE     LENS
Arun Laksh
  • 25
  • 4
  • 1
    Images are not the right way to share data/code. Add them in a reproducible format which is easier to copy. Read about [how to give a reproducible example](http://stackoverflow.com/questions/5963269). – Ronak Shah Jun 18 '21 at 11:20

1 Answers1

0

You could do

your_data %>%
  tidyr::separate(SHORT_DESC, sep = "[,/] *",
                  into = paste0("SHORT_DESC", 1:999)) %>%
  janitor::remove_empty()

which returns

# A tibble: 6 x 4
     NO SHORT_DESC1      SHORT_DESC2     SHORT_DESC3               
  <dbl> <chr>            <chr>           <chr>                     
1     1 SEAT             HELICAL         COMPRESSION SPRING        
2     2 RIVET            SOLID           100 DEGREE COUNTERSUNK HEA
3     3 WIRE             NICKEL-COPPER   BULK                      
4     4 INDUCTOR         EMI SUPPRESSION NA                        
5     5 CHOKE            NA              NA                        
6     6 NOM PLATE & LENS NA              NA                        

Data used:

your_data <- tibble::tribble(~NO, ~SHORT_DESC, 1, "SEAT, HELICAL, COMPRESSION SPRING", 2, "RIVET, SOLID, 100 DEGREE COUNTERSUNK HEA", 3, "WIRE,NICKEL-COPPER,BULK", 4, "INDUCTOR/ EMI SUPPRESSION", 5, "CHOKE", 6, "NOM PLATE & LENS")
ktiu
  • 2,606
  • 6
  • 20
  • It worked well, thanks a ton. Hope I can add space into , sep = "[,/] *", like sep = "[,/ ] *", to have space as a separator. Am I right? – Arun Laksh Jun 21 '21 at 06:52