Find and extract text between delimiters in R

Question

I have the following data:

    Seat_WASHER <- structure(
      list(
        Description = c(
          "SEAT WASHER, MR2, 8\", TN 10.12, CR 150/600, 316 Stainless Steel",
          "SEAT WASHER, 1\", TN 1.42, CR 950/1200, MR1, 316 Stainless Steel",
          "SEAT WASHER, 3\", TN 1.52,  MR1, 316 Stainless Steel",
          "SEAT WASHER, MR1, 2\", TN 1.62, CR 800/1200, 316 Stainless Steel",
          "SEAT WASHER, MR1, TN 2.12, 1/2\", CR 150/600, 316 Stainless Steel",
          "SEAT WASHER, MR6, 2\", TN 6.48, CR 750/100, 316 Stainless Steel"
        )
      ),
      row.names = c(NA, -6L),
      class = c("tbl_df", "tbl", "data.frame")
    )

It's a very large data set, and the strings are not consistent in their order or contents.

How do I find the key indicators (`"`, `CR`, `MR`) and pull all the data between the delimiters into its own column? If the key indicator isn't found in a string, the output should be NULL.

Extracting all of the CR values should produce a column like:

    Col 1
    -----------
    CR 150/600
    CR 950/1200
    NULL
    CR 800/1200
    CR 150/600
    CR 750/100

I want all of the values with the same key indicator in the same column. For example: `col1 | col2 | col3 | col4` / `MR1 | 2" | TN 6.48 | CR 750/1000` / `'' | '' | '' | ''` — DShad33, Aug 02 '22 at 19:00
Could you run `dput(data_string)`, then edit your question and paste the result? — Mohamed Desouky, Aug 02 '22 at 19:14
Yeah, let me look up how to do that. I'm new to this platform, so thank you for your patience. — DShad33, Aug 02 '22 at 19:19
Assuming you loaded the data with `my_horror_data <- read.csv(...)`, run `dput(head(my_horror_data))`, then copy the printed `structure(...)` into the question as the data. — Chris, Aug 02 '22 at 19:26
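A minimal sketch of what `dput()` produces, using a hypothetical two-row data frame `toy` rather than the real data: it prints R code that rebuilds the object exactly, and `row.names = c(NA, -2L)` is R's compact way of recording the automatic row names 1 to 2. The negative number has to match the number of rows, so if you delete rows from pasted `dput()` output by hand, update it as well.

```r
# Hypothetical toy data, only to show what dput() prints
toy <- data.frame(
  Description = c("SEAT WASHER, MR1", "SEAT WASHER, CR 150/600"),
  stringsAsFactors = FALSE
)

# Prints R code that recreates the object, including row.names = c(NA, -2L)
dput(toy)

# The printed code parses back into an identical object
code <- paste(capture.output(dput(toy)), collapse = "\n")
rebuilt <- eval(parse(text = code))
identical(rebuilt, toy)  # TRUE
```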

Mohamed Desouky · Accepted Answer · 2022-08-02T19:50:26.147

You can try

    library(stringr)

    # Extract the first "CR <digits>/<digits>" match from each description (NA if there is none)
    Seat_WASHER$col1 <- str_extract(Seat_WASHER$Description, "CR \\d+/\\d+")

Output:

             col1
    1  CR 150/600
    2 CR 950/1200
    3        <NA>
    4 CR 800/1200
    5  CR 150/600
    6  CR 750/100
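In this output the row without a CR value shows `<NA>` rather than the `NULL` the question asks for. That is how R marks a missing value in a column: `c()` silently drops `NULL`, so a vector cannot hold one at a given position, whereas `NA_character_` keeps the slot. A small sketch with hypothetical vectors:

```r
# NULL is dropped by c(), so the remaining values shift position
with_null <- c("CR 150/600", NULL, "CR 800/1200")
length(with_null)  # 2

# NA_character_ keeps a placeholder for the missing value
with_na <- c("CR 150/600", NA_character_, "CR 800/1200")
length(with_na)    # 3
is.na(with_na)     # FALSE TRUE FALSE
```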

edited Aug 02 '22 at 19:50

answered Aug 02 '22 at 19:26

Mohamed Desouky

If you want to add more columns, just change the pattern passed to `str_extract`. For example, `Seat_WASHER$col2 <- str_extract(Seat_WASHER$Description, "TN \\d+\\.\\d+")` extracts all of the TN values. – Mohamed Desouky Aug 02 '22 at 20:05
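Following that suggestion, here is a sketch that builds one column per key indicator in a single step. It swaps stringr for base R (`regexpr()` plus `regmatches()`) so it runs without extra packages; the helper `extract_first()` and the regular expressions in `patterns` are hypothetical and only assume the formats visible in the sample data (for example, sizes written like `8"` or `1/2"`).

```r
# Hypothetical helper: first regex match per string, NA where nothing matches
# (the same behaviour as stringr::str_extract())
extract_first <- function(x, pattern) {
  pos <- regexpr(pattern, x)            # -1 for strings with no match
  out <- rep(NA_character_, length(x))
  out[pos != -1] <- regmatches(x, pos)  # regmatches() returns only the matches
  out
}

desc <- c(
  "SEAT WASHER, MR2, 8\", TN 10.12, CR 150/600, 316 Stainless Steel",
  "SEAT WASHER, 1\", TN 1.42, CR 950/1200, MR1, 316 Stainless Steel",
  "SEAT WASHER, 3\", TN 1.52,  MR1, 316 Stainless Steel",
  "SEAT WASHER, MR1, 2\", TN 1.62, CR 800/1200, 316 Stainless Steel",
  "SEAT WASHER, MR1, TN 2.12, 1/2\", CR 150/600, 316 Stainless Steel",
  "SEAT WASHER, MR6, 2\", TN 6.48, CR 750/100, 316 Stainless Steel"
)

# Assumed patterns, one per key indicator; adjust them to the full data set
patterns <- c(
  size = "[0-9/]+\"",         # e.g. 8" or 1/2"
  MR   = "MR[0-9]+",
  TN   = "TN [0-9.]+",
  CR   = "CR [0-9]+/[0-9]+"
)

# One column per indicator, in the same row order as desc
indicators <- as.data.frame(
  lapply(patterns, function(p) extract_first(desc, p)),
  stringsAsFactors = FALSE
)
indicators
```

Because each pattern is searched for anywhere in the string, the order of the fields does not matter. On the real data, `desc` would be `Seat_WASHER$Description`.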

Mike · Answer 2 · 2022-08-02T20:04:47.737

If the fields are always separated by commas, you can use `strsplit()` to split the string, then find the piece containing CR with `grep()`, specifying `value = TRUE` so that it returns the matching element instead of its index. I added `trimws()` to remove the leading space.

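    m1 <- "SEAT WASHER, MR6, 2\", TN 6.48, CR 750/100, 316 Stainless Steel"
    # Split on commas; strsplit() returns a list with one character vector per input string
    m2 <- strsplit(m1, ",")
    # Keep the piece containing "CR" and strip its leading space
    trimws(grep("CR", m2[[1]], value = TRUE))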

Edit based on the data:

This still splits each string on commas and keeps the piece that contains CR (stored in `m3`). Before appending the result to the data, turn every length-0 vector (rows with no CR) into `NA`.

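    # Split every description on commas
    m2 <- strsplit(Seat_WASHER$Description, ",")
    # For each row, keep the trimmed piece containing "CR"; rows without one give character(0)
    m3 <- sapply(m2, function(x) trimws(grep("CR", x, value = TRUE)))

    # Replace the empty results with NA so the new column has one value per row
    Seat_WASHER$newcol <- sapply(m3, function(x) if (identical(x, character(0))) NA_character_ else x)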
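The two `sapply()` passes are needed because `sapply()` returns a list whenever some rows have no CR field (their result is `character(0)`). As an alternative sketch of the same split-and-`grep()` idea, `vapply()` with a `character(1)` template does it in one pass and guarantees a plain character vector. Two assumptions are built in: only the first CR field is kept if a row ever has more than one, and the pattern is anchored to the start of the trimmed field (`^CR `) so that a field merely containing the letters CR elsewhere is not picked up.

```r
desc <- c(
  "SEAT WASHER, MR2, 8\", TN 10.12, CR 150/600, 316 Stainless Steel",
  "SEAT WASHER, 3\", TN 1.52,  MR1, 316 Stainless Steel",
  "SEAT WASHER, MR6, 2\", TN 6.48, CR 750/100, 316 Stainless Steel"
)

# Split each description on commas and trim the spaces around each field
fields <- lapply(strsplit(desc, ","), trimws)

# For each row, keep the first field starting with "CR ", or NA if there is none
cr <- vapply(fields, function(x) {
  hit <- grep("^CR ", x, value = TRUE)
  if (length(hit) > 0) hit[1] else NA_character_
}, character(1))
cr
```

On the real data, `desc` would be `Seat_WASHER$Description` and `cr` could be assigned directly to a new column.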
