0

I am working on a project where each observation has a comment column. That column says how long the person stayed in a certain location. Some comments say "2 nights in A, 2 nights in B." Right now I am only able to extract the first number. Is there a way to get both numbers out of the comment? It would be fine if each extracted number ended up in a new row.

lbevs
  • Please add data using `dput` and show the expected output for the same. Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). – Ronak Shah May 07 '20 at 02:41

4 Answers

2

For a base R option, we can try using gregexpr along with regmatches:

x <- "2 nights in A, 2 nights in B."
y <- regmatches(x, gregexpr("\\b\\d+\\b", x))[[1]]
y

[1] "2" "2"

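This returns a character vector containing every number in the string. Because `regmatches` with `gregexpr` returns a list with one element per input string, `[[1]]` selects the matches for the single string used here; wrap the result in `as.numeric` if you need numbers rather than text.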
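For a whole column of comments rather than a single string, the same idea can be sketched as below. This is only an illustration: the `comments` vector and the `record` and `nights` names are made up for the example and are not from the question.

```r
# Hypothetical data standing in for the asker's comment column.
comments <- c("3 nights in A", "2 nights in A, 1 night in B.", "no stay recorded")

# gregexpr() finds every match in each string; regmatches() returns a list
# with one character vector per input element.
matches <- regmatches(comments, gregexpr("\\b\\d+\\b", comments))

# Convert each element to numeric; a comment without digits gives numeric(0).
nights <- lapply(matches, as.numeric)

# One row per extracted number, keyed by the position of the source comment.
long <- data.frame(
  record = rep(seq_along(nights), lengths(nights)),
  nights = unlist(nights)
)
long
```

Comments that contain no number simply contribute no rows to `long`.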

Tim Biegeleisen
2

The tidy way ;)

x <- "2 nights in A, 2 nights in B."

library(stringr)
str_extract_all(x, "\\d+")

gives output as

[[1]]
[1] "2" "2"

Edit

str_extract_all(x, "\\d+") %>% unlist

gives output as:

[1] "2" "2"
nikn8
  • @lbevs, the output is a list, similar to **Tim**'s answer, which you can flatten with `unlist` or index with `[[` – nikn8 May 07 '20 at 02:49
2

You could use scan with gsub: gsub replaces every run of non-digit characters with a space, and scan then reads the remaining numbers.

x <- "2 nights in A, 2 nights in B."
scan(text = gsub("\\D+", " ", x))

Read 2 items
[1] 2 2

Of course, you can pass quiet = TRUE to suppress the "Read 2 items" message, i.e. scan(text = gsub("\\D+", " ", x), quiet = TRUE)
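As a rough sketch of how this could be applied to several comments at once (the `comments` vector is invented for illustration), `lapply` can run the same scan call on each string. Note that `\\D+` also removes decimal points and minus signs, so this approach only suits whole, non-negative numbers.

```r
# Hypothetical data standing in for the asker's comment column.
comments <- c("3 nights in A", "2 nights in A, 1 night in B.")

# For each comment, replace runs of non-digits with a space, then let
# scan() parse what is left; quiet = TRUE suppresses the "Read n items" message.
nights <- lapply(comments, function(s) {
  scan(text = gsub("\\D+", " ", s), quiet = TRUE)
})
nights

# Caveat: a decimal such as "1.5" is split into two separate numbers.
split_decimal <- scan(text = gsub("\\D+", " ", "1.5 nights"), quiet = TRUE)
split_decimal
```

Unlike the regmatches and stringr approaches, scan returns numeric values directly, so no conversion step is needed.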

Onyambu
0

It's not clear from your post if you want to have a separate record for each number you find. If you do, you can have each record that contains numbers build a list, then unnest the list to give you multiple rows.

library(tidyverse)
inputTbl <- tibble(record = 1:2,
       comment = c("3 night in A", "2 nights in A, 1 nights in B."))

inputTbl %>% 
  mutate(numNight = map(comment, ~  unlist(str_extract_all(.x, "\\d+")))) %>% 
  unnest(numNight) %>% 
  mutate(numNight = as.double(numNight))

Yields:

# A tibble: 3 x 3
  record comment                       numNight
   <int> <chr>                         <dbl>   
1      1 3 night in A                  3       
2      2 2 nights in A, 1 nights in B. 2       
3      2 2 nights in A, 1 nights in B. 1 

And if you want to capture the hotel, too, you can build a tibble, and unnest it.

inputTbl <- tibble(record = 1:2,
       comment = c("3 night in Sheraton", "2 Nights in Waldorf, 1 night in Sands."))

inputTbl %>% 
  mutate(numNight = 
           map(comment,
                ~  tibble(Nights = unlist(str_extract_all(.x, "\\d+")),
                          Hotel = unlist(str_extract_all(.x, 
                                 "(?ix)                    # Flags: i = case-insensitive, x = free-spacing with comments
                                  (?<= nights? \\s in \\s) # Look behind for 'night in' or 'nights in'; don't capture it
                                  (\\w+)                   # The hotel
                                 "))))) %>% 
  unnest(numNight)

Yields:

# A tibble: 3 x 4
  record comment                                Nights Hotel   
   <int> <chr>                                  <chr>  <chr>   
1      1 3 night in Sheraton                    3      Sheraton
2      2 2 Nights in Waldorf, 1 night in Sands. 2      Waldorf 
3      2 2 Nights in Waldorf, 1 night in Sands. 1      Sands   
David T