I am working on a project where each observation has a comment column that says how long the person stayed in a certain location. Some comments say "2 nights in A, 2 nights in B." Right now I am only able to extract the first number. Is there a way to get both numbers out of the comment? It would be fine if each extracted number went into a new row.
Please add data using `dput` and show the expected output for the same. Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). – Ronak Shah May 07 '20 at 02:41
4 Answers
For a base R option, we can try using `gregexpr` along with `regmatches`:
x <- "2 nights in A, 2 nights in B."
y <- regmatches(x, gregexpr("\\b\\d+\\b", x))[[1]]
y
[1] "2" "2"
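This generates a character vector containing all the numbers in the input string (the `[[1]]` picks out the matches for the first, and here only, input). Wrap it in `as.numeric()` if you need numeric values.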
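Since the question is about a whole comment column, here is a minimal base R sketch of the same idea applied to several comments at once, with each extracted number placed in its own row. The data frame `df` and its column names `id` and `comment` are hypothetical, chosen only for illustration.

```r
# Hypothetical example data; `id` and `comment` are assumed column names.
df <- data.frame(
  id = 1:3,
  comment = c("3 nights in A", "2 nights in A, 2 nights in B.", "no stay recorded"),
  stringsAsFactors = FALSE
)

# gregexpr() returns one match object per input string, so regmatches()
# gives a list with one character vector of numbers per comment.
matches <- regmatches(df$comment, gregexpr("\\b\\d+\\b", df$comment))

# Convert each character vector to numeric.
nights <- lapply(matches, as.numeric)

# Build a long table with one row per extracted number.
# Comments without any number contribute zero rows.
long <- data.frame(
  id = rep(df$id, lengths(nights)),
  nights = unlist(nights)
)
long
```

Repeating `id` by `lengths(nights)` keeps every number tied to the observation it came from, so the long table can be joined back to the original data if needed.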

Tim Biegeleisen
The tidy way ;), using `stringr`:
x <- "2 nights in A, 2 nights in B."
library(stringr)
str_extract_all(x, "\\d+")
gives output as
[[1]]
[1] "2" "2"
Edit: to flatten the list into a plain character vector, `unlist` it:
str_extract_all(x, "\\d+") %>% unlist
gives output as:
[1] "2" "2"

nikn8
@Ibevs, the output is a list, similar to **Tim**'s answer, from which you can extract the values using `unlist` or `[[` – nikn8 May 07 '20 at 02:49
You could use `scan` + `gsub`. Use `gsub` to replace every run of non-digit characters with a space, then let `scan` read the remaining numbers:
x <- "2 nights in A, 2 nights in B."
scan(text = gsub("\\D+", " ", x))
Read 2 items
[1] 2 2
Of course, you can pass the `quiet` parameter to suppress the "Read 2 items" message, i.e. `scan(text = gsub("\\D+", " ", x), quiet = TRUE)`.
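To apply the same idea to several comments while keeping the numbers grouped per comment, one possible sketch is to call `scan` on each string separately. The vector `comments` below is hypothetical example data.

```r
# Hypothetical example comments.
comments <- c("3 nights in A", "2 nights in A, 2 nights in B.")

# Run scan() on each comment separately so the numbers stay grouped
# by the comment they came from; quiet = TRUE hides the "Read n items" message.
nights <- lapply(comments, function(s) {
  scan(text = gsub("\\D+", " ", s), quiet = TRUE)
})
nights
```

Note that because every non-digit character becomes a space, a decimal value such as "2.5" would be read as two separate numbers with this pattern.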

Onyambu
It's not clear from your post if you want to have a separate record for each number you find. If you do, you can have each record that contains numbers build a list, then `unnest` the list to give you multiple rows.
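library(tidyverse)
# Example data: one comment per record
inputTbl <- tibble(record = 1:2,
                   comment = c("3 night in A", "2 nights in A, 1 nights in B."))
inputTbl %>%
  # Build a list-column holding every number found in each comment
  mutate(numNight = map(comment, ~ unlist(str_extract_all(.x, "\\d+")))) %>%
  # Expand the list-column to one row per number (tidyr >= 1.0 expects the column to be named)
  unnest(numNight) %>%
  mutate(numNight = as.double(numNight))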
Yields:
# A tibble: 3 x 3
  record comment                       numNight
   <int> <chr>                            <dbl>
1      1 3 night in A                         3
2      2 2 nights in A, 1 nights in B.        2
3      2 2 nights in A, 1 nights in B.        1
And if you want to capture the hotel, too, you can build a tibble, and unnest it.
inputTbl <- tibble(record = 1:2,
                   comment = c("3 night in Sheraton", "2 Nights in Waldorf, 1 night in Sands."))
inputTbl %>%
  mutate(numNight =
           map(comment,
               ~ tibble(Nights = unlist(str_extract_all(.x, "\\d+")),
                        Hotel = unlist(str_extract_all(.x,
                          "(?ix)                     # Flags: case-insensitive (i), free-spacing with comments (x)
                           (?<= nights? \\s in \\s)  # Preceded by 'night(s) in'; don't capture
                           (\\w+)                    # The hotel
                          "))))) %>%
  unnest(numNight)
# A tibble: 3 x 4
  record comment                                Nights Hotel
   <int> <chr>                                  <chr>  <chr>
1      1 3 night in Sheraton                    3      Sheraton
2      2 2 Nights in Waldorf, 1 night in Sands. 2      Waldorf
3      2 2 Nights in Waldorf, 1 night in Sands. 1      Sands

David T