I have data representing ages, given in a form such as 8y 10m 27d, where y is years, m is months, and d is days.

I've tried using gsub() to replace the y, m and d with *365+, *30+ and nothing respectively, and then calling as.numeric() on the result, but as.numeric() does not evaluate the arithmetic in the string, so it just returns NA.

Is there a way to convert that kind of string to the exact number of days?

Sorry for the formatting, I can't remember the last time I was on this site so I forgot how to format.

Jaap
implicati0n
    You can have a look at the lubridate package e.g. http://stackoverflow.com/questions/3765668/have-lubridate-subtraction-return-only-a-numeric-value – Ilias Pan Sep 16 '16 at 10:33

2 Answers

We can use gsubfn to replace 'y', 'm' and 'd' with "* 365 + ", "* 30 + " and "* 1", and then evaluate the resulting string with eval(parse(text = ...)).

library(gsubfn)
eval(parse(text=gsubfn("[a-z]", list(y= "* 365 + ", m = "* 30 + ", d = "* 1"), str1)))
#[1] 3247
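
For comparison, here is a minimal base R sketch of the same idea, using the plain gsub() replacements described in the question. It shows the intermediate string and why as.numeric() gives NA while eval(parse(text = ...)) does the arithmetic. The variable name expr is only for illustration.

```r
str1 <- "8y 10m 27d"

# Build an arithmetic expression from the age string, one unit at a time
expr <- gsub("y", "*365+", str1, fixed = TRUE)
expr <- gsub("m", "*30+", expr, fixed = TRUE)
expr <- gsub("d", "", expr, fixed = TRUE)
expr
# [1] "8*365+ 10*30+ 27"

# as.numeric() only converts strings that already are numbers, so this is NA
suppressWarnings(as.numeric(expr))
# [1] NA

# parse() turns the string into an R expression and eval() computes it
eval(parse(text = expr))
# [1] 3247
```

Note that eval(parse()) runs whatever the string contains, so it is only reasonable on data you trust.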

A faster option would be:

c(matrix(scan(text=sub(",$", "", gsub("\\D+", ",", str2)), sep=",",
    what=numeric(), quiet=TRUE), ncol=3, byrow=TRUE) %*% c(365, 30, 1))
#[1] 3247 3247

Update

If there are different patterns in the dataset, we can try

str3 <- c(str1, "7m 28d", "5y 10d", "15d", "29d", "8y 15d 10m")
colSums(sapply(strsplit(str3, "\\s+"), function(x) {
    x1 <- as.numeric(sub("\\D+", "", x))   # numeric part of each token
    x2 <- sub("\\d+", "", x)               # unit letter of each token
    x1[match(c("y", "m", "d"), x2)]        # order as y, m, d; NA if a unit is absent
}) * c(365, 30, 1), na.rm = TRUE)
#[1] 3247  238 1835   15   29 3235

data

str1 <-  "8y 10m 27d"
str2 <- c(str1, str1)
akrun
  • And how would I modify this to work on an entire data set? Just use a for loop or? – implicati0n Sep 16 '16 at 10:30
  • @implicati0n You can use `sapply(str1, function(x) eval(parse(text=x)))` – akrun Sep 16 '16 at 10:31
  • I get an error "Error in parse ... unexpected end of input". @akrun – implicati0n Sep 16 '16 at 10:39
  • @implicati0n I tried with `str2 <- c(str1, str1); unname(sapply(gsubfn("[a-z]", list(y= "* 365 + ", m = "* 30 + ", d = "* 1"), str2), function(x) eval(parse(text=x)))) #[1] 3247 3247` – akrun Sep 16 '16 at 10:40
  • @implicati0n Do you get the error with the same `str2` or with your original dataset? Please check whether your dataset has different patterns. – akrun Sep 16 '16 at 10:45
  • @implicati0n I also added another method. – akrun Sep 16 '16 at 10:47
  • Sorry, the error occurs when I work with my dataset. Unfortunately, it does have different patterns; some entries are missing "m" or "d". However, the error occurs on an entry that has all three letters. @akrun – implicati0n Sep 16 '16 at 10:47
  • @implicati0n Sorry, this is based on what you showed in the post. – akrun Sep 16 '16 at 10:48
  • I know, it was my mistake, I forgot to mention that. – implicati0n Sep 16 '16 at 10:49
  • @implicati0n Updated with another option for different patterns. – akrun Sep 16 '16 at 14:31
  • Using the command from the 4th comment, I get what I needed when I remove the data which isn't of the needed form. However, how can I assign this to a new variable? Because when I try to print str2, I get the unchanged version of it. – implicati0n Sep 17 '16 at 00:18
  • @implicati0n You need to assign i.e. `res <- colSums(sapply(strsplit(str3, "\\s+"), function(x) { x1 <- as.numeric(sub("\\D+", "", x)) x2 <- sub("\\d+", "", x) x1[match(c("y", "m", "d"), x2)]}) * c(365, 30, 1), na.rm = TRUE)` – akrun Sep 17 '16 at 00:21
  • Error: unexpected symbol in "res <- colSums(sapply(strsplit(baza$Age, "\\s+"), function(x) { x1 <- as.numeric(sub("\\D+", "", x)) x2" Sorry, but I'm new to R so I'm really struggling. – implicati0n Sep 17 '16 at 00:26
  • @implicati0n Sorry, I was copying from the code. You have to use the `;` separator for each line. i.e. `{ x1 <- as.numeric(sub("\\D+", "", x)); x2 <- sub("\\d+", "", x);x1[match(c("y", "m", "d"), x2)]}) * c(365, 30, 1), na.rm = TRUE)` – akrun Sep 17 '16 at 00:27
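
To avoid the copy-and-paste problem from the comments, the conversion could be wrapped in a function and the result assigned to a new column. This is only a sketch under the same assumptions as the answer (365-day years, 30-day months). The function name age_to_days, the column name AgeDays, and the sample rows are made up for illustration; only baza$Age comes from the comments.

```r
# Hypothetical helper: convert strings such as "8y 10m 27d" to approximate
# days, assuming 365-day years and 30-day months. Units may be missing or
# appear in any order.
age_to_days <- function(x) {
  x <- as.character(x)
  # Number in front of a given unit letter; 0 when the unit is absent
  get_unit <- function(unit) {
    hits <- regmatches(x, regexec(paste0("(\\d+)\\s*", unit), x))
    vapply(hits,
           function(hit) if (length(hit) == 2) as.numeric(hit[2]) else 0,
           numeric(1))
  }
  get_unit("y") * 365 + get_unit("m") * 30 + get_unit("d")
}

# Made-up stand-in for the data set mentioned in the comments
baza <- data.frame(Age = c("8y 10m 27d", "7m 28d", "5y 10d", "8y 15d 10m"),
                   stringsAsFactors = FALSE)

# Assign the result to a new column; printing baza$Age alone stays unchanged
baza$AgeDays <- age_to_days(baza$Age)
baza$AgeDays
# [1] 3247  238 1835 3235
```

The key point is the assignment with `<-`: the functions return a new vector and never modify the original strings in place.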
The result can depend on the origin date, because of leap years and varying month lengths.

One solution:

str1 <- "8y 10m 27d"
# strip the unit letters, leaving "8 10 27"
nums <- as.numeric(strsplit(gsub("[A-Za-z]", "", str1), " ")[[1]])
origin <- as.POSIXlt("1990-01-01")
date1 <- origin
date1$year <- date1$year + nums[1]
date1$mon <- date1$mon + nums[2]
date1$mday <- date1$mday + nums[3]
date1 - origin
#[1] Time difference of 3253 days
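
To illustrate how the origin changes the answer, here is a rough sketch that does the same calendar arithmetic with Date objects and seq(). The function name days_from_origin is hypothetical, and the second origin date is chosen only to show a different result.

```r
# Hypothetical helper: days elapsed after adding y years, m months and
# d days to a given origin date.
days_from_origin <- function(y, m, d, origin) {
  origin <- as.Date(origin)
  n_months <- 12 * y + m
  # Step forward month by month; the last element is origin + n_months
  shifted <- seq(origin, by = "month", length.out = n_months + 1)[n_months + 1]
  as.numeric(shifted + d - origin)
}

days_from_origin(8, 10, 27, "1990-01-01")
# [1] 3253
days_from_origin(8, 10, 27, "1992-03-01")
# [1] 3255
```

Origins that fall on the 29th to 31st of a month need extra care, since seq() by month rolls an invalid day forward into the next month.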
user3507085