0

This question is related to the recent question I have posted here

I want to compute exposure averages of the preceding years 1, 2, 3. etc. based on the exact date of entry into the study.

Example for the first person (entered study in Feb 2002) the one year average is computed from the number of days in 2002(37) and 2001(328). Exposure for 2002 was 18 and for 2001 it was 19. This gives a one year exposure of 18.88569 which is calculated from formula (37*18)/365.25 and (328*19)/365.25.

A two year average for the same person includes exposure data from years 2002, 2001 and 2000 and will be: contribution of 2002 (37*18) / 365.25
contribution of 2001: 19
contribution of 2000: (328*18)/365.25 ) / 2 which gives 18.49384

Thanks Fake data is found following this link: https://drive.google.com/file/d/0B_4NdfcEvU7La1ZCd2EtbEdaeGs/view?usp=sharing

Community
  • 1
  • 1
Meso
  • 1,375
  • 5
  • 17
  • 36

1 Answers1

1

Once again, here is one idea. There will be better approaches. Anyway, I modified my previous answer for your previous question. This is not an elegant answer either. But, this does what you are after. The first step was to rearrange your data frame. test in ana has the number of the days for each subject's (id) entry year (e.g., 37 for id 1) as well as 365 for the other years. In the second step, I split the data by subject (id) and removed rows which years are larger than the entry year. In the final step, I did math as you described in your question. I chose the entry year row for each subject and created a data frame.

library(reshape2)
library(lubridate)
library(stringi)
library(dplyr)
library(tidyr)

### Arrange the data frame.
ana <- mutate(mydf,id = 1:n()) %>%
       melt(id.vars = c("id","entry")) %>%
       mutate(variable = stri_extract_last(variable, regex = "\\d+"),
              entry = as.Date(entry, format = "%d%b%Y"),
              entryYear = as.character(format(entry, "%Y")),
              check = ifelse(variable == entryYear, "Y", "N"),
              test = ifelse(variable == entryYear, yday(entry) - 1, 365)) %>%
              arrange(id)


### Get rid of rows which have larger year numbers than the entry year

bob <- lapply(split(ana, ana$id), function(x) {

                    indx <- which(x$check == "Y")

                    if(indx < nrow(x)){

                        x[-c(indx + 1: nrow(x)),]

                    }else{x}
                }
            )


### Get one-year and two-year averages

cathy <- lapply(bob, function(x){

    x$one <- ((x[nrow(x),4]) * (x[nrow(x),7]) / 365.25) +
             ((x[nrow(x)-1,4]) * ((x[nrow(x)-1,7]) - (x[nrow(x),7])) / 365.25)


    x$two <- (((x[nrow(x),4]) * (x[nrow(x),7]) / 365.25) +
             (x[nrow(x)-1,4]) +
             ((x[nrow(x)-2,4]) * ((x[nrow(x)-2,7]) - (x[nrow(x),7])) / 365.25)) / 2 

    subset(x, check == "Y")
})

### Create a data frame. unnest is available in the dev version of tidyr.
unnest(cathy)

#  id      entry variable value entryYear check test       one       two
#1  1 2002-02-07     2002    18      2002     Y   37 18.885695 18.493840
#2  2 2002-06-06     2002    16      2002     Y  156 16.561259 16.780630
#3  3 2003-04-16     2003    14      2003     Y  105 15.414100 15.707050
#4  4 2003-05-26     2003    16      2003     Y  145 16.591376 17.096851
#5  5 2003-06-11     2003    13      2003     Y  161 13.549624 14.054073
#6  6 2004-02-20     2004     3      2004     Y   50  2.997947  3.430185
#7  7 2004-07-25     2004     2      2004     Y  206  1.998631  2.216975
#8  8 2004-08-19     2004     4      2004     Y  231  3.997262  4.182067
#9  9 2004-12-19     2004     5      2004     Y  353  5.029432  5.481862
jazzurro
  • 23,179
  • 35
  • 66
  • 76
  • the code does nto work for me. I get error message that which says "Error in if (indx < nrow(x)) { : argument is of length zero" – Meso Dec 01 '14 at 20:05
  • @Meso I forgot adding `library(stringi)`. I wonder if this is the cause. – jazzurro Dec 02 '14 at 00:39
  • @jazzuro, thanks again. The code works for the sample data. However still gives me on my real data that is a longer list than that produced by "ana" and bob. When I run the same error message as before. – Meso Dec 02 '14 at 08:19
  • @Meso I see. I'd like to clarify one thing. ana is a data.frame and bob is a list object. Where do you get stuck? Can you create `ana` with your real data? How about `bob`? It is hard to see what is happening, but I may be able to help you out. Let me know. P.S. Could you change `x[-c(indx + 1, nrow(x)),]` in `bob` to `x[-c(indx + 1: nrow(x)),]` and see what happens? – jazzurro Dec 02 '14 at 08:28
  • the problem is on creation of bob. Ana is created. Changing the code in bob did not help (same error message) – Meso Dec 02 '14 at 13:16
  • @Meso Thanks for that. Is there any way you can provide another example? – jazzurro Dec 02 '14 at 13:19
  • I can send you sample data on your private mail – Meso Dec 02 '14 at 13:21