0

I'm trying to gather weight values from one table (myChickWts) that were collected in the week prior to each blood sample recorded in another table (chickblood). I want to get a list of blood dates and the associated weights from the week leading up to each blood sample. I've tried several different ways, and I keep getting dates after the blood sample date included in my results.

In this example, the match returned dates that were both before (1/9, 1/11, 1/13) and after (1/15) the blood date. How can I match these two tables? I also tried difference_join, but it returned results 7 days before and 7 days after my other result - again, not what I'm looking for.

Chick   Date.x (blood)  Date.y (weight)  Chick.y  Weight.y
10     2019-01-14       2019-01-09       10       74
10     2019-01-14       2019-01-11       10       81
10     2019-01-14       2019-01-13       10       89
10     2019-01-14       2019-01-15       10       96




library(tidyverse)
library(lubridate)
library(fuzzyjoin)

Import data (sample data for reprex)

mychickwts <- datasets::ChickWeight %>%
  mutate(Date = date("2019-01-01") + Time) %>%
  select(Date, Chick, weight) %>%
  filter(Chick <= 10)

chickblood <- data.frame(
       Chick = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7,
                 8, 8, 8, 9, 9, 9, 10, 10, 10),
        Date = date(c("2019-01-01", "2019-01-12", "2019-01-22", "2019-01-06",
                 "2019-01-15", "2019-01-22", "2019-01-05", "2019-01-07",
                 "2019-01-14", "2019-01-03", "2019-01-08", "2019-01-11", "2019-01-02",
                 "2019-01-20", "2019-01-23", "2019-01-12", "2019-01-16",
                 "2019-01-18", "2019-01-10", "2019-01-10", "2019-01-22", "2019-01-03",
                 "2019-01-04", "2019-01-08", "2019-01-06", "2019-01-14",
                 "2019-01-17", "2019-01-02", "2019-01-14", "2019-01-21")))

Determine if weight date is in the week prior to bloodwork date.

compare <- function(a, b) {
  (a - b) <= 7
}

Get a table of each blood date and all the matching body weights within the past 7 days. This is not working.

chickblood %>%
fuzzy_left_join(
  mychickwts,
  by = c(
    "Chick" = "Chick",
    "Date" = "Date"
    ),
  match_fun = list(`==`, `compare`)
  )

I also tried with difference_join, but in this case I can't seem to figure out how to get it to match by chick and it returns both before and after dates.

   chickblood %>%
    difference_join(mychickwts, by = "Date",
       max_dist = 7
      )

I've tried using %within% from lubridate with no luck. This returns an error, and I'm not sure why exactly.

chickblood %>%
fuzzy_left_join(
  mychickwts,
  by = c("Chick" = "Chick", 
         "Date" = "Date"),
  match_fun = list("==", "%within%")
  ) %>%
  arrange(Date.x)

Error in which(m) : argument to 'which' is not logical
datakritter
  • 590
  • 5
  • 19

1 Answers1

0

As the data set is not too large you can just do a normal left join on 'Chick', and then determine if the weight date is in the week prior to the blood work date. From there you can just keep the rows you want.

library(tidyverse)
library(lubridate)
library(fuzzyjoin)

mychickwts$Chick <- as.numeric(mychickwts$Chick)

chickblood %>% 
  left_join(mychickwts, by = "Chick", suffix = c(".blood", ".wt")) %>% 
  mutate(wt_days_prior = Date.blood - Date.wt) %>% 
  mutate(wt_in_week_prior = wt_days_prior <= 7 & wt_days_prior >= 0) %>% 
  filter(wt_in_week_prior)

Alternatively, if you want to do it in a single join, something like this might work.

chickblood %>% 
  fuzzy_left_join(mychickwts, by = c("Chick", "Date"),
                  match_fun = list(`==`, function(x, y) x - y >= 0 & x - y <= 7)
  )

Replace fuzzy_left_join with fuzzy_inner_join to keep only the bloodwork measurements which have weight dates in the week prior.

Callum Savage
  • 341
  • 2
  • 7
  • Well, my real dataset has about 8 million records, but both of your methods work! I will subset my data and do some speed testing before going down one or the other path. Thanks! – datakritter Nov 30 '19 at 12:37