0

I have a data frame with residential sales dates and prices that ends at a most recent effective date and begins at least 36 months earlier. I need to arrange in either a sliding window of width three months or a lubridate time interval of three months. The three month window/interval then progresses forward to the effective date in one month increments.

The data will be irregularly spaced with gaps in the date (index) variable.

The data has multiple price (value) observations with the same date (index.)

I need to summarize each three month window/interval with the begin date, end date, minimum price, maximum price, median price and a count of price observations in each window interval as it progresses forward in time.

I need to control the breaks by a specified date, i.e. the 11th, or 17th or any other day of the month, all of the beginning and ending dates of the sliding window/interval need to fall on the specific day of the month the effective date falls on.

I need to adjust the width of the sliding window/interval from one month up to six months.

Slider appears to include not to drop multiple observations with the same date (index) my sincere apologies, I did not double check this.

The only other limitation is the inability to control the breaks by date, all of the time series packages default to calendar weeks, months, quarters or years. Runner has an "at" argument that I can provide a date sequence for the breaks. The clock package can create a date sequence keyed to a specific effective date, and has an elegant way of dealing with dates at the end of the month. Unfortunately because of my limited knowledge I am unable to get the runner package to provide anything other than a vector of median values using the "at" argument. Without at least the end date associated with the median value I'm unclear how I can plot the median value change over time.

I worked with code from an earlier SO answer.

https://stackoverflow.com/questions/63481359/using-runner-package-to-summarise-groups

I though this was going to work for me but I was unable to get close to my ideal answer. Working with examples of code from runner and slider, my best effort is this slider solution.

library(tidyverse)
library(lubridate)
library(runner)
library(clock)
library(slider)

# Stack Overflow examples ---------------------

startDate <- ymd("2018-01-01")
endDate <- ymd("2020-01-01")

# Create data
set.seed(31)
soldData <- tibble(
  price = round(rnorm(100, mean=500000, sd=100000),-3),
  date = sample(seq.Date(startDate,endDate,by="days"),100,replace=T))

soldData <- soldData %>%
  arrange(date)

# using slider with soldData and my function --------------------

n_day_slider_summ <- function(data) {
  summarise(data, start = min(date), end = max(date), min_price = min(price), 
            max_price = max(price), med_price = median(price), count = n())
}

n_month_summ <- slide_period_dfr(
  soldData,
  soldData$date,
  "month",
  n_day_slider_summ,
  .before = 2
)

head(n_month_summ, n = 10L)

This is the output I get. This is really really close to what I need, notice how the begin and end dates are relative to the data in the sliding window. However what I would prefer is instead of the sliding window starting at the beginning of a month and progressing through to the end of the third month, what I am looking for is for the window to begin and end by day date and progress by day date rather than defaulting to the entire calendar month.

I would really appreciate help with the ability to control the breaks by a specified date. Thank you, and again my apologies for not realizing slider is giving me a solution much closer to my ideal than initially realized.

# A tibble: 10 × 6
   start      end        min_price max_price med_price count
   <date>     <date>         <dbl>     <dbl>     <dbl> <int>
 1 2018-01-03 2018-01-30    387000    728000    584000     6
 2 2018-01-03 2018-02-28    373000    728000    498500    12
 3 2018-01-03 2018-03-31    363000    728000    505500    20
 4 2018-02-04 2018-04-30    363000    705000    491000    16
 5 2018-03-05 2018-05-28    363000    705000    556000    15
 6 2018-04-19 2018-06-22    431000    734000    559000    10
 7 2018-05-06 2018-07-12    246000    734000    562000    11
 8 2018-06-19 2018-08-15    246000    734000    503000     8
 9 2018-07-01 2018-09-23    246000    596000    502000     8
10 2018-08-01 2018-10-30    335000    636000    502000     8
  • Minor comment: it would be helpful if you added a `set.seed()` before creating the data so it's reproducible. – jtr13 Mar 11 '23 at 14:38
  • I added a set.seed() argument and edited some commentary text, thank you – Joseph Harvard Mar 11 '23 at 15:33
  • It would be more helpful if you show what you want to get rather than showing what you don't want to get. It is not true that rollapply can only accommodate one value and your input only has one date column so it is unclear what the problem is regarding two input dates. – G. Grothendieck Mar 11 '23 at 16:48
  • @G. Grothendieck, my apologies, you are correct , the duplicate date observations on 2018-06-22, 2018-10-30 etc are included in the slider output. I still need the window to begin, end and progress on the same day date relative to an effective date. The sample date data ends on 2020-01-01, I would need the breaks to be relative to a date outside this range like 2020-01-15. So the first window would be 2018-01-15 to 2018-02-15, second 2018-01-15 to 2018-03-15 third 2018-02-15 to 2018-04-15 etc etc. Hopefully that clears up my error, thank you again Joe – Joseph Harvard Mar 12 '23 at 05:51

0 Answers0