I have a data frame with residential sales dates and prices that ends at a most recent effective date and begins at least 36 months earlier. I need to arrange in either a sliding window of width three months or a lubridate time interval of three months. The three month window/interval then progresses forward to the effective date in one month increments.
The data will be irregularly spaced with gaps in the date (index) variable.
The data has multiple price (value) observations with the same date (index.)
I need to summarize each three month window/interval with the begin date, end date, minimum price, maximum price, median price and a count of price observations in each window interval as it progresses forward in time.
I need to control the breaks by a specified date, i.e. the 11th, or 17th or any other day of the month, all of the beginning and ending dates of the sliding window/interval need to fall on the specific day of the month the effective date falls on.
I need to adjust the width of the sliding window/interval from one month up to six months.
Slider appears to include not to drop multiple observations with the same date (index) my sincere apologies, I did not double check this.
The only other limitation is the inability to control the breaks by date, all of the time series packages default to calendar weeks, months, quarters or years. Runner has an "at" argument that I can provide a date sequence for the breaks. The clock package can create a date sequence keyed to a specific effective date, and has an elegant way of dealing with dates at the end of the month. Unfortunately because of my limited knowledge I am unable to get the runner package to provide anything other than a vector of median values using the "at" argument. Without at least the end date associated with the median value I'm unclear how I can plot the median value change over time.
I worked with code from an earlier SO answer.
https://stackoverflow.com/questions/63481359/using-runner-package-to-summarise-groups
I though this was going to work for me but I was unable to get close to my ideal answer. Working with examples of code from runner and slider, my best effort is this slider solution.
library(tidyverse)
library(lubridate)
library(runner)
library(clock)
library(slider)
# Stack Overflow examples ---------------------
startDate <- ymd("2018-01-01")
endDate <- ymd("2020-01-01")
# Create data
set.seed(31)
soldData <- tibble(
price = round(rnorm(100, mean=500000, sd=100000),-3),
date = sample(seq.Date(startDate,endDate,by="days"),100,replace=T))
soldData <- soldData %>%
arrange(date)
# using slider with soldData and my function --------------------
n_day_slider_summ <- function(data) {
summarise(data, start = min(date), end = max(date), min_price = min(price),
max_price = max(price), med_price = median(price), count = n())
}
n_month_summ <- slide_period_dfr(
soldData,
soldData$date,
"month",
n_day_slider_summ,
.before = 2
)
head(n_month_summ, n = 10L)
This is the output I get. This is really really close to what I need, notice how the begin and end dates are relative to the data in the sliding window. However what I would prefer is instead of the sliding window starting at the beginning of a month and progressing through to the end of the third month, what I am looking for is for the window to begin and end by day date and progress by day date rather than defaulting to the entire calendar month.
I would really appreciate help with the ability to control the breaks by a specified date. Thank you, and again my apologies for not realizing slider is giving me a solution much closer to my ideal than initially realized.
# A tibble: 10 × 6
start end min_price max_price med_price count
<date> <date> <dbl> <dbl> <dbl> <int>
1 2018-01-03 2018-01-30 387000 728000 584000 6
2 2018-01-03 2018-02-28 373000 728000 498500 12
3 2018-01-03 2018-03-31 363000 728000 505500 20
4 2018-02-04 2018-04-30 363000 705000 491000 16
5 2018-03-05 2018-05-28 363000 705000 556000 15
6 2018-04-19 2018-06-22 431000 734000 559000 10
7 2018-05-06 2018-07-12 246000 734000 562000 11
8 2018-06-19 2018-08-15 246000 734000 503000 8
9 2018-07-01 2018-09-23 246000 596000 502000 8
10 2018-08-01 2018-10-30 335000 636000 502000 8