This is an example of the messy data I have to work with. Note the significant month long gaps in the date variable, and multiple sales on the same date.
> print(sold_data, n = 36)
# A tibble: 36 × 2
close_date close_price
<date> <dbl>
1 2021-04-13 820000
2 2021-06-18 780000
3 2021-06-22 995000
4 2021-09-07 905000
5 2021-10-22 1250000
6 2021-12-07 950000
7 2021-12-14 1025000
8 2021-12-21 860000
9 2021-12-30 1040000
10 2022-01-04 1200000
11 2022-01-19 870800
12 2022-01-24 985000
13 2022-02-04 1020000
14 2022-03-09 1504500
15 2022-03-10 1015000
16 2022-04-04 950000
17 2022-05-17 1575000
18 2022-05-18 1070000
19 2022-05-24 1174000
20 2022-05-25 1250000
21 2022-06-10 900000
22 2022-06-10 885000
23 2022-07-01 1200000
24 2022-07-08 850000
25 2022-07-29 1282500
26 2022-08-16 1200000
27 2022-08-22 895000
28 2022-09-02 975000
29 2022-09-16 1330000
30 2022-09-21 1200000
31 2022-09-30 889000
32 2022-11-07 1086365
33 2022-11-22 1232000
34 2022-12-01 1075000
35 2023-02-14 1000000
36 2023-03-13 1153000
This is the code I wrote for a start and end date sequence that is three months apart.
effective_date <- ymd("2023-03-25")
n_month_seq <- 1:24 * months (1)
end_24 <- effective_date - years (2) + n_month_seq
start_24 <- effective_date - years (2) - months(3) + n_month_seq
Then this is the code I wrote for the summary function
n_month_summary <- function(data) {
summarise(data, min_price = min(close_price), max_price = max(close_price),
med_price = median(close_price), count = n())
}
Then this is the simple code to filter the sold_data tibble by each start and end date, then run the summary function.
sum_sold_data <- sold_data %>%
filter(close_date > start_24[1],
close_date <= end_24[1]) %>%
n_month_summary()
This code all works, except only one line at a time. I went through a purrr tutorial on datacamp, but I'm not skilled enough to know how to iterate my filter code through the start and end date lists to get this final result that I want. I am embarrassed to point out I ran the above filter code 24 separate times, then using cbind to create a data frame of min, max, med_price and count. Then using rbind and relocate functions added the start_24 and end_24 date sequences to get the following data frame that is what I want.
> print(sum_sold_data01_24)
start_24 end_24 min_price max_price med_price count
1 2021-01-25 2021-04-25 820000 820000 820000 1
2 2021-02-25 2021-05-25 820000 820000 820000 1
3 2021-03-25 2021-06-25 780000 995000 820000 3
4 2021-04-25 2021-07-25 780000 995000 887500 2
5 2021-05-25 2021-08-25 780000 995000 887500 2
6 2021-06-25 2021-09-25 905000 905000 905000 1
7 2021-07-25 2021-10-25 905000 1250000 1077500 2
8 2021-08-25 2021-11-25 905000 1250000 1077500 2
9 2021-09-25 2021-12-25 860000 1250000 987500 4
10 2021-10-25 2022-01-25 860000 1200000 985000 7
11 2021-11-25 2022-02-25 860000 1200000 1002500 8
12 2021-12-25 2022-03-25 870800 1504500 1020000 7
13 2022-01-25 2022-04-25 950000 1504500 1017500 4
14 2022-02-25 2022-05-25 950000 1575000 1174000 7
15 2022-03-25 2022-06-25 885000 1575000 1070000 7
16 2022-04-25 2022-07-25 850000 1575000 1122000 8
17 2022-05-25 2022-08-25 850000 1282500 900000 7
18 2022-06-25 2022-09-25 850000 1330000 1200000 8
19 2022-07-25 2022-10-25 889000 1330000 1200000 7
20 2022-08-25 2022-11-25 889000 1330000 1143182 6
21 2022-09-25 2022-12-25 889000 1232000 1080682 4
22 2022-10-25 2023-01-25 1075000 1232000 1086365 3
23 2022-11-25 2023-02-25 1000000 1075000 1037500 2
24 2022-12-25 2023-03-25 1000000 1153000 1076500 2
I am looking for help to iterate the filter and summary function over the two start and end date lists/vectors to get a data frame with min, max, med_dates and count. If the map() function in purrr would also add the start and end dates as well that would be terrific but not absolutely necessary, I know how to add those after the fact.
I have tried unsuccessfully to use slider, that does get close, but the sliding window collapses at every point there is no monthly data, then begins building the three month window anew and effectively ignores the available data in the surrounding months that do have data. Again my skills with slider are limited and although I have read all the instructions I am at a loss to get what I want using slider, or runner for that matter.
I have included the data I am using, the code I have written and an example of what I need. I would respectfully appreciate any help getting from my raw sold_data to the final sum_sold_data01_24 dataframe using purrr and one of the map() functions or in any other way.
Any other solution needs to allow the gap between the start and end dates to be user controlled, accommodate missing data, duplicate data, and allow a custom summary function to be run on each window of data. Many thanks.