I am attempting to inner_join two data frames, each with three columns. The first data frame contains date, variable names, and forecast values while the second data frame contains date, variable names, and actual values. The purpose of my join is to match forecast values to actual values by date and the correct weather variable name for accuracy analysis.
My first data frame x is as follows
x <- structure(list(Date = structure(c(1588060800, 1588060800, 1588060800,
1588060800, 1588060800, 1588060800, 1588060800, 1588060800, 1588060800,
1588060800, 1588060800, 1588060800), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), wx_vars = c("Wx1_Temperature", "Wx1_Precipitation",
"Wx1_CloudCover", "Wx1_DewPoint", "Wx1_WindSpeed", "Wx1_SolarRadiation",
"Wx2_Temperature", "Wx2_Precipitation", "Wx2_CloudCover", "Wx2_DewPoint",
"Wx2_WindSpeed", "Wx2_SolarRadiation"), wx_forecast = c(56.92,
0.0046, 77.46, 50.26, 7.42, 12.93, 57.05, 0.0037, 68.3, 50.5,
7.32, 19.02)), row.names = c(NA, 12L), class = "data.frame")
My second frame y is as follows:
y <- structure(list(Date = structure(c(1588057200, 1588057200, 1588057200,
1588057200, 1588060800, 1588060800, 1588060800, 1588060800, 1588060800,
1588060800, 1588064400, 1588064400), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), wx_vars = c("Actual_CloudCover", "Actual_WindSpeed",
"Actual_Precipitation", "Actual_SolarRadiation", "Actual_Temperature",
"Actual_DewPoint", "Actual_CloudCover", "Actual_WindSpeed", "Actual_Precipitation",
"Actual_SolarRadiation", "Actual_Temperature", "Actual_DewPoint"
), wx_actuals = c(54.8, 5.63, 0, 26.1, 57.32, 49.99, 61, 7.24,
0.00015, 23.4, 59.84, 52.11)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -12L))
As you can see, the x forecast date frame has two separate weather forecasts that I want to evaluate the accuracy of: Wx1 and Wx2. Each row has a different weather variable listed after Wx1 or Wx2 which line up exactly with the actual data in data frame y. The Dates are in the same format in both x and y and would work in a simple inner_join, but given the string differences in the wx_vars column of both data frame x and y, I've been attempting to work with fuzzyjoin. No luck yet.
Here is what I've tried, and my biggest issue is what I included as my question title. I have yet to find an example of a fuzzyjoin on two columns where one column match would work in a regular join while the other column match requires fuzzyjoin, in my case partial string matching.
wx_analysis_1<- fuzzy_inner_join(x, y, by = c("Date", "wx_vars"="wx_vars"), match_fun = str_detect)
wx_analysis_2 <- regex_inner_join(x, y, by = c("Date", "wx_vars"="wx_vars"))
No luck with either of these solutions. Is there some syntax I'm missing to handle the date join better? I feel like fuzzy_inner_join by wx_vars should work, but the Date column might be my issue.
Thanks