0

I would like to create a tibble with a mix of repeated and unrepeated measures of variables along with the dates when they were measured;

  • 3 variables (var1, var2, var3) that were measured 16 times during the months of April, May, June at irregular intervals
  • 4 variables (var4, var5, var6, var7) that were measured once in July
  • 2 variables (var8, var9) that were measured also once in July

To create the tibble, I could write vectors with every combination of variable-date and measurement, but I am wondering if there is a way to do this more efficiently, since 3 variables are repeatedly taken for 16 times. I've written this chunk of code with variables, dates and measurements to start with, but I'm stuck there. Any suggestions?

library (tidyverse)
variables <- c(var1, var2, var3, var4, var5, var6, var7, var8, var9)
mydates <- c(2013-04-15,
             2013-04-16,
             2013-04-17,
             2013-04-22,
             2013-04-25,
             2013-04-29,
             2013-05-02,
             2013-05-06,
             2013-05-09,
             2013-05-13,
             2013-05-16,
             2013-05-20,
             2013-05-23,
             2013-05-27,
             2013-05-30,
             2013-06-03,
             2013-07-04,  
             2013-07-08)
measurements <- c(3.2, 4.6, 1.1, 3.0, 3.6, 1.6, 1.4, 1.4, 4.8, 3.5, 4.0, 
2.7, 1.4, 2.9, 2.4, 3.6, 3.7, 4.3, 3.6, 3.5, 4.7, 1.8, 3.5, 2.4, 2.1, 1.2,
2.3, 3.9, 1.6, 2.8, 5.0, 2.4, 2.2, 2.9, 1.8, 1.7, 4.4, 3.9, 4.4, 2.6, 1.7, 
4.2, 3.4, 4.4, 4.7, 5.0, 3.0, 3.7, 2.1, 2.9, 4.5, 1.5, 2.2, 2.9)

tibble (variables, mydates, measurements)

I would like a tibble that looks like this, with my first 3 variables each repeated 16 times, my 16 first dates each repeated 3 times and the measurements:

variables   mydates     measurements
var1        2013-04-15  3.2
var2        2013-04-15  4.6
var3        2013-04-15  1.1
var1        2013-04-16  3.0
var2        2013-04-16  3.6
var3        2013-04-16  1.6
var1        2013-04-17  1.4
var2        2013-04-17  1.4
var3        2013-04-17  4.8
...         ...         ...  # measurements for var1, var2, var3 were repeatedly taken during the 16 first dates in the vector mydates.
var4        2013-07-04  2.1
var5        2013-07-04  2.9
var6        2013-07-04  4.5
var7        2013-07-04  1.5
var8        2013-07-08  2.2
var9        2013-07-08  2.9
patL
  • 2,259
  • 1
  • 17
  • 38
Mleinss
  • 37
  • 5
  • But the length of `measurements` is different from `mydates`. Can you expand your desired output? – patL Dec 19 '18 at 20:41
  • What makes you think "there must be a way with tidyverse"? What are you creating this from? A text file? A spreadsheet? From random numbers? – Spacedman Dec 19 '18 at 20:43
  • @patL, I've expanded the desired output tibble and added a comment, so hopefully it's more clear. The length of `measurements` is different from `mydates` because some measurements were taken on the same date. @Spacedman, I got this data from excel spreadsheets. The data is scattered across several column headers and spreadsheets, so I would like to create this tibble to tidy it up. – Mleinss Dec 20 '18 at 15:22

1 Answers1

1

Here is an (bit 'dirty') alternative using base::expand.grid and lubridate. I transformed your mydates vector into a date class object.

Once you get all combinations between variables and mydates, you can bind measurements and transform into a tibble using as_data_frame.

library (tidyverse)
library(lubridate) 


variables <- c("var1", "var2", "var3", "var4", "var5", "var6", "var7", "var8", "var9")

mydates <- c("2013-04-15",
             "2013-04-16",
             "2013-04-17",
             '2013-04-22',
             '2013-04-25',
             '2013-04-29',
             "2013-05-02",
             "2013-05-06",
             "2013-05-09",
             "2013-05-13",
             "2013-05-16",
             "2013-05-20",
             "2013-05-23",
             "2013-05-27",
             "2013-05-30",
             "2013-06-03",
             "2013-07-04",  
             "2013-07-08") %>% 
  as_date()

measurements <- c(3.2, 4.6, 1.1, 3.0, 3.6, 1.6, 1.4, 1.4, 4.8, 3.5, 4.0, 
                  2.7, 1.4, 2.9, 2.4, 3.6, 3.7, 4.3, 3.6, 3.5, 4.7, 1.8, 3.5, 2.4, 2.1, 1.2,
                  2.3, 3.9, 1.6, 2.8, 5.0, 2.4, 2.2, 2.9, 1.8, 1.7, 4.4, 3.9, 4.4, 2.6, 1.7, 
                  4.2, 3.4, 4.4, 4.7, 5.0, 3.0, 3.7, 2.1, 2.9, 4.5, 1.5, 2.2, 2.9)



mydata <- expand.grid(vars = variables[1:3], 
                      dates = mydates[month(mydates) < 7]) %>% 
  rbind(expand.grid(vars = variables[4:7], 
                    dates = mydates[month(mydates) == 7 & day(mydates) == 4])) %>% 
  rbind(expand.grid(vars = variables[8:9], 
                    dates = mydates[month(mydates) == 7 & day(mydates) == 8])) %>% 
  mutate(measures = measurements) %>% 
  as_data_frame()

And the output would be:

mydata

## A tibble: 54 x 3
#   vars  dates      measures
#   <fct> <date>        <dbl>
# 1 var1  2013-04-15      3.2
# 2 var2  2013-04-15      4.6
# 3 var3  2013-04-15      1.1
# 4 var1  2013-04-16      3  
# 5 var2  2013-04-16      3.6
# 6 var3  2013-04-16      1.6
# 7 var1  2013-04-17      1.4
# 8 var2  2013-04-17      1.4
# 9 var3  2013-04-17      4.8
#10 var1  2013-04-22      3.5
## ... with 44 more rows
patL
  • 2,259
  • 1
  • 17
  • 38
  • 1
    That's exactly what I was looking for, thanks a lot! It is very neat the way you can use the `month` and `day` in the dates once you transformed the `mydates` vector into a `date` class object. I'll definitely use that in future code. – Mleinss Dec 20 '18 at 21:27
  • Glad that it works! If it solves your problem, don't forget to vote it up ;) – patL Dec 20 '18 at 21:32
  • Sorry, my vote do not show yet since I have less than 15 reputation. In the `expand.grid` chunk, would it be possible to refer to the variables by their name rather than by their number? – Mleinss Dec 20 '18 at 22:12
  • 1
    No problem. Yes, you can do this. For example (using part of the code) `expand.grid(vars =variables[variables %in% c("var1", "var2", "var3")], dates = mydates[month(mydates) < 7])`. This is more direct though: `expand.grid(vars = c("var1", "var2", "var3"), dates = mydates[month(mydates) < 7])` – patL Dec 20 '18 at 22:22