
So I have a dataframe called Swine_flu_cases that looks as follows (just an extract):

    Country    Date        Confirmed     
1   Canada  2020-01-22         1                            
2   Egypt   2020-01-23         1                                
3   Algeria 2020-01-24         1                                
4   France  2020-01-25         1                                
5   Zambia  2020-01-26         1                            
6   Congo   2020-01-27         1      

             

This data set looks at the recorded amount of swine flu cases of a country on a specific date.

I have filtered my data to show only the rows where the confirmed cases equal 1, grouped it by country, and sorted it in ascending order of date. (I did this to get the date on which each country had its first case.)

I sorted by date because I want to extract the date of each country's first recorded swine flu case and store those dates as a vector.

I tried doing so with the following code:

first_case_date = as.Date(data.frame(Swine_flu_cases$Date))

However, this gave me an error:

Error in as.Date.default(data.frame(Swine_flu_cases$Date)) : do not know how to convert 'data.frame(Swine_flu_cases$Date)' to class “Date”

What I want to do is create a new variable, Swine_flu_cases$days_since_first_case, which takes the stored date of each country's first case and subtracts it from all the other dates for that country.

My knowledge of for loops is very basic, but I think I need to use one for this. I have also recently familiarised myself with the lead and lag functions and was wondering whether I could combine them to create this variable.

If someone could give me a general idea of how to go about this, I would really appreciate it.

Mr. Discuss
DRC_mami
  • From the help: The as.Date methods accept character strings, factors, logical NA and objects of classes "POSIXlt" and "POSIXct". You are trying to pass a data frame to it and it complains about it. Also better to define the date format. Try: `first_case_date <- as.Date(Swine_flu_cases$Date, format = "%Y-%m-%d")` – Paul van Oppen Sep 15 '20 at 05:29
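
A minimal sketch of the point made in the comment above, using toy values invented for illustration (only the column names come from the question). It shows that passing the column itself to as.Date() works, while wrapping it in data.frame() reproduces the error. The name err_msg is a hypothetical helper variable.

```r
# Toy data, invented for illustration; only the column names follow the question.
Swine_flu_cases <- data.frame(
  Country = c("Canada", "Egypt", "Algeria"),
  Date = c("2020-01-22", "2020-01-23", "2020-01-24"),
  Confirmed = c(1, 1, 1),
  stringsAsFactors = FALSE
)

# Passing the character column directly works and returns a Date vector.
first_case_date <- as.Date(Swine_flu_cases$Date, format = "%Y-%m-%d")

# Wrapping the column in data.frame() fails, because as.Date() has no
# method for data frames; the error message is captured instead of stopping.
err_msg <- tryCatch(
  as.Date(data.frame(Swine_flu_cases$Date)),
  error = function(e) conditionMessage(e)
)
```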

1 Answer

You can do this with dplyr and lubridate to make your dates behave.

library(dplyr)
library(lubridate)
Swine_flu_cases %>% 
  mutate(Date = ymd(Date)) %>%  # parse Date as a Date object so subtraction works
  group_by(Country) %>%         # do the calculation separately for each country
  mutate(days_since_first_case = Date - min(Date)) 
    # subtracts the earliest date in each group from the date in the current row
Ben Norris
  • I tried that, but the resulting values do not correspond to the actual minimum dates. I see how this code works, though, and I'm just trying to find a way to get the minimum date for each country when the number of confirmed cases is one instead of zero. Thank you so much for this insight. – DRC_mami Sep 07 '20 at 17:24
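
One possible way to handle the case raised in this comment, sketched in base R with toy values invented for illustration: restrict the minimum to rows with at least one confirmed case before subtracting. The data frame name cases and the helper names with_cases and first_num are hypothetical, and the sketch assumes Confirmed is numeric and Date can be parsed by as.Date().

```r
# Toy data, invented for illustration: each country has a row with zero cases
# before its first confirmed case.
cases <- data.frame(
  Country = c("Canada", "Canada", "Canada", "Egypt", "Egypt", "Egypt"),
  Date = as.Date(c("2020-01-20", "2020-01-22", "2020-01-25",
                   "2020-01-21", "2020-01-23", "2020-01-24")),
  Confirmed = c(0, 1, 3, 0, 1, 2),
  stringsAsFactors = FALSE
)

# Keep only rows with at least one confirmed case.
with_cases <- cases[cases$Confirmed >= 1, ]

# Earliest such date per country, held as a day count so the class of the
# result does not depend on how tapply() treats Date objects.
first_num <- tapply(as.numeric(with_cases$Date), with_cases$Country, min)

# Look up each row's country and subtract that country's first-case date.
cases$days_since_first_case <-
  as.numeric(cases$Date) - as.numeric(first_num[cases$Country])
```

Rows dated before a country's first confirmed case get negative values here. In the dplyr pipeline above, the equivalent change would presumably be to replace min(Date) with min(Date[Confirmed >= 1]).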