So I have been using a excel for manual data entry for a CSV file which I am reading into R Studio using read_csv
.
It is reading in fine, my date is in this format 31/07/2013
so I used this to convert it
patients$Date <- as.Date(patients$Date, "%d/%m/%Y")
This all seems straightforward. However, for the second data set running it doesn't convert all values and converts some to NA.
Part of this stems from Excel's in built number format. Which a run date in three ways
- General : Days counted 1900/01/01
- Short : 31/07/2013
- Long : July 31 2013
Now thing is, this time I left it as it was. So I expecting that I should haven an issues.
My Data set is a few thousands rows long and as.Date()
appears to have non-systematically let some values out.
Manually, the changes in excel are not manifesting themselves in R.
How can I find these values and manually change them within R.
I have also put two columns of Year and Month next to it for reference (as I thought I may have this problem this time).
Year Month Date Site Category Services Count
2013 July 31/07/2013 K Test Laboratory 1642
2013 July 31/07/2013 K Test X-Ray 16
I do not require in the day in my current format of 31/07/2013
so could work with 07/2013
So I use this to filter my data
patients %>%
group_by(Date)%>%
filter(Category == "Registration", Site == "K")
So I get the following output
Year Month Date Site Category Services Count
<int> <chr> <chr> <chr> <chr> <chr> <int>
1 2013 July 31/07/2013 K Registration Old Registration Cld 25
2 2013 July 31/07/2013 K Registration Old Registration Male 155
3 2013 July 31/07/2013 K Registration Old Registration Female 445
4 2013 July 31/07/2013 K Registration New Registration Child 24
5 2013 July 31/07/2013 K Registration New Registration Male 106
And once I Apply patients$Date <- as.Date(patients$Date, "%d/%m/%Y")
some of they show as below...BUT not all of them as the column type in the tibble has been changed to Date
So it has worked to some extent.
2017 June NA K Registration Old Registration Female 351
2 2017 June NA K Registration New Registration Child 20
3 2017 June NA K Registration New Registration Male 66
4 2017 June NA K Registration New Registration Female 117
5 2017 June NA K Registration Staff Registration 37
6 2017 June NA K Registration Free Registration 7
>
Is there a way of sorting this?
And what software should I use in future for managing CSV's Open Office's calc seems to be no better. Do I need just a CSV manager to avoid Excel causing problems?
Help would be much appreciated
Cheers