I have a weather dataset covering 01 Nov 2007 to 18 May 2008. My data is date-dependent.
I want to predict the temperature from 07 May 2008 to 18 May 2008 (roughly 10-15 observations). The whole dataset has around 200 rows.
I will be using a decision tree / random forest, an SVM and a neural network to make the predictions.
I've never handled data like this, so I'm not sure how to sample it. If we ignore the bias factor, can I sample the training data from 01 Nov 2007 to 18 May 2008 and the test data from 07 May 2008 to 18 May 2008? Or is there a better way to handle this? Would it be better to first sort my data by date, split the ordered data 80:20 into training and test sets, and then just output the predictions for the required dates?
install.packages("rattle")
install.packages("RGtk2")
library("rattle")
seed <- 42
set.seed(seed)
fname <- system.file("csv", "weather.csv", package = "rattle")
dataset <- read.csv(fname, encoding = "UTF-8")
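# Parse the Date column with base R; as.Date() handles ISO "YYYY-MM-DD" strings by default
dataset$Date <- as.Date(dataset$Date)
# Sort chronologically; Date is already a Date object, so it needs no re-parsing here
dataset <- dataset[order(dataset$Date), ]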
dataset <- dataset[1:200,]
str(dataset)
> str(dataset)
'data.frame': 200 obs. of 24 variables:
$ Date : Date, format: "2007-11-01" "2007-11-02" "2007-11-03" ...
$ Location : chr "Canberra" "Canberra" "Canberra" "Canberra" ...
$ MinTemp : num 8 14 13.7 13.3 7.6 6.2 6.1 8.3 8.8 8.4 ...
$ MaxTemp : num 24.3 26.9 23.4 15.5 16.1 16.9 18.2 17 19.5 22.8 ...
$ Rainfall : num 0 3.6 3.6 39.8 2.8 0 0.2 0 0 16.2 ...
$ Evaporation : num 3.4 4.4 5.8 7.2 5.6 5.8 4.2 5.6 4 5.4 ...
$ Sunshine : num 6.3 9.7 3.3 9.1 10.6 8.2 8.4 4.6 4.1 7.7 ...
$ WindGustDir : chr "NW" "ENE" "NW" "NW" ...
$ WindGustSpeed: int 30 39 85 54 50 44 43 41 48 31 ...
$ WindDir9am : chr "SW" "E" "N" "WNW" ...
$ WindDir3pm : chr "NW" "W" "NNE" "W" ...
$ WindSpeed9am : int 6 4 6 30 20 20 19 11 19 7 ...
$ WindSpeed3pm : int 20 17 6 24 28 24 26 24 17 6 ...
$ Humidity9am : int 68 80 82 62 68 70 63 65 70 82 ...
$ Humidity3pm : int 29 36 69 56 49 57 47 57 48 32 ...
$ Pressure9am : num 1020 1012 1010 1006 1018 ...
$ Pressure3pm : num 1015 1008 1007 1007 1018 ...
$ Cloud9am : int 7 5 8 2 7 7 4 6 7 7 ...
$ Cloud3pm : int 7 3 7 7 7 5 6 7 7 1 ...
$ Temp9am : num 14.4 17.5 15.4 13.5 11.1 10.9 12.4 12.1 14.1 13.3 ...
$ Temp3pm : num 23.6 25.7 20.2 14.1 15.4 14.8 17.3 15.5 18.9 21.7 ...
$ RainToday : chr "No" "Yes" "Yes" "Yes" ...
$ RISK_MM : num 3.6 3.6 39.8 2.8 0 0.2 0 0 16.2 0 ...
$ RainTomorrow : chr "Yes" "Yes" "Yes" "Yes" ...
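
To make the two splitting options concrete, here is a minimal sketch of what I have in mind. It uses a made-up data frame instead of the real data: `toy` and its random `MaxTemp` values are placeholders, and the names `cutoff`, `train80` and `test20` are only for illustration. I'm not claiming this is the right way to do it, it is just what I mean by "split by date" versus "ordered 80:20 split".

```r
# Toy stand-in for the weather data: one row per day over the same date range.
# MaxTemp is random noise here, not real observations.
set.seed(42)
dates <- seq(as.Date("2007-11-01"), as.Date("2008-05-18"), by = "day")
toy <- data.frame(Date = dates,
                  MaxTemp = rnorm(length(dates), mean = 20, sd = 5))

# Make sure the rows are in chronological order before splitting.
toy <- toy[order(toy$Date), ]

# Option A: split at a cutoff date, so the test set is exactly the
# period to be predicted and the training set contains only earlier days.
cutoff <- as.Date("2008-05-07")
train <- toy[toy$Date < cutoff, ]
test  <- toy[toy$Date >= cutoff, ]

# Option B: ordered 80:20 split by row position (no random sampling).
n_train <- floor(0.8 * nrow(toy))
train80 <- toy[seq_len(n_train), ]
test20  <- toy[(n_train + 1):nrow(toy), ]

# Check the sizes and that no test day precedes the last training day.
nrow(train); nrow(test)
max(train$Date) < min(test$Date)
```

With option A the test set is the 12 days from 07 May to 18 May 2008 and the training set never overlaps it, whereas my first idea (training on everything up to 18 May) would include the test days in the training data.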