
I have a weather dataset in which the observations are date-dependent.

I want to predict the temperature from 07 May 2008 to 18 May 2008 (roughly 10-15 observations). The dataset has around 200 rows.

I will be using a decision tree/random forest, SVM, and a neural network to make the predictions.

I've never handled data like this, so I'm not sure how to split non-random data. I want an 80% train / 20% test split, but I want to split the data in its original order, not randomly. Is that possible?


install.packages("rattle")
install.packages("RGtk2")
library("rattle")

seed <- 42
set.seed(seed)
fname <- system.file("csv", "weather.csv", package = "rattle")
dataset <- read.csv(fname, encoding = "UTF-8")
dataset <- dataset[1:200,]
dataset <- dataset[order(dataset$Date),]
 
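set.seed(321)
# Randomly pick 80% of the row indices (this is what loses the date order)
sample_data <- sample(nrow(dataset), nrow(dataset) * 0.8)
test <- dataset[sample_data, ]   # note: receives the 80% random sample
train <- dataset[-sample_data, ] # note: receives the remaining 20%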

output


> head(dataset)
        Date Location MinTemp MaxTemp Rainfall Evaporation Sunshine WindGustDir WindGustSpeed
1 2007-11-01 Canberra     8.0    24.3      0.0         3.4      6.3          NW            30
2 2007-11-02 Canberra    14.0    26.9      3.6         4.4      9.7         ENE            39
3 2007-11-03 Canberra    13.7    23.4      3.6         5.8      3.3          NW            85
4 2007-11-04 Canberra    13.3    15.5     39.8         7.2      9.1          NW            54
5 2007-11-05 Canberra     7.6    16.1      2.8         5.6     10.6         SSE            50
6 2007-11-06 Canberra     6.2    16.9      0.0         5.8      8.2          SE            44
  WindDir9am WindDir3pm WindSpeed9am WindSpeed3pm Humidity9am Humidity3pm Pressure9am
1         SW         NW            6           20          68          29      1019.7
2          E          W            4           17          80          36      1012.4
3          N        NNE            6            6          82          69      1009.5
4        WNW          W           30           24          62          56      1005.5
5        SSE        ESE           20           28          68          49      1018.3
6         SE          E           20           24          70          57      1023.8
  Pressure3pm Cloud9am Cloud3pm Temp9am Temp3pm RainToday RISK_MM RainTomorrow
1      1015.0        7        7    14.4    23.6        No     3.6          Yes
2      1008.4        5        3    17.5    25.7       Yes     3.6          Yes
3      1007.2        8        7    15.4    20.2       Yes    39.8          Yes
4      1007.0        2        7    13.5    14.1       Yes     2.8          Yes
5      1018.5        7        7    11.1    15.4       Yes     0.0           No
6      1021.7        7        5    10.9    14.8        No     0.2           No



> head(test)
          Date Location MinTemp MaxTemp Rainfall Evaporation Sunshine WindGustDir WindGustSpeed
182 2008-04-30 Canberra    -1.8    14.8      0.0         1.4      7.0           N            28
77  2008-01-16 Canberra    17.9    33.2      0.0        10.4      8.4           N            59
88  2008-01-27 Canberra    13.2    31.3      0.0         6.6     11.6         WSW            46
58  2007-12-28 Canberra    15.1    28.3     14.4         8.8     13.2         NNW            28
96  2008-02-04 Canberra    18.2    22.6      1.8         8.0      0.0         ENE            33
126 2008-03-05 Canberra    12.0    27.6      0.0         6.0     11.0           E            46
    WindDir9am WindDir3pm WindSpeed9am WindSpeed3pm Humidity9am Humidity3pm Pressure9am
182          E          N            2           19          80          40      1024.2
77           N        NNE           15           20          58          62      1008.5
88           N        WNW            4           26          71          28      1013.1
58         NNW         NW            6           13          73          44      1016.8
96         SSE        ENE            7           13          92          76      1014.4
126        SSE        WSW            7            6          69          35      1025.5
    Pressure3pm Cloud9am Cloud3pm Temp9am Temp3pm RainToday RISK_MM RainTomorrow
182      1020.5        1        7     5.3    13.9        No     0.0           No
77       1006.1        6        7    24.5    23.5        No     4.8          Yes
88       1009.5        1        4    19.7    30.7        No     0.0           No
58       1013.4        1        5    18.3    27.4       Yes     0.0           No
96       1011.5        8        8    18.5    22.1       Yes     9.0          Yes
126      1022.2        1        1    15.7    26.2        No     0.0           No


> head(train)
         Date Location MinTemp MaxTemp Rainfall Evaporation Sunshine WindGustDir WindGustSpeed
7  2007-11-07 Canberra     6.1    18.2      0.2         4.2      8.4          SE            43
9  2007-11-09 Canberra     8.8    19.5      0.0         4.0      4.1           S            48
11 2007-11-11 Canberra     9.1    25.2      0.0         4.2     11.9           N            30
16 2007-11-16 Canberra    12.4    32.1      0.0         8.4     11.1           E            46
22 2007-11-22 Canberra    16.4    19.4      0.4         9.2      0.0           E            26
25 2007-11-25 Canberra    15.4    28.4      0.0         4.4      8.1         ENE            33
   WindDir9am WindDir3pm WindSpeed9am WindSpeed3pm Humidity9am Humidity3pm Pressure9am
7          SE        ESE           19           26          63          47      1024.6
9           E        ENE           19           17          70          48      1026.1
11         SE         NW            6            9          74          34      1024.4
16         SE        WSW            7            9          70          22      1017.9
22        ENE          E            6           11          88          72      1010.7
25        SSE         NE            9           15          85          31      1022.4
   Pressure3pm Cloud9am Cloud3pm Temp9am Temp3pm RainToday RISK_MM RainTomorrow
7       1022.2        4        6    12.4    17.3        No     0.0           No
9       1022.7        7        7    14.1    18.9        No    16.2          Yes
11      1021.1        1        2    14.6    24.0        No     0.2           No
16      1012.8        0        3    19.1    30.7        No     0.0           No
22      1008.9        8        8    16.5    18.3        No    25.8          Yes
25      1018.6        8        2    16.8    27.3        No     0.0           No
nullUser

1 Answer


I use mtcars as an example. One way to split your data into train and test sets non-randomly is to first compute the training-set size from the number of rows in your data. You can then use split to cut the data at exactly the 80% mark, keeping the rows in their original order. You can use the following code:

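# Size of the training set: the first 80% of rows, rounded down
smp_size <- floor(0.80 * nrow(mtcars))
# Label the first smp_size rows 1 and the remaining rows 2, then split by label
split <- split(mtcars, rep(1:2, times = c(smp_size, nrow(mtcars) - smp_size)))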

With the following code you can extract the train and test sets from the split:

train <- split$`1`
test <- split$`2`
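
The same ordered split can also be written with plain row indexing, which avoids building a grouping vector. This is only a minimal sketch on mtcars, offered as an alternative; `n_train` is a name introduced here for illustration.

```r
# Number of rows for the training set (first 80% of rows, rounded down)
n_train <- floor(0.80 * nrow(mtcars))

# head() keeps the first n_train rows; tail() keeps the remaining rows,
# so both pieces stay in their original order
train <- head(mtcars, n_train)
test  <- tail(mtcars, nrow(mtcars) - n_train)
```

On mtcars this should produce the same train and test sets as the split-based code.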

Let's check the number of rows:

> nrow(train)
[1] 25
> nrow(test)
[1] 7

Now the data is split into train and test sets while preserving the original row order.
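
For date-dependent data such as the weather set in the question, a cutoff on the date column may be more natural than a fixed percentage. The sketch below assumes a data frame with a `Date` column of class Date. The synthetic `weather` frame, its `MaxTemp` values, and the `cutoff` name are placeholders introduced here for illustration, not the real data.

```r
# Synthetic stand-in for the question's data: 200 consecutive days
set.seed(42)
weather <- data.frame(
  Date    = seq(as.Date("2007-11-01"), by = "day", length.out = 200),
  MaxTemp = round(rnorm(200, mean = 20, sd = 5), 1)  # placeholder values
)

# Make sure rows are in chronological order before splitting
weather <- weather[order(weather$Date), ]

# Hold out everything from the cutoff date onward as the test set
cutoff <- as.Date("2008-05-07")
train  <- weather[weather$Date <  cutoff, ]
test   <- weather[weather$Date >= cutoff, ]

# Sanity check: every training date precedes every test date
stopifnot(max(train$Date) < min(test$Date))
```

If `Date` was read as text by `read.csv`, converting it with `as.Date()` first is probably the safer route before comparing against the cutoff.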

Quinten
  • Great, thank you. Is there a way to rename the split elements (1, 2) to different names? I tried adding f = names("A","B") but it didn't work. – nullUser Jun 11 '22 at 19:14
  • @nullUser, you can use this: `names(split) <- c("A", "B")` – Quinten Jun 11 '22 at 19:16