Let's assume you have the following data structure and you want to predict the temperature given one day in the past:
```python
import tensorflow as tf
import pandas as pd
import numpy as np

# 20 days of random weather data, one row per day
df = pd.DataFrame(data={
    'temperature': np.random.random((1, 20)).ravel(),
    'pressure': np.random.random((1, 20)).ravel(),
    'humidity': np.random.random((1, 20)).ravel(),
    'wind': np.random.random((1, 20)).ravel()
})
print(df.to_markdown())  # to_markdown() needs the optional `tabulate` package
```
|    | temperature | pressure | humidity | wind |
|---:|---:|---:|---:|---:|
| 0 | 0.0589101 | 0.278302 | 0.875369 | 0.622687 |
| 1 | 0.594924 | 0.797274 | 0.510012 | 0.374484 |
| 2 | 0.511291 | 0.334929 | 0.401483 | 0.77062 |
| 3 | 0.711329 | 0.72051 | 0.595685 | 0.872691 |
| 4 | 0.495425 | 0.520179 | 0.516858 | 0.628928 |
| 5 | 0.676054 | 0.67902 | 0.0213801 | 0.0267594 |
| 6 | 0.058189 | 0.69932 | 0.885174 | 0.00602091 |
| 7 | 0.708245 | 0.871698 | 0.345451 | 0.448352 |
| 8 | 0.958427 | 0.471423 | 0.412678 | 0.618024 |
| 9 | 0.941202 | 0.825181 | 0.211916 | 0.0808273 |
| 10 | 0.49252 | 0.541955 | 0.00522009 | 0.396557 |
| 11 | 0.323757 | 0.113585 | 0.797503 | 0.323961 |
| 12 | 0.819055 | 0.637116 | 0.285361 | 0.569794 |
| 13 | 0.95123 | 0.00604303 | 0.208746 | 0.150214 |
| 14 | 0.89466 | 0.948916 | 0.556422 | 0.555165 |
| 15 | 0.705789 | 0.269704 | 0.289568 | 0.391438 |
| 16 | 0.154502 | 0.703137 | 0.184157 | 0.765623 |
| 17 | 0.25974 | 0.934706 | 0.172775 | 0.412022 |
| 18 | 0.403475 | 0.144796 | 0.0224043 | 0.891236 |
| 19 | 0.922302 | 0.805214 | 0.0232178 | 0.951568 |
The first thing we have to do is separate the data into features and labels:
```python
features = df.iloc[::2, :]   # every other row, starting at day 0 (days 0, 2, 4, ...)
labels = df.iloc[1::2, :]    # the day after each feature row (days 1, 3, 5, ...), since we predict 1 day ahead
```
Features:

|    | temperature | pressure | humidity | wind |
|---:|---:|---:|---:|---:|
| 0 | 0.0589101 | 0.278302 | 0.875369 | 0.622687 |
| 2 | 0.511291 | 0.334929 | 0.401483 | 0.77062 |
| 4 | 0.495425 | 0.520179 | 0.516858 | 0.628928 |
| 6 | 0.058189 | 0.69932 | 0.885174 | 0.00602091 |
| 8 | 0.958427 | 0.471423 | 0.412678 | 0.618024 |
| 10 | 0.49252 | 0.541955 | 0.00522009 | 0.396557 |
| 12 | 0.819055 | 0.637116 | 0.285361 | 0.569794 |
| 14 | 0.89466 | 0.948916 | 0.556422 | 0.555165 |
| 16 | 0.154502 | 0.703137 | 0.184157 | 0.765623 |
| 18 | 0.403475 | 0.144796 | 0.0224043 | 0.891236 |
Labels:

|    | temperature | pressure | humidity | wind |
|---:|---:|---:|---:|---:|
| 1 | 0.594924 | 0.797274 | 0.510012 | 0.374484 |
| 3 | 0.711329 | 0.72051 | 0.595685 | 0.872691 |
| 5 | 0.676054 | 0.67902 | 0.0213801 | 0.0267594 |
| 7 | 0.708245 | 0.871698 | 0.345451 | 0.448352 |
| 9 | 0.941202 | 0.825181 | 0.211916 | 0.0808273 |
| 11 | 0.323757 | 0.113585 | 0.797503 | 0.323961 |
| 13 | 0.95123 | 0.00604303 | 0.208746 | 0.150214 |
| 15 | 0.705789 | 0.269704 | 0.289568 | 0.391438 |
| 17 | 0.25974 | 0.934706 | 0.172775 | 0.412022 |
| 19 | 0.922302 | 0.805214 | 0.0232178 | 0.951568 |
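Splitting into even and odd rows uses each day only once, so 20 days give only 10 training pairs, and transitions such as day 1 to day 2 are never seen. If you want every consecutive pair instead, one option is to pair each day with the next day's temperature using pandas' `shift`, which gives 19 samples from the same 20 rows. This is a minimal sketch: the random DataFrame is only a stand-in with the same column names as above.

```python
import numpy as np
import pandas as pd

# Stand-in data with the same columns as above (20 days, one row per day)
df = pd.DataFrame(np.random.random((20, 4)),
                  columns=['temperature', 'pressure', 'humidity', 'wind'])

# The label for day t is the temperature of day t + 1
next_temp = df['temperature'].shift(-1)

# The last day has no "next day" (its shifted value is NaN), so drop it
features = df.iloc[:-1].to_numpy()        # shape (19, 4)
labels = next_temp.iloc[:-1].to_numpy()   # shape (19,)

# Add the time dimension expected by Keras RNN layers: (samples, timesteps, features)
features = np.expand_dims(features, axis=1)  # shape (19, 1, 4)
print(features.shape, labels.shape)
```

The resulting arrays have the same layout as the ones used for training below, only with more samples.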
Since you are only interested in predicting the temperature, you can drop the other columns from the labels and convert both to NumPy arrays:
```python
features = features.to_numpy()                # shape (10, 4)
labels = labels['temperature'].to_numpy()     # shape (10,)
features = np.expand_dims(features, axis=1)   # shape (10, 1, 4)
```
Note that a time dimension is added to `features`. Keras RNN layers expect 3D input of shape `(batch, timesteps, features)`, so each sample here is a sequence of exactly one timestep (one day), and that timestep has 4 features (temperature, pressure, humidity, wind).
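If you later want to look back more than one day, the same layout extends to windows of several timesteps. The sketch below uses a hypothetical helper, `make_windows` (not part of NumPy, pandas or Keras), that stacks `lookback` consecutive days into one sample and uses the following day's temperature as the label; with `lookback=1`, each sample is a single day paired with the next day's temperature.

```python
import numpy as np

def make_windows(data, target, lookback):
    """Hypothetical helper: build windows of shape (samples, lookback, n_features).

    data:   array of shape (n_days, n_features)
    target: array of shape (n_days,), the value to predict
    Sample i holds days i .. i + lookback - 1; its label is target[i + lookback].
    """
    X, y = [], []
    for i in range(len(data) - lookback):
        X.append(data[i:i + lookback])
        y.append(target[i + lookback])
    return np.stack(X), np.array(y)

# 20 days, 4 features (temperature, pressure, humidity, wind); column 0 is temperature
data = np.random.random((20, 4))
X, y = make_windows(data, data[:, 0], lookback=3)
print(X.shape, y.shape)  # (17, 3, 4) (17,)
```

Because the model below builds its `Input` from `features.shape[1]` and `features.shape[2]`, windows like `X` would give an input shape of `(3, 4)` without other changes to the model code.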
Building and running an RNN model:
```python
inputs = tf.keras.layers.Input(shape=(features.shape[1], features.shape[2]))
rnn_out = tf.keras.layers.SimpleRNN(32)(inputs)
outputs = tf.keras.layers.Dense(1)(rnn_out)  # one output = temperature
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss="mse")
model.summary()
history = model.fit(features, labels, batch_size=2, epochs=3)
```
```
Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 input_2 (InputLayer)        [(None, 1, 4)]            0
 simple_rnn (SimpleRNN)      (None, 32)                1184
 dense_1 (Dense)             (None, 1)                 33
=================================================================
Total params: 1,217
Trainable params: 1,217
Non-trainable params: 0
_________________________________________________________________
Epoch 1/3
5/5 [==============================] - 1s 9ms/step - loss: 0.7859
Epoch 2/3
5/5 [==============================] - 0s 7ms/step - loss: 0.5862
Epoch 3/3
5/5 [==============================] - 0s 6ms/step - loss: 0.4354
```
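The parameter counts in the summary can be checked by hand. A `SimpleRNN` layer with a bias has an input kernel (`input_dim × units`), a recurrent kernel (`units × units`) and a bias vector (`units`); a `Dense` layer has a kernel (`input_dim × units`) plus a bias. The two helper functions below are just for illustration and are not Keras APIs:

```python
def simple_rnn_params(input_dim, units):
    # input kernel + recurrent kernel + bias
    return input_dim * units + units * units + units

def dense_params(input_dim, units):
    # kernel + bias
    return input_dim * units + units

rnn = simple_rnn_params(input_dim=4, units=32)  # 4*32 + 32*32 + 32 = 1184
dense = dense_params(input_dim=32, units=1)     # 32*1 + 1 = 33
print(rnn, dense, rnn + dense)  # 1184 33 1217
```

Note that the RNN's parameter count does not depend on the number of timesteps, which is why adding more days of history does not change it.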
Make predictions like this:
```python
samples = 1
model.predict(tf.random.normal((samples, 1, 4)))
# array([[-1.610171]], dtype=float32)
```
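You may also want to normalize your data before training (that is, before splitting it into features and labels), for example:

```python
# Standardize each column to zero mean and unit variance
mean = df.mean(axis=0)
std = df.std(axis=0)
df = (df - mean) / std  # the parentheses matter: `df - mean / std` would only divide the mean
```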
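Once the data is standardized, the model is trained on, and predicts, temperatures in standardized units. A minimal sketch of mapping a prediction back to the original scale, assuming you keep the `mean` and `std` Series around; the random DataFrame is a stand-in, and `pred_scaled` is a placeholder value, not real model output:

```python
import numpy as np
import pandas as pd

# Stand-in data with the same columns as above
df = pd.DataFrame(np.random.random((20, 4)),
                  columns=['temperature', 'pressure', 'humidity', 'wind'])

# Keep the statistics so predictions can be mapped back later
mean = df.mean(axis=0)
std = df.std(axis=0)
df_scaled = (df - mean) / std

# Placeholder standing in for model.predict(...) output, in standardized units
pred_scaled = np.array([[0.5]])

# Undo the standardization for the temperature column only
pred_temperature = pred_scaled * std['temperature'] + mean['temperature']
print(pred_temperature)
```

In a real setup, compute `mean` and `std` on the training portion only and reuse them for validation, test and prediction inputs, so no information from future data leaks into training.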
And that's about it.