
I have a (large) dataset of discrete measurements. These data represent power draw over time (kW vs. time).

These usage patterns correspond to different appliances in a household (oven, microwave, heating, cooking plates, ...).

Let's look at a sample:

[Figure: power (W) vs. time graph]

This graph shows the gas usage pattern from 04:00 to 22:00. You can clearly see that around 15:00-20:00 the patterns are very similar to each other.

The dataset gives the following values:

2015-11-14 15:18:00+00:00     0.609137
2015-11-14 15:19:00+00:00     0.609137
2015-11-14 15:20:00+00:00     0.609137
2015-11-14 15:21:00+00:00     0.609137
2015-11-14 15:22:00+00:00     0.609137
2015-11-14 15:23:00+00:00     0.609137
2015-11-14 15:24:00+00:00     0.609137
2015-11-14 15:25:00+00:00     0.609137
2015-11-14 15:26:00+00:00     1.270988
2015-11-14 15:27:00+00:00     7.344390
2015-11-14 15:28:00+00:00     3.302752
2015-11-14 15:29:00+00:00     3.456667
2015-11-14 15:30:00+00:00     3.441979
2015-11-14 15:31:00+00:00     2.857143
2015-11-14 15:32:00+00:00     2.857143
2015-11-14 15:33:00+00:00     7.536670
2015-11-14 15:34:00+00:00     2.627737
2015-11-14 15:35:00+00:00     2.712480
2015-11-14 15:36:00+00:00     2.926829
2015-11-14 15:37:00+00:00     2.943902
2015-11-14 15:38:00+00:00     3.000000
2015-11-14 15:39:00+00:00     5.660000
2015-11-14 15:40:00+00:00     5.030244
2015-11-14 15:41:00+00:00     2.926829
2015-11-14 15:42:00+00:00     2.926829
2015-11-14 15:43:00+00:00     2.926829
2015-11-14 15:44:00+00:00     2.997336
2015-11-14 15:45:00+00:00     3.025210
2015-11-14 15:46:00+00:00     7.729800
2015-11-14 15:47:00+00:00     3.076923
2015-11-14 15:48:00+00:00     3.086207
2015-11-14 15:49:00+00:00     3.103448
2015-11-14 15:50:00+00:00     7.579576
2015-11-14 15:51:00+00:00     3.363513
2015-11-14 15:52:00+00:00     3.185841
2015-11-14 15:53:00+00:00     3.185841
2015-11-14 15:54:00+00:00     3.211172
2015-11-14 15:55:00+00:00     3.302752
2015-11-14 15:56:00+00:00     7.520113
2015-11-14 15:57:00+00:00     3.713875
2015-11-14 15:58:00+00:00     3.353168
2015-11-14 15:59:00+00:00     3.302752
2015-11-14 16:00:00+00:00     3.348886
2015-11-14 16:01:00+00:00     3.428571
2015-11-14 16:02:00+00:00     7.942857
2015-11-14 16:03:00+00:00     3.428571
2015-11-14 16:04:00+00:00     3.400801
2015-11-14 16:05:00+00:00     3.364486
2015-11-14 16:06:00+00:00     3.324359
2015-11-14 16:07:00+00:00     3.302752
2015-11-14 16:08:00+00:00     7.889744
2015-11-14 16:09:00+00:00     3.214286
2015-11-14 16:10:00+00:00     3.183271
2015-11-14 16:11:00+00:00     3.157895
2015-11-14 16:12:00+00:00     3.176060
2015-11-14 16:13:00+00:00     3.185841
2015-11-14 16:14:00+00:00     7.854474
2015-11-14 16:15:00+00:00     3.333333
2015-11-14 16:16:00+00:00     3.437908
2015-11-14 16:17:00+00:00     3.529412
2015-11-14 16:18:00+00:00     3.618538
2015-11-14 16:19:00+00:00     5.508159
2015-11-14 16:20:00+00:00     6.274038
2015-11-14 16:21:00+00:00     3.755921
2015-11-14 16:22:00+00:00     3.789474
2015-11-14 16:23:00+00:00     8.093718
2015-11-14 16:24:00+00:00     3.870968
2015-11-14 16:25:00+00:00     3.824788
2015-11-14 16:26:00+00:00     3.789474
2015-11-14 16:27:00+00:00    11.414509
2015-11-14 16:28:00+00:00     8.344301
2015-11-14 16:29:00+00:00     7.751156
2015-11-14 16:30:00+00:00     7.553191
2015-11-14 16:31:00+00:00     7.367347
2015-11-14 16:32:00+00:00     7.346939
2015-11-14 16:33:00+00:00     7.346939
2015-11-14 16:34:00+00:00     7.346939
2015-11-14 16:35:00+00:00     7.346939
2015-11-14 16:36:00+00:00     7.324898
2015-11-14 16:37:00+00:00     7.246531
2015-11-14 16:38:00+00:00     7.346939
2015-11-14 16:39:00+00:00     7.246531
2015-11-14 16:40:00+00:00     7.200000
2015-11-14 16:41:00+00:00     7.200000
2015-11-14 16:42:00+00:00     7.200000
2015-11-14 16:43:00+00:00     8.249231
2015-11-14 16:44:00+00:00     8.630769
2015-11-14 16:45:00+00:00     4.770385
2015-11-14 16:46:00+00:00     0.730223
2015-11-14 16:47:00+00:00     0.730223
2015-11-14 16:48:00+00:00     0.730223
2015-11-14 16:49:00+00:00     0.730223
2015-11-14 16:50:00+00:00     0.730223
2015-11-14 16:51:00+00:00     0.730223
2015-11-14 16:52:00+00:00     0.730223
2015-11-14 16:53:00+00:00     0.773099
2015-11-14 16:54:00+00:00     3.302752
2015-11-14 16:55:00+00:00     5.411433
2015-11-14 16:56:00+00:00     5.990769
2015-11-14 16:57:00+00:00     3.573333
2015-11-14 16:58:00+00:00     3.333333
2015-11-14 16:59:00+00:00     3.068027
2015-11-14 17:00:00+00:00     2.448980
2015-11-14 17:01:00+00:00     2.448980
2015-11-14 17:02:00+00:00     7.548449
2015-11-14 17:03:00+00:00     2.834646
2015-11-14 17:04:00+00:00     2.923382
2015-11-14 17:05:00+00:00     3.130435
2015-11-14 17:06:00+00:00     3.070931
2015-11-14 17:07:00+00:00     2.975207
2015-11-14 17:08:00+00:00     6.961221
2015-11-14 17:09:00+00:00     3.611077
2015-11-14 17:10:00+00:00     2.880000
2015-11-14 17:11:00+00:00     2.940197
2015-11-14 17:12:00+00:00     2.950820
2015-11-14 17:13:00+00:00     3.075466
2015-11-14 17:14:00+00:00     3.103448
2015-11-14 17:15:00+00:00     3.151543
2015-11-14 17:16:00+00:00     3.157895
2015-11-14 17:17:00+00:00     7.774371
2015-11-14 17:18:00+00:00     3.130435
2015-11-14 17:19:00+00:00     3.113343
2015-11-14 17:20:00+00:00     3.103448
2015-11-14 17:21:00+00:00     3.103448
2015-11-14 17:22:00+00:00     3.103448
2015-11-14 17:23:00+00:00     7.758621
2015-11-14 17:24:00+00:00     3.103448
2015-11-14 17:25:00+00:00     3.114243
2015-11-14 17:26:00+00:00     3.130435
2015-11-14 17:27:00+00:00     3.104571
2015-11-14 17:28:00+00:00     3.076923
2015-11-14 17:29:00+00:00     7.743590
2015-11-14 17:30:00+00:00     3.076923
2015-11-14 17:31:00+00:00     3.080902
2015-11-14 17:32:00+00:00     3.103448
2015-11-14 17:33:00+00:00     3.097701
2015-11-14 17:34:00+00:00     3.076923
2015-11-14 17:35:00+00:00     6.096410
2015-11-14 17:36:00+00:00     4.797931
2015-11-14 17:37:00+00:00     3.103448
2015-11-14 17:38:00+00:00     3.178975
2015-11-14 17:39:00+00:00     3.185841
2015-11-14 17:40:00+00:00     3.185841
2015-11-14 17:41:00+00:00     4.784888
2015-11-14 17:42:00+00:00     6.226648
2015-11-14 17:43:00+00:00     3.214286
2015-11-14 17:44:00+00:00     3.289482
2015-11-14 17:45:00+00:00     3.309165
2015-11-14 17:46:00+00:00     3.495146
2015-11-14 17:47:00+00:00     6.772965
2015-11-14 17:48:00+00:00     4.827506
2015-11-14 17:49:00+00:00     3.645022
2015-11-14 17:50:00+00:00     3.673469
2015-11-14 17:51:00+00:00     8.037809
2015-11-14 17:52:00+00:00     3.789474
2015-11-14 17:53:00+00:00     3.789474
2015-11-14 17:54:00+00:00     3.789474
2015-11-14 17:55:00+00:00    10.986235
2015-11-14 17:56:00+00:00     7.869376
2015-11-14 17:57:00+00:00     6.763103
2015-11-14 17:58:00+00:00     6.626263
2015-11-14 17:59:00+00:00     6.496753
2015-11-14 18:00:00+00:00     6.543651
2015-11-14 18:01:00+00:00     6.895425
2015-11-14 18:02:00+00:00     6.959276
2015-11-14 18:03:00+00:00     7.038462
2015-11-14 18:04:00+00:00     6.923077
2015-11-14 18:05:00+00:00     6.961538
2015-11-14 18:06:00+00:00     7.000000
2015-11-14 18:07:00+00:00     7.000000
2015-11-14 18:08:00+00:00     6.961538
2015-11-14 18:09:00+00:00     6.923077
2015-11-14 18:10:00+00:00     7.038462
2015-11-14 18:11:00+00:00     6.959276
2015-11-14 18:12:00+00:00     7.058824
2015-11-14 18:13:00+00:00     6.981900
2015-11-14 18:14:00+00:00     7.018100
2015-11-14 18:15:00+00:00     1.565446
2015-11-14 18:16:00+00:00     0.596026
2015-11-14 18:17:00+00:00     0.596026
2015-11-14 18:18:00+00:00     0.596026
2015-11-14 18:19:00+00:00     0.596026
2015-11-14 18:20:00+00:00     0.596026
2015-11-14 18:21:00+00:00     0.596026
2015-11-14 18:22:00+00:00     0.596026
2015-11-14 18:23:00+00:00     0.596026
2015-11-14 18:24:00+00:00     0.596026

The load starts at around 0.6 kW and peaks at about 8 kW.
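For reference, a minimal sketch of how a dump like the one above could be parsed back into a pandas Series (the filename gas.txt and the column names are illustrative assumptions, not part of the dataset):

    import pandas as pd

    # Each row has three whitespace-separated fields: date, time+offset, kW value.
    df = pd.read_csv("gas.txt", sep=r"\s+", header=None,
                     names=["date", "time", "kw"])
    idx = pd.to_datetime(df["date"] + " " + df["time"], utc=True)
    series = pd.Series(df["kw"].to_numpy(), index=idx, name="kw")
    print(series.head())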

Now, in the dataset there are a lot of these recurring patterns. As you can see, the patterns from 06:00 to 07:45 and from 14:15 to 15:45 are very similar too. I am trying to find a way to link these patterns to each other.

For example, I could create a list of "similar" patterns, give each an ID or attribute, and filter them, so that I am left with a list of the patterns that are closely related to each other.

Mind that the dataset could contain more than a year of measurements.

What I thought of doing is comparing the duration of the baseline right before and right after a pattern, and comparing the highest peaks with each other using a threshold value (a sketch of this idea follows below). Any other ideas?
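To make the peak-comparison idea concrete, here is a hedged sketch: segment the series into events that rise above a baseline level, then group events whose peaks lie within a threshold of each other. The thresholds (BASELINE_KW, PEAK_TOL_KW) and the function names are illustrative assumptions, not values fixed by the data:

    import numpy as np
    import pandas as pd

    BASELINE_KW = 1.0   # assumed: anything below this counts as baseline/idle
    PEAK_TOL_KW = 0.5   # assumed: peaks this close together are "similar"

    def extract_events(series: pd.Series) -> list:
        """Split the series into contiguous runs above the baseline."""
        active = series > BASELINE_KW
        # Label each contiguous run with an incrementing integer.
        run_id = (active != active.shift()).cumsum()
        return [seg for _, seg in series[active].groupby(run_id[active])]

    def group_by_peak(events: list) -> list:
        """Assign a group ID to each event; events with close peaks share an ID."""
        peaks = np.array([seg.max() for seg in events])
        order = np.argsort(peaks)
        ids = [0] * len(events)
        gid = 0
        for prev, cur in zip(order[:-1], order[1:]):
            if peaks[cur] - peaks[prev] > PEAK_TOL_KW:
                gid += 1
            ids[cur] = gid
        return ids

Grouping on peak height alone will conflate different appliances with similar peaks, so the baseline-duration comparison mentioned above (or the event's duration and shape) would likely need to be added as extra features.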

aze45sq6d

1 Answer


What you're probably trying to do is (very similar to) Non-Intrusive Load Monitoring (NILM). George Hart wrote a paper on this in the late eighties/early nineties (he gives a short summary, with a reference to the paper, at http://www.georgehart.com/research/nalm.html). That paper would probably be a good starting point.

Hart's method (which most subsequent research in this area in the '90s built on, or was compared against) is to:

  1. Smooth the time series so that only meaningful steps in load remain. This is not Gaussian smoothing: instead, remove all data points whose values are not stable for, say, 3 consecutive observations in the raw data.
  2. Then create a histogram of the load changes that remain. Hart included a second variable, inductive load, next to the resistive load you seem to have. This (2-D) histogram should show clear clusters, and those clusters should be easy to separate with any basic ML approach. (A rough sketch of both steps follows this list.)
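A rough sketch of both steps, assuming only real (resistive) power is available, so the histogram is 1-D rather than Hart's 2-D version. The stability window, the tolerances, and the choice of KMeans are all assumptions on my part:

    import numpy as np
    import pandas as pd
    from sklearn.cluster import KMeans

    def stable_steps(series: pd.Series, window: int = 3, tol: float = 0.1) -> pd.Series:
        """Step 1: keep only samples whose value is stable over `window` points."""
        rng = series.rolling(window).max() - series.rolling(window).min()
        return series[rng < tol]

    def cluster_load_changes(series: pd.Series, n_clusters: int = 4) -> np.ndarray:
        """Step 2: cluster the step changes between stable load levels."""
        steps = stable_steps(series).diff().dropna()
        steps = steps[steps.abs() > 0.2]  # assumed: drop negligible fluctuations
        # Each cluster of step sizes should correspond to one appliance
        # switching on (positive steps) or off (negative steps).
        km = KMeans(n_clusters=n_clusters, n_init=10)
        return km.fit_predict(steps.to_numpy().reshape(-1, 1))

If you want to eyeball the clusters first, np.histogram(steps, bins=50) gives roughly the histogram view Hart describes before committing to a cluster count.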

I found Oliver Parson's blog a really accessible way of learning about recent work on this topic. Good luck!

(I did a project on NILM in the past; that's where the academic references come from.)

Peter Smit
  • Thank you a lot! Indeed, I am doing my thesis on Non-Intrusive Load Monitoring, where we have to recognize each pattern and link it to a specific household appliance. – aze45sq6d Dec 13 '16 at 11:58