
I have the following data science problem. I have a set of arrays. Each array represents one month of lighting, heating, or ventilation consumption, and each row represents the consumption for one hour. So for each month of the year, I have 3 arrays.

For example, one array representing the heating consumption in March 2019 looks like this:

[image of the example array; not reproduced here]

The goal is to predict the type of consumption (lighting, heating, or ventilation) from one month of consumption data. If I want to use a decision tree or a neural network, for instance, how do I shape the data? What will the variables be? Usually a row is one sample and the columns are the variables, but in my case a set of rows represents one sample, and I don't know what the variables should be.

I tried computing the maximum, minimum, standard deviation, mean, etc., so that each array is summarized as a single row. But I would like to know whether there is another way to do this kind of prediction with a set of arrays.

Thank you.

Juan
  • Can you explain what you mean by "type of consumption"? Do the samples all belong to the same customer? What sort of outcome do you expect? – tripleee Mar 29 '19 at 13:53
  • I don't see where the class comes from? – anilkunchalaece Mar 29 '19 at 13:54
  • If we give the model one month of consumption like the example above (without the last column), the model has to recognize whether it is lighting, heating, or ventilation consumption. The samples come from the same building. The data were taken from the building's meters. Thanks. – Juan Mar 29 '19 at 13:56

1 Answer


There is nothing wrong with the format of your data.

What will the variable be? You said you want to classify an array covering one whole month, so the input to the model is the whole month's array, not its individual rows (or lines, as you call them). In your model, one sample is one month, since that is what you want the model to learn and predict (or classify).

Also, if you use a neural network, you need labels during training. Do not give a label to each row (each hour); provide a single label for each month.
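
As a rough illustration of "one month is one sample, with one label", here is a minimal NumPy sketch. The function name `month_to_row`, the zero-padding to 31 days, and the random numbers standing in for meter readings are all assumptions made for the example, not something from your data. Padding is only one way to handle months of different lengths.

```python
import numpy as np

HOURS = 31 * 24  # longest possible month in hours (744); shorter months get padded

def month_to_row(hourly, length=HOURS):
    """Flatten one month of hourly readings into a fixed-length, zero-padded row.

    Hypothetical helper: zero-padding is an assumption, chosen only so that
    every month ends up with the same number of columns.
    """
    hourly = np.asarray(hourly, dtype=float).ravel()
    row = np.zeros(length)
    row[:hourly.size] = hourly[:length]
    return row

# Made-up data: (hourly readings for one month, label for that whole month).
rng = np.random.default_rng(0)
months = [
    (rng.random(31 * 24), "heating"),
    (rng.random(28 * 24), "lighting"),
    (rng.random(30 * 24), "ventilation"),
]

X = np.stack([month_to_row(values) for values, _ in months])  # one row per month
y = np.array([label for _, label in months])                  # one label per month

print(X.shape, y.shape)  # (3, 744) (3,)
```

Each row of `X` is a whole month and `y` holds exactly one label per row, which is the shape most classifiers expect.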

You can take the mean, the median, or whichever statistic of the month to construct features, but that is kind of the neural network's job.
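
If you do go the hand-made feature route (for example for a decision tree), a sketch could look like the following. The helper `summarize_month`, the choice of statistics, and the "peak hour" feature are assumptions for illustration. The peak-hour feature also assumes the array starts at midnight and has one reading per hour.

```python
import numpy as np

def summarize_month(hourly):
    """Reduce one month of hourly readings to a short feature vector.

    Hypothetical helper; the statistics chosen here are only examples.
    """
    hourly = np.asarray(hourly, dtype=float).ravel()
    # Reshape to (days, 24), dropping any incomplete trailing day.
    by_day = hourly[: hourly.size // 24 * 24].reshape(-1, 24)
    return np.array([
        hourly.mean(),
        hourly.std(),
        hourly.min(),
        hourly.max(),
        np.median(hourly),
        by_day.mean(axis=0).argmax(),  # hour of day with the highest average use
    ])

# Made-up 30-day profile: flat at 1.0 with a spike at 18:00 every day.
hours = np.arange(30 * 24)
fake_month = 1.0 + (hours % 24 == 18)

features = summarize_month(fake_month)
print(features.shape)  # (6,)
```

Stacking one such vector per month gives a small table with one row per month, which can be passed to a classifier such as scikit-learn's `DecisionTreeClassifier` together with the per-month labels.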

I don't know the size of your dataset, but if you don't have many months of each class, you will have too few training samples for the model to generalize well.

I hope this puts you in the right direction and clears things up.


BloodRabz