I'm using a dataset of decimal values with a timestamp, which has the following features:

 1. sno
 2. timestamp
 3. v1
 4. v2
 5. v3

I have 5 months of data with a timestamp for every minute. I need to predict whether v1, v2 and v3 will be in use at any given time in the future. The values of v1, v2 and v3 are between 0 and 25.

How can I do this?

I've used binary classification before, but I have no clue how to proceed with a multi-label problem. I've always used the code below. How should I train the model, and how should I fit v1, v2 and v3 into 'y'?

X_train, X_test, y_train, y_test = train_test_split(train, y, test_size=0.2)




Data:

sno power   voltage v1  v2  v3  timestamp
1   3.74    235.24  0   16  18  2006-12-16 18:03:00
2   4.928   237.14  0   37  16  2006-12-16 18:04:00
3   6.052   236.73  0   37  17  2006-12-16 18:05:00
4   6.752   237.06  0   36  17  2006-12-16 18:06:00
5   6.474   237.13  0   37  16  2006-12-16 18:07:00
6   6.308   235.84  0   36  17  2006-12-16 18:08:00
7   4.464   232.69  0   37  16  2006-12-16 18:09:00
8   3.396   230.98  0   22  18  2006-12-16 18:10:00
9   3.09    232.21  0   12  17  2006-12-16 18:11:00
10  3.73    234.19  0   27  17  2006-12-16 18:12:00
11  2.308   234.96  0   1   17  2006-12-16 18:13:00
12  2.388   236.66  0   1   17  2006-12-16 18:14:00
13  4.598   235.84  0   20  17  2006-12-16 18:15:00
14  4.524   235.6   0   9   17  2006-12-16 18:16:00
15  4.202   235.49  0   1   17  2006-12-16 18:17:00
jason
  • The phrase "I need to predict if v1, v2, v3 is being used in the future" suggests recoding the variables so that they equal 0 if not present and 1 if present at a given moment. I may be wrong, though, as the description is slightly vague. You should elaborate more on your data. – E.Z Oct 07 '17 at 06:50
  • @E.Z. How can I add v1, v2, v3 into y? I have to put timestamp, voltage and power in X, right? Need help! – jason Oct 07 '17 at 22:33
  • Basically, yes. You may form the `X` variable that way. However, creating `y` requires tinkering, as it is not yet clear how it should be initialized at all. What is the backstory behind `v1 v2 v3`? In other words, what do these variables mean? – E.Z Oct 08 '17 at 05:27
  • @E.Z. They represent different categories, and every sno can have one or many categories. How can I go about that? – jason Oct 08 '17 at 08:29
  • Then there is no established solution. The model I first thought about would be extremely complex. If you understand the concept of these variables, then you should be able to create a basic theory of how they should make up the `y` variable. I cannot tell whether my theory might work or not, as I have not got the slightest idea of what `v1 v2 v3` represent, but still... `y` may go as follows: `y = 0` if `v1 v2 v3` are zero; `y = 1` if `v1 = 1` and `v2 v3` are zero, and so on... that would create a multi-label variable that may be fitted in the future. – E.Z Oct 08 '17 at 08:40
  • However, it will not be easy to verbally summarize the output you will get. – E.Z Oct 08 '17 at 08:40
  • @E.Z. The idea is that for any timestamp it should display v1, v2 or v3. It could be any one of these variables or all of them. How can I do that? – jason Oct 08 '17 at 08:54
  • Then the approach presented above might work. – E.Z Oct 08 '17 at 10:32
  • @E.Z. Is there code to make that happen, like the way dummy variables are used? – jason Oct 08 '17 at 10:37
  • I believe there is not. It should be written manually. Wait a sec I will provide it in the answer. – E.Z Oct 08 '17 at 10:41
  • The class of your data is `pandas.DataFrame`? – E.Z Oct 08 '17 at 10:50
  • Yes. All the data comes from pandas' read_csv(). – jason Oct 08 '17 at 10:59
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/156200/discussion-between-e-z-and-jason). – E.Z Oct 08 '17 at 11:13

1 Answer

Following the documentation:

The multiclass support is handled according to a one-vs-one scheme (and should thus support one-vs-all strategy).

one-vs-one strategy

The one-vs-one scheme trains one classifier per pair of classes. At prediction time, each classifier votes for one class, and the class that receives the most votes is selected as the prediction. If the vote is tied, i.e. two classes receive the same number of votes, the classification confidence of the underlying classifiers breaks the tie.

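To use an SVM with this scheme:

from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

...

# params: a dict of SVC keyword arguments, defined elsewhere
subclf = SVC(**params)
clf = OneVsOneClassifier(estimator=subclf)

# X_train, y_train: e.g. as returned by train_test_split in the question
clf.fit(X_train, y_train)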
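To make the voting scheme concrete, here is a minimal sketch on made-up data. The four clusters, the linear kernel, and all variable names are assumptions for illustration only; they are not taken from the question's dataset.

```python
import numpy as np
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

# Toy data (an assumption): 4 classes, 2 features,
# 30 points per class scattered around well-separated centres.
rng = np.random.RandomState(0)
centres = np.array([[0, 0], [5, 0], [0, 5], [5, 5]])
X = np.vstack([c + rng.randn(30, 2) * 0.5 for c in centres])
y = np.repeat(np.arange(4), 30)

clf = OneVsOneClassifier(estimator=SVC(kernel="linear"))
clf.fit(X, y)

n_classes = len(clf.classes_)
n_pairwise = len(clf.estimators_)  # one binary SVC per pair of classes
pred = clf.predict(centres)        # each centre is voted into its own class

print(n_classes, n_pairwise, pred)
```

With 4 classes the wrapper holds 4 * 3 / 2 = 6 binary classifiers, which is why the number of models grows quadratically with the number of classes.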

one-vs-rest strategy

The alternative is the one-vs-rest (also called one-vs-all) strategy, which fits one classifier per class, trained against all other classes in the data. It is more popular than the first scheme because the results are easier to interpret and the computational cost is much lower: it needs one classifier per class rather than one per pair of classes. It is as simple to use as the first example:

from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

...

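# params: a dict of SVC keyword arguments, defined elsewhere
subclf = SVC(**params)
clf = OneVsRestClassifier(estimator=subclf)

# X_train, y_train: e.g. as returned by train_test_split in the question
clf.fit(X_train, y_train)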

To read more about multi-label classification and learning proceed here
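If the question is read as a true multi-label problem (any of v1, v2, v3 may be in use at the same time), one possible sketch is to recode each column as a 0/1 indicator and pass the resulting three-column matrix as `y` to `OneVsRestClassifier`, which accepts such an indicator matrix. The rows below are made up for illustration (the question's sample has v1 always 0, which would leave nothing to learn for that label), and using only `power` and `voltage` as features is likewise an assumption.

```python
import pandas as pd
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Hypothetical rows shaped like the question's data (values are invented).
data = pd.DataFrame({
    "power":   [0.5, 0.6, 3.0, 3.2, 6.0, 6.3, 0.4, 3.1, 6.1, 0.7],
    "voltage": [240.0, 239.5, 236.0, 235.8, 232.0, 231.5, 240.2, 236.1, 231.9, 239.8],
    "v1": [0, 0, 0, 0, 12, 15, 0, 0, 10, 0],
    "v2": [0, 0, 20, 18, 22, 25, 0, 19, 21, 0],
    "v3": [17, 16, 17, 0, 18, 0, 17, 16, 0, 18],
})

# One 0/1 column per label: 1 means "in use" (value above zero).
Y = (data[["v1", "v2", "v3"]] > 0).astype(int).to_numpy()
X = data[["power", "voltage"]].to_numpy()

# One binary SVC is fitted per label column.
clf = OneVsRestClassifier(SVC(kernel="linear"))
clf.fit(X, Y)

pred = clf.predict(X)  # same layout as Y: one 0/1 column per label
print(pred)
```

Each row of `pred` can then be read directly as "v1 used / v2 used / v3 used", which avoids having to decode a single combined class number.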



Aftermath variable coding

So, the basic idea is to build a complex (i.e. multi-label) target variable such that:

  • y equals 0 if v1, v2 and v3 are all zero

  • y equals 1 if exactly one of v1, v2, v3 is non-zero

  • y equals 2 if exactly two of them (v1 and v2, v1 and v3, or v2 and v3) are non-zero

  • y equals 3 if v1, v2 and v3 are all non-zero

The workaround may be the following:

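import numpy as np

# y counts how many of v1, v2, v3 are in use (above zero) in each row
y = []

for i, j, k in zip(data['v1'], data['v2'], data['v3']):
    if i > 0 and j > 0 and k > 0:
        y.append(3)
    elif (i > 0 and j > 0) or (i > 0 and k > 0) or (j > 0 and k > 0):
        y.append(2)
    elif i > 0 or j > 0 or k > 0:
        y.append(1)
    else:
        y.append(0)

y = np.array(y)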
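The same coding can also be sketched without an explicit loop, by counting the columns that are above zero in each row. The four rows below are invented to cover each of the four cases.

```python
import pandas as pd

# Hypothetical rows: none, two, one, and all three variables in use.
data = pd.DataFrame({
    "v1": [0, 0, 3, 5],
    "v2": [0, 16, 0, 37],
    "v3": [0, 18, 0, 17],
})

# Boolean frame: True where a variable is in use (value above zero).
in_use = data[["v1", "v2", "v3"]] > 0

# Summing the booleans per row gives the number of variables in use.
y = in_use.sum(axis=1).to_numpy()
print(y)  # [0 2 1 3]
```

Note that this count loses the information about which variable is in use; it only records how many are.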
E.Z
  • I've updated the data. Could you please share code I can use with the data above? Thanks in advance. – jason Oct 07 '17 at 22:20
  • How should I split the data? What should I do to get the v1, v2, v3 values into y, which I use for prediction? Please help! – jason Oct 07 '17 at 22:22