
I have the following output from Weka for SVM classification. I want to plot the SVM classifier output as anomaly vs. normal. How can I get the SVM scoring function out of this output?

=== Run information ===

Scheme:       weka.classifiers.functions.SMO -C 1.0 -L 0.001 -P 1.0E-12 -N 0 -V -1 -W 1 -K "weka.classifiers.functions.supportVector.PolyKernel -E 1.0 -C 250007"
Relation:     KDDTrain
Instances:    125973
Attributes:   42
              duration
              protocol_type
              service
              flag
              src_bytes
              dst_bytes
              land
              wrong_fragment
              urgent
              hot
              num_failed_logins
              logged_in
              num_compromised
              root_shell
              su_attempted
              num_root
              num_file_creations
              num_shells
              num_access_files
              num_outbound_cmds
              is_host_login
              is_guest_login
              count
              srv_count
              serror_rate
              srv_serror_rate
              rerror_rate
              srv_rerror_rate
              same_srv_rate
              diff_srv_rate
              srv_diff_host_rate
              dst_host_count
              dst_host_srv_count
              dst_host_same_srv_rate
              dst_host_diff_srv_rate
              dst_host_same_src_port_rate
              dst_host_srv_diff_host_rate
              dst_host_serror_rate
              dst_host_srv_serror_rate
              dst_host_rerror_rate
              dst_host_srv_rerror_rate
              class
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

SMO

Kernel used:
  Linear Kernel: K(x,y) = <x,y>

Classifier for classes: normal, anomaly

BinarySMO

Machine linear: showing attribute weights, not support vectors.

        -0.0498 * (normalized) duration
 +       0.5131 * (normalized) protocol_type=tcp
 +      -0.6236 * (normalized) protocol_type=udp
 +       0.1105 * (normalized) protocol_type=icmp
 +      -1.1861 * (normalized) service=auth
 +       0      * (normalized) service=bgp
 +       0      * (normalized) service=courier
 +       1      * (normalized) service=csnet_ns
 +       1      * (normalized) service=ctf
 +       1      * (normalized) service=daytime
 +      -0      * (normalized) service=discard
 +      -1.2505 * (normalized) service=domain
 +      -0.6878 * (normalized) service=domain_u
 +       0.9418 * (normalized) service=echo
 +       1.1964 * (normalized) service=eco_i
 +       0.9767 * (normalized) service=ecr_i
 +       0.0073 * (normalized) service=efs
 +       0.0595 * (normalized) service=exec
 +      -1.4426 * (normalized) service=finger
 +      -1.047  * (normalized) service=ftp
 +      -1.4225 * (normalized) service=ftp_data
 +       2      * (normalized) service=gopher
 +       1      * (normalized) service=hostnames
 +      -0.9961 * (normalized) service=http
 +       0.7255 * (normalized) service=http_443
 +       0.5128 * (normalized) service=imap4
 +      -6.3664 * (normalized) service=IRC
 +       1      * (normalized) service=iso_tsap
 +      -0      * (normalized) service=klogin
 +       0      * (normalized) service=kshell
 +       0.7422 * (normalized) service=ldap
 +       1      * (normalized) service=link
 +       0.5993 * (normalized) service=login
 +       1      * (normalized) service=mtp
 +       1      * (normalized) service=name
 +       0.2322 * (normalized) service=netbios_dgm
 +       0.213  * (normalized) service=netbios_ns
 +       0.1902 * (normalized) service=netbios_ssn
 +       1.1472 * (normalized) service=netstat
 +       0.0504 * (normalized) service=nnsp
 +       1.058  * (normalized) service=nntp
 +      -1      * (normalized) service=ntp_u
 +      -1.5344 * (normalized) service=other
 +       1.3595 * (normalized) service=pm_dump
 +       0.8355 * (normalized) service=pop_2
 +      -2      * (normalized) service=pop_3
 +       0      * (normalized) service=printer
 +       1.051  * (normalized) service=private
 +      -0.3082 * (normalized) service=red_i
 +       1.0034 * (normalized) service=remote_job
 +       1.0112 * (normalized) service=rje
 +      -1.0454 * (normalized) service=shell
 +      -1.6948 * (normalized) service=smtp
 +       0.1388 * (normalized) service=sql_net
 +      -0.3438 * (normalized) service=ssh
 +       1      * (normalized) service=supdup
 +       0.8756 * (normalized) service=systat
 +      -1.6856 * (normalized) service=telnet
 +      -0      * (normalized) service=tim_i
 +      -0.8579 * (normalized) service=time
 +      -0.726  * (normalized) service=urh_i
 +      -1.0285 * (normalized) service=urp_i
 +       1.0347 * (normalized) service=uucp
 +       0      * (normalized) service=uucp_path
 +       0      * (normalized) service=vmnet
 +       1      * (normalized) service=whois
 +      -1.3388 * (normalized) service=X11
 +       0      * (normalized) service=Z39_50
 +       1.7882 * (normalized) flag=OTH
 +      -3.0982 * (normalized) flag=REJ
 +      -1.7279 * (normalized) flag=RSTO
 +       1      * (normalized) flag=RSTOS0
 +       2.4264 * (normalized) flag=RSTR
 +       1.5906 * (normalized) flag=S0
 +      -1.952  * (normalized) flag=S1
 +      -0.9628 * (normalized) flag=S2
 +      -0.3455 * (normalized) flag=S3
 +       1.2757 * (normalized) flag=SF
 +       0.0054 * (normalized) flag=SH
 +       0.8742 * (normalized) src_bytes
 +       0.0542 * (normalized) dst_bytes
 +      -1.2659 * (normalized) land=1
 +       2.7922 * (normalized) wrong_fragment
 +       0.0662 * (normalized) urgent
 +       8.1153 * (normalized) hot
 +       2.4822 * (normalized) num_failed_logins
 +       0.2242 * (normalized) logged_in=1
 +      -0.0544 * (normalized) num_compromised
 +       0.9248 * (normalized) root_shell
 +      -2.363  * (normalized) su_attempted
 +      -0.2024 * (normalized) num_root
 +      -1.2791 * (normalized) num_file_creations
 +      -0.0314 * (normalized) num_shells
 +      -1.4125 * (normalized) num_access_files
 +      -0.0154 * (normalized) is_host_login=1
 +      -2.3307 * (normalized) is_guest_login=1
 +       4.3191 * (normalized) count
 +      -2.7484 * (normalized) srv_count
 +      -0.6276 * (normalized) serror_rate
 +       2.843  * (normalized) srv_serror_rate
 +       0.6105 * (normalized) rerror_rate
 +       3.1388 * (normalized) srv_rerror_rate
 +      -0.1262 * (normalized) same_srv_rate
 +      -0.1825 * (normalized) diff_srv_rate
 +       0.2961 * (normalized) srv_diff_host_rate
 +       0.7812 * (normalized) dst_host_count
 +      -1.0053 * (normalized) dst_host_srv_count
 +       0.0284 * (normalized) dst_host_same_srv_rate
 +       0.4419 * (normalized) dst_host_diff_srv_rate
 +       1.384  * (normalized) dst_host_same_src_port_rate
 +       0.8004 * (normalized) dst_host_srv_diff_host_rate
 +       0.2301 * (normalized) dst_host_serror_rate
 +       0.6401 * (normalized) dst_host_srv_serror_rate
 +       0.6422 * (normalized) dst_host_rerror_rate
 +       0.3692 * (normalized) dst_host_srv_rerror_rate
 -       2.5266

Number of kernel evaluations: -1049600465

Output prediction - sample output

inst#     actual  predicted error prediction
        1   1:normal   1:normal       1
        2   1:normal   1:normal       1
        3  2:anomaly  2:anomaly       1
        4   1:normal   1:normal       1
        5   1:normal   1:normal       1
        6  2:anomaly  2:anomaly       1
        7  2:anomaly  2:anomaly       1
        8  2:anomaly  2:anomaly       1
        9  2:anomaly  2:anomaly       1
       10  2:anomaly  2:anomaly       1
       11  2:anomaly  2:anomaly       1
       12  2:anomaly  2:anomaly       1
       13   1:normal   1:normal       1
       14  2:anomaly   1:normal   +   1
       15  2:anomaly  2:anomaly       1
       16  2:anomaly  2:anomaly       1
       17   1:normal   1:normal       1
       18  2:anomaly  2:anomaly       1
       19   1:normal   1:normal       1
       20   1:normal   1:normal       1
       21  2:anomaly  2:anomaly       1
       22  2:anomaly  2:anomaly       1
       23   1:normal   1:normal       1
       24   1:normal   1:normal       1
       25  2:anomaly  2:anomaly       1
       26   1:normal   1:normal       1
       27  2:anomaly  2:anomaly       1
       28   1:normal   1:normal       1
       29   1:normal   1:normal       1
       30   1:normal   1:normal       1
       31  2:anomaly  2:anomaly       1
       32  2:anomaly  2:anomaly       1
       33   1:normal   1:normal       1
       34  2:anomaly  2:anomaly       1
       35   1:normal   1:normal       1
       36   1:normal   1:normal       1
       37   1:normal   1:normal       1
       38  2:anomaly  2:anomaly       1
       39   1:normal   1:normal       1
       40  2:anomaly  2:anomaly       1
       41  2:anomaly  2:anomaly       1
       42  2:anomaly  2:anomaly       1
       43   1:normal   1:normal       1
       44   1:normal   1:normal       1
       45   1:normal   1:normal       1
       46  2:anomaly  2:anomaly       1
       47  2:anomaly  2:anomaly       1
       48   1:normal   1:normal       1
       49  2:anomaly   1:normal   +   1
       50  2:anomaly  2:anomaly       1

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                 0.986    0.039    0.967      0.986    0.976      0.948    0.973     0.960     normal
                 0.961    0.014    0.983      0.961    0.972      0.948    0.973     0.963     anomaly
Weighted Avg.    0.974    0.028    0.974      0.974    0.974      0.948    0.973     0.962

=== Confusion Matrix ===

     a     b   <-- classified as
 66389   954 |     a = normal
  2301 56329 |     b = anomaly
  • I think you'll have to clarify something for us. You keep asking how to plot this. However, you haven't described how you envision this plot appearing to the end user. What axes do you intend to use, what style of plot (say, graph vs histogram), et cetera. – Prune Feb 18 '16 at 00:11
  • @Prune, I am sorry for that but I want to plot a graph as one of the following. All I wanted to plot is `anomaly vs. normal ` https://www.google.com/search?q=SVM+classification&biw=1440&bih=766&tbm=isch&tbo=u&source=univ&sa=X&ved=0ahUKEwjo6JKD77XKAhXrkXIKHROaB9sQsAQIPQ#tbm=isch&tbs=rimg%3ACdbcxyvUpMdlIjgYq01Ia4ffgaYcnN1DdvlIn8jdcWX5BPst51OmvolMKGPDOnPa7fH6G4A6Aefujktjb9GxIkiZnioSCRirTUhrh9-BEZtX4ldeByD9KhIJphyc3UN2-UgR9uTgU_1GWh1gqEgmfyN1xZfkE-xGOGDZ02A8sgSoSCS3nU6a-iUwoEbECmqMAfc_1-KhIJY8M6c9rt8foR5ONB3_1dzUWIqEgkbgDoB5-6OSxEF-rUr573v4CoSCWNv0bEiSJmeEQpLfLC9Vtot&q=SVM%20classification –  Feb 18 '16 at 00:23
  • I understand that you want to have, say, green circles for "normal" and red X's for "abnormal"; but what do you envision for the physical placement of these symbols? The images on this site are all 2D and 3D examples. You have 42 dimensions, 3 of which are classification items. I don't yet understand how "one of the following" maps into your data set. – Prune Feb 18 '16 at 00:29
  • you are right. I was thinking of something like this http://stackoverflow.com/questions/8017427/plotting-data-from-an-svm-fit-hyperplane - in case it is possible to transform to 3d. –  Feb 18 '16 at 00:55

2 Answers


That output is the scoring function. Read the equals sign as a simple Boolean test, evaluating to 1 for true and 0 for false. Thus, out of all the possible values of a nominal attribute, only one coefficient affects the score.

For example, let's consider only the first three attributes, with these normalized inputs and resulting values:

duration      2.0     -0.0498 * 2.0 => -0.0996
protocol_type icmp     0.1105
service       eco_i    1.1964

Note that the other protocol_type and service terms, such as

-0.6236 * protocol_type=udp

have comparisons that evaluate to 0 (protocol_type=udp is false here), so those coefficients don't affect the overall sum.

From these three attributes, the score so far is the sum of these three terms, or 1.2073. Continue with the other 39 attributes, plus the constant -2.5266 at the end, and there's your vector's score.

Does that explain it well enough?
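The arithmetic above can be sketched in a few lines of Python, using the weights from the model output and the hypothetical normalized inputs posited in the example (duration = 2.0, protocol_type = icmp, service = eco_i):

```python
# Partial score from the first three attributes.
# Nominal terms multiply by 1 when the comparison is true, 0 otherwise.
terms = [
    -0.0498 * 2.0,  # duration (hypothetical normalized value 2.0)
     0.1105 * 1,    # protocol_type=icmp is true -> 1
     1.1964 * 1,    # service=eco_i is true -> 1
    # every other protocol_type=... / service=... term multiplies by 0
]
partial_score = sum(terms)
print(round(partial_score, 4))  # -> 1.2073
```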


The critical phrase in the blog you cite is:

if the output of the scoring function is negative then the input is classified as belonging to class y = -1. If the score is positive, the input is classified as belonging to class y = 1.

Yes, it's that simple: implement that nice, linear scoring function (42 variables, 116 terms). Plug in a vector. If the function comes up positive, the vector is normal; if it comes up negative, the vector is an anomaly.

Yes, your model is significantly longer than the blog's example. That example is based on two continuous features; you have 42 features, three of which are classification features (hence the extra 73 terms). The example has 3 support vectors; yours will have 43 (N dimensions require N+1 support vectors). However, even this 42-dimensional model operates on the same principle: positive = normal, negative = anomaly.
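A minimal sketch of that rule in Python. Only a handful of the 116 weights are copied here; the full model output above supplies the rest. The dictionary layout and key names are my own scaffolding, not anything Weka emits, and the inputs must already be normalized the way Weka normalized the training data:

```python
# Subset of the linear SMO weights from the Weka output above.
WEIGHTS = {
    "duration": -0.0498,
    "protocol_type=icmp": 0.1105,
    "service=eco_i": 1.1964,
    "src_bytes": 0.8742,
    # ... plus the remaining terms from the model output ...
}
BIAS = -2.5266  # the trailing constant in the model

def score(instance):
    """instance maps attribute names (numeric) or attribute=value
    keys (nominal, present with value 1) to normalized values."""
    return sum(w * instance.get(name, 0.0) for name, w in WEIGHTS.items()) + BIAS

def classify(instance):
    # The rule stated above: positive score -> normal, negative -> anomaly.
    # (Worth sanity-checking the sign against a few labelled instances,
    # since Weka's class ordering fixes which sign maps to which class.)
    return "normal" if score(instance) > 0 else "anomaly"
```

With the truncated weight set, an instance such as `{"duration": 2.0, "protocol_type=icmp": 1, "service=eco_i": 1, "src_bytes": 0.5}` scores -0.8822 and comes out as "anomaly"; with all 116 terms plugged in, the same two functions reproduce the model's predictions.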


As for your desire to map to a 2-dimensional display ... it's possible ... but I don't know what you'd find meaningful in this instance. Mapping 42 variables to 3 creates a lot of congestion in our space. I've seen some nice tricks here and there, especially with gradient fields where the force vectors are in the same spatial interpretation as the data points. A weather map manages to represent x,y,z coordinates of a measurement, adding wind velocity (3D), cloud cover, and maybe a couple other metrics into the display. That's maybe 10 symbolic dimensions.

In your case, we could perhaps just drop the dimensions with coefficients smaller than 0.07 as being insignificant; that saves 6 features. The three classification features we could perhaps represent with color, dashed/dotted/solid symbol, and a tiny text overlay on the O or X (normal/anomaly data). That's 9 down without using Cartesian position (x,y,z coordinates, assuming the plot is meaningful in 3D).

However, I don't know your data nearly well enough to suggest where we might cram the remaining 33 features into 2 or 3 dimensions. Can you somehow combine any of those inputs? Does a linear combination of multiple features give you a result that is still meaningful in prediction?

If not, then we're stuck with the canonical approach: pick interesting combinations of features (usually pairs) and plot a graph for each, ignoring the other features entirely. If none of those make visual sense ... there's our answer: no, we can't plot the data nicely. Sorry, but reality often does this to us in complex environments; we fall back on tables, correlations, and other methods our 3D minds can handle.

  • Your answer is very clear. But the formula in this blog seems to confuse me. All I want is to plot an SVM for anomaly vs. normal. Could you see how he did it in this blog and give me some hints? https://chrisjmccormick.wordpress.com/2013/04/16/trivial-svm-example/ –  Feb 17 '16 at 22:10
  • Because 2.0 was the value I posited for duration. The other attributes I chose are either 1 or 0, depending on the Boolean evaluation. All the features from src_bytes to the end would also have distinct values (not just 0 or 1). – Prune Feb 17 '16 at 22:36
  • Yes. For each row, you sum the terms of the scoring equation. Note that you need to normalize the input data. Does WEKA give you that transformation, too? If not, you can swot up some code to recompute it. – Prune Feb 17 '16 at 23:06
  • Plotting depends on the visualization you need for this. You have 42 dimensions. How do you plan to represent this in two dimensions -- colour can help some, but that's a lot of variables to handle. "An SVM graph" is not a standard item. The examples on line are generally 2 or 3 dimensions. – Prune Feb 17 '16 at 23:48
  • Glad to be of help. Remember to "accept" an answer so the question retires gracefully. As for "awesome", that's why SO is here: match problems with people who happen to know something about it. I'll just transfer that kudos to the site. :-) – Prune Feb 18 '16 at 02:05

A somewhat different approach, but I think it can solve your underlying problem. I assume you used the Weka Explorer to generate the model. Go to the Classify tab, click on More options... and tick Output predictions. You then get the probability of each classification, which should allow you to plot normal vs. anomaly.

For iris I get something like

inst#,    actual, predicted, error, probability distribution
     1 3:Iris-vir 3:Iris-vir          0      0.333 *0.667
     2 3:Iris-vir 3:Iris-vir          0      0.333 *0.667
     3 3:Iris-vir 3:Iris-vir          0      0.333 *0.667
     4 3:Iris-vir 3:Iris-vir          0      0.333 *0.667
     5 3:Iris-vir 3:Iris-vir          0      0.333 *0.667
     6 1:Iris-set 1:Iris-set         *0.667  0.333  0    
     7 1:Iris-set 1:Iris-set         *0.667  0.333  0    
     8 1:Iris-set 1:Iris-set         *0.667  0.333  0    
     9 1:Iris-set 1:Iris-set         *0.667  0.333  0    
    10 1:Iris-set 1:Iris-set         *0.667  0.333  0    

It contains the probability for each class.
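If you save that predictions text, a small parser can pull the probability columns out for plotting. This is a sketch assuming the column layout shown in the excerpt above (inst#, actual, predicted, optional error marker, then one probability per class, with `*` flagging the predicted class); the resulting lists can be fed to any plotting library, e.g. a histogram of the anomaly-class probability:

```python
def parse_predictions(text, n_classes):
    """Return one probability list per instance line of Weka's
    'Output predictions' text (layout as in the iris excerpt above)."""
    rows = []
    for line in text.splitlines():
        toks = line.split()
        # Skip the header and any line too short to hold all columns;
        # instance lines start with a bare integer index.
        if len(toks) < 3 + n_classes or not toks[0].isdigit():
            continue
        # The last n_classes tokens are the probabilities; strip the
        # '*' that marks the predicted class.
        rows.append([float(t.lstrip("*")) for t in toks[-n_classes:]])
    return rows
```

For the two-class normal/anomaly case, `n_classes=2` and each row gives you the pair of probabilities to plot.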

  • I don't see choose or anything. I added an extract for iris as reference. The only place where I see `choose` is for the classifier. – CAFEBABE Feb 17 '16 at 21:56
  • There are no FN (False Negative) evaluation metrics in the output and I don't know why. Any hints? –  Feb 22 '16 at 00:49
  • It is there. You have two lines in the output, each giving TP/FP rates for one class. The FP for the first class is the FN for the other. This may be a bit confusing, but the format has to generalise to multi-class problems. – CAFEBABE Feb 22 '16 at 06:20