
So I have a project, as part of a final exam, in which I have to create and train some models to detect malicious executables using data mining and machine learning techniques. I have a dataset of 14998 samples stored in two tables: one of size 14998x543 (the features) and one of size 14998x1 (the classes of those samples).

I wrote some code to arrange the data, but when I tried to use it with the kNN classifier (knnclassify) I got some weird errors. I hope someone here can help, as I'm new to MATLAB syntax.

Here is my code:

clear all
close all
clc

load('C:\Users\Ζαρο-PC\Documents\MATLAB\PatRec Project\DataMist.mat');
load('C:\Users\Ζαρο-PC\Documents\MATLAB\PatRec Project\DataMistClasses.mat');

inds = randperm(size(Dataset,1));
training = Dataset(inds(1:10000),:);
train_classes = DatasetMistClasses(inds(1:10000),:);
testing = Dataset(inds(10001:end),:);
test_classes = DatasetMistClasses(inds(10001:end),:);

c = knnclassify(testing,training,train_classes);

cp = classperf(c,test_classes);
cp.CorrectRate

And these are the errors I get:

Error using statslib.internal.grp2idx (line 44) You cannot subscript a table using linear indexing (one subscript) or multidimensional indexing (three or more subscripts). Use a row subscript and a variable subscript.

Error in grp2idx (line 28) [varargout{1:nargout}] = statslib.internal.grp2idx(s);

Error in knnclassify (line 86) [gindex,groups] = grp2idx(group);

Error in PatternRegognitionLabProject (line 19) c= knnclassify(testing,training,train_classes)

I really hope someone can solve this, as I've racked my brain trying to fix it. Thanks in advance, Dimitris

CASE CLOSED

DimZ

1 Answer

I can't see anything wrong with your code. I reproduced your example using random numbers as data in MATLAB R2015a, and it worked correctly:

Dataset = rand(14998, 543);
DatasetMistClasses = randi(2, 14998, 1);

inds = randperm(size(Dataset,1));
training = Dataset(inds(1:10000), :);
train_classes = DatasetMistClasses(inds(1:10000), :);

testing = Dataset(inds(10001:end), :);
test_classes = DatasetMistClasses(inds(10001:end), :);

c = knnclassify(testing,training, train_classes);

cp = classperf(c, test_classes);
cp.CorrectRate

>> cp
                        Label: ''
                  Description: ''
                  ClassLabels: [2x1 double]
                  GroundTruth: [4900x1 double]
         NumberOfObservations: 4900
               ControlClasses: 2
                TargetClasses: 1
            ValidationCounter: 1
           SampleDistribution: [4900x1 double]
            ErrorDistribution: [4900x1 double]
    SampleDistributionByClass: [2x1 double]
     ErrorDistributionByClass: [2x1 double]
               CountingMatrix: [3x2 double]
                  CorrectRate: 0.511632653061225
                    ErrorRate: 0.488367346938776
              LastCorrectRate: 0.511632653061225
                LastErrorRate: 0.488367346938775
             InconclusiveRate: 0
               ClassifiedRate: 1
                  Sensitivity: 0.517758484609313
                  Specificity: 0.505071851225697
      PositivePredictiveValue: 0.528393072895691
      NegativePredictiveValue: 0.494414563508482
           PositiveLikelihood: 1.046128586324198
           NegativeLikelihood: 0.954797845535033
                   Prevalence: 0.517142857142857
              DiagnosticTable: [2x2 double]

>> cp.CorrectRate

ans =

   0.511632653061225

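Maybe the type of data you are using is what breaks the knn function. The error message complains about subscripting a table, so check whether Dataset and DatasetMistClasses were loaded as table objects rather than plain numeric arrays, and whether their shape and type are what you expect.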
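A minimal sketch of that check and conversion, assuming the two .mat files hold MATLAB table objects named Dataset and DatasetMistClasses (the question describes them as tables, and the error mentions a table). Small random tables built with array2table stand in for the real files here; table2array turns each table into a plain array that knnclassify can index.

```matlab
% Sketch only: random tables stand in for the contents of DataMist.mat and
% DataMistClasses.mat (assumed to hold tables named Dataset and
% DatasetMistClasses, as in the question).
rng(0);                                            % reproducible stand-in data
Dataset = array2table(rand(20, 5));                % 20 samples x 5 features
DatasetMistClasses = array2table(randi(2, 20, 1)); % 20 class labels (1 or 2)

% grp2idx (called inside knnclassify) cannot index a table,
% so convert both tables to plain arrays before splitting.
if istable(Dataset)
    Dataset = table2array(Dataset);
end
if istable(DatasetMistClasses)
    DatasetMistClasses = table2array(DatasetMistClasses);
end

% Quick sanity checks on type and shape
fprintf('Dataset: %s, %d x %d\n', class(Dataset), ...
    size(Dataset, 1), size(Dataset, 2));
fprintf('DatasetMistClasses: %s, %d x %d\n', class(DatasetMistClasses), ...
    size(DatasetMistClasses, 1), size(DatasetMistClasses, 2));
```

After a conversion like this, the rest of the question's script (the randperm split, knnclassify and classperf) should run unchanged on the real data.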

Good luck!

TitoOrt