I have a very simple MATLAB program for training and testing a regression tree. I use the same carsmall data set that appears in the tutorial examples:

clear all 
clc
close all

load carsmall

X = [Cylinders, Weight, Horsepower, Displacement];
Y = MPG;
tree = fitrtree(X,Y, 'PredictorNames',{'Cylinders', 'Weight', 'Horsepower', 'Displacement'},'ResponseName','MPG','MinLeaf',10); 
Xtest=[6,4100,150,130];
MPGest = predict(tree, Xtest);

This gives MPGest = 14.9167.

I want to know how the predict function arrives at that value. Usually, to understand a function, I step through it line by line in the debugger. This one is tricky because it uses classes, so I end up at this line:

node = findNode(this.Impl,X,this.DataSummary.CategoricalPredictors,subtrees);

and inside that function I arrive at

            n = classreg.learning.treeutils.findNode(X,...
            subtrees,this.PruneList,...
            this.Children',iscat,...
            this.CutVar,this.CutPoint,this.CutCategories,...
            this.SurrCutFlip,this.SurrCutVar,...
            this.SurrCutPoint,this.SurrCutCategories,...
            verbose);

When I try to step into this call, the debugger does not enter it and simply returns n = 10. How is MATLAB arriving at this number? For example, how could I write my own program that calculates it from the tree object, without using predict?

Diego Fernando Pava

1 Answer

Actually, the function you are looking for is defined in a MEX file. If you try to open it with the open function, you get the following:

open 'classreg.learning.treeutils.findNode'

Error using open (line 145)
Cannot edit the MEX-file 'C:...\toolbox\stats\classreg+classreg+learning+treeutils\findNode.mexw64'

Unfortunately, MEX files are compiled from C/C++ (or Fortran) sources into native machine code, so there is no MATLAB source to step through. There are many disassemblers and decompilers that can reconstruct an approximation of the code compiled into the library (this is a nice starting point, if you want a fast overview), and the whole process is feasible, especially because the file itself is quite small.

The code you get back will not be the original source code, only something similar: variables will have default, meaningless names, there will be pointers all over, and it may contain bugs where the decompiler misinterpreted some machine instructions. This will probably be enough to let you understand what is going on, but you will not be able to follow the computations in a step-by-step debugging session in MATLAB.

Now, the only question you have to answer is: is this worth the effort?

Tommaso Belluzzo
  • My question may have been misunderstood. I am not trying to reverse engineer the MEX program; I just want to understand how the predict function is arriving at that particular prediction. How can I use the information stored in tree, plus the test data, to arrive at that prediction? I am just looking for the theory behind it and how to use the information that is already there in the tree object. – Diego Fernando Pava Feb 14 '18 at 14:51
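
For the narrower question in the comment above, the traversal can probably be reproduced from the tree object alone, without looking inside the MEX file. The following is a minimal sketch, not the toolbox's actual implementation. It assumes that every split is on a continuous predictor, that the test point contains no NaN values, and that no pruning level is requested; surrogate splits and categorical cuts are ignored. It relies on the documented tree properties IsBranchNode, CutPredictor, CutPoint, Children and NodeMean, with the documented convention that the left child is taken when x < CutPoint and the right child otherwise.

```matlab
% Sketch: reproduce predict(tree, Xtest) by walking the tree by hand.
% Assumptions: continuous splits only, no NaN in Xtest, unpruned tree.
load carsmall

X = [Cylinders, Weight, Horsepower, Displacement];
Y = MPG;
tree = fitrtree(X, Y, ...
    'PredictorNames', {'Cylinders', 'Weight', 'Horsepower', 'Displacement'}, ...
    'ResponseName', 'MPG', 'MinLeaf', 10);
Xtest = [6, 4100, 150, 130];

node = 1;                                  % node 1 is the root
while tree.IsBranchNode(node)
    % Find which column of Xtest this node splits on.
    name = tree.CutPredictor{node};
    j = find(strcmp(tree.PredictorNames, name));

    if Xtest(j) < tree.CutPoint(node)
        node = tree.Children(node, 1);     % left child:  x <  cut point
    else
        node = tree.Children(node, 2);     % right child: x >= cut point
    end
end

% A leaf predicts the mean response of the training rows that reached it.
yhat = tree.NodeMean(node);
fprintf('leaf node = %d, prediction = %.4f\n', node, yhat);
```

If the walk agrees with predict, it should stop at the node that findNode returned (n = 10 in the question) and yhat should equal the MPGest value reported there. Calling view(tree) prints the same splits as text, which makes it easy to check each step of the path by eye.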