I'm not too familiar with MatConvNet in particular, so I may be misinterpreting your issue, but a general rule for evaluating classifiers is to cross-validate (i.e. train and test on different subsets of the data, and repeat across multiple subset configurations).
For one configuration
To generate one configuration of data you can use
nObservations = size(data,1);   % one observation per row
y_act = ... ;                   % class labels for each observation
net   = ... ;                   % your net definition
ratio = .5;                     % fraction of observations held out for testing

% logical index vectors marking the training and test subsets
[Train,Test] = crossvalind('HoldOut',nObservations,ratio);

% fit on the training subset only, then predict the held-out subset
% (train/predict stand in for whatever fitting/prediction calls your framework uses)
net   = train(net, data(Train,:), y_act(Train));
y_exp = predict(net, data(Test,:));

% fraction of the test observations classified correctly
rate = sum(y_exp == y_act(Test)) / numel(y_act(Test));
Rate is a better indicator than your original one. Depending on your model, you probably want to make sure the classes are represented in roughly even proportions in both subsets (or maybe even in equal amounts, which would require you to toss some observations so the class totals match). To ensure this you can split each class independently with crossvalind and then merge the per-class splits to form your Train/Test sets, as sketched below.
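For example, a minimal sketch of such a stratified split (assuming y_act holds one class label per row of data, and reusing nObservations and ratio from above):

% split each class separately so Train/Test keep the original class proportions
classes = unique(y_act);
Train = false(nObservations,1);
Test  = false(nObservations,1);
for k = 1:numel(classes)
    idx = find(y_act == classes(k));                    % observations of this class
    [tr,te] = crossvalind('HoldOut',numel(idx),ratio);  % split within the class
    Train(idx(tr)) = true;
    Test(idx(te))  = true;
end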
You can also play around with your train:test ratio; .5 is a fine place to start, and you can gradually increase the training fraction if you have trouble classifying your test set.
Permute configurations
One issue with this analysis is that it only evaluates one configuration. For massive datasets, or problems where generalization is not important, that might be OK. You can combat it by repeating the above analysis across
all combinations within a set configuration
Say you divide the dataset into subsets A, B, C, and D. You can then run train(A,B,C)-test(D), train(A,B,D)-test(C), ..., train(B,C,D)-test(A) and average the prediction rates (this is essentially k-fold cross-validation).
This is usually 'good enough', but you have to acknowledge that in small datasets you could end up with a skewed representation in one of your subsets.
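A minimal sketch of that rotation using crossvalind's 'Kfold' option (4 folds to match the A, B, C, D example; train/predict are again placeholders for your framework's calls):

nFolds = 4;                                          % e.g. A, B, C, D
foldId = crossvalind('Kfold',nObservations,nFolds);  % fold assignment per observation
rates  = zeros(nFolds,1);
for k = 1:nFolds
    Test  = (foldId == k);                           % hold out one fold
    Train = ~Test;                                   % train on the remaining folds
    net   = train(net, data(Train,:), y_act(Train));
    y_exp = predict(net, data(Test,:));
    rates(k) = sum(y_exp == y_act(Test)) / numel(y_act(Test));
end
meanRate = mean(rates);                              % average prediction rate across folds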
all possible configurations within a set
Using nchoosek(...) you can enumerate every possible train/test configuration and evaluate each one. This becomes impossibly long very quickly, so choosing an arbitrarily large number of these configurations at random works too. If you have really low observation numbers this is useful at a low cost, especially when validated with bootstrapping. It's probably not relevant for your analysis.
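As a rough sketch (assuming a small nTest, the number of observations held out per configuration; the number of configurations grows as nchoosek(nObservations,nTest), so in practice you would cap it or sample rows of combos at random):

nTest  = 2;                                 % observations held out in each configuration
combos = nchoosek(1:nObservations,nTest);   % every possible test set of that size
rates  = zeros(size(combos,1),1);
for k = 1:size(combos,1)
    Test  = false(nObservations,1);
    Test(combos(k,:)) = true;               % current test configuration
    Train = ~Test;
    net   = train(net, data(Train,:), y_act(Train));
    y_exp = predict(net, data(Test,:));
    rates(k) = sum(y_exp == y_act(Test)) / numel(y_act(Test));
end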