I am using the php-ai/php-ml framework. The example they give uses only one feature, which is not helpful for my case, but the main Git page also gives this example using more than one feature:
$samples = [[1, 3], [1, 4], [2, 4], [3, 1], [4, 1], [4, 2]];
$labels = ['a', 'a', 'a', 'b', 'b', 'b'];
$classifier = new KNearestNeighbors();
$classifier->train($samples, $labels);
echo $classifier->predict([3, 2]);
Based on the one-feature example and the second example above, which supplies two, I am trying to recreate this using two features. My current code snippet looks like this:
public function train(Request $request) {
    # CSV File
    $file = $request->file('dataframe');
    # Features + 1 will be the labels column
    $dataset = new CsvDataset($file, (int) $request->features);
    $vectorizer = new TokenCountVectorizer(new WordTokenizer());
    $tfIdfTransformer = new TfIdfTransformer();
    $finalSamples = [];
    for($i = 0; $i <= $request->features -1; $i++):
        $samples = [];
        foreach ($dataset->getSamples() as $sample)
            $samples[] = $sample[$i];
        $vectorizer->fit($samples);
        $vectorizer->transform($samples);
        $tfIdfTransformer->fit($samples);
        $tfIdfTransformer->transform($samples);
        $finalSamples[] = $samples;
    endfor;
    # This gives us an output of Array[ 0 => [Feature 1, Feature 2], 1 => [Feature 1, Feature 2], ... ] like shown on example two.
    $result = [];
    foreach($finalSamples as $arr)
        foreach($arr as $k => $v)
            $result[$k][] = $v;
    $dataset = new ArrayDataset($result, $dataset->getTargets());
    $randomSplit = new StratifiedRandomSplit($dataset, 0.1);
    $classifier = new SVC(Kernel::RBF, 10000);
    # Train with half of the data frame
    $classifier->train($randomSplit->getTrainSamples(), $randomSplit->getTrainLabels());
    $predictedLabels = $classifier->predict($randomSplit->getTestSamples());
    $inputLabels = $randomSplit->getTestLabels();
}
My CSV file looks like this:
"SibSp","Parch","Survived",
"1", "1", "1",
"3", "3", "1",
"4", "1", "0"
"4", "0", "1",
"5", "2", "0"
"3", "1", "0",
"2", "2", "1",
"0", "0", "1"
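To show the shape I expect to end up in $result, here is the regrouping loop from my train() method on its own, as a minimal sketch in plain PHP with no php-ml calls. The hard-coded $finalSamples is a hypothetical stand-in: it uses the raw values of the first four CSV rows and assumes each feature column holds one plain number per row, which may not be what the vectorizer and TF-IDF steps actually produce.

```php
<?php
// Hypothetical per-feature columns standing in for $finalSamples:
// one inner array per feature, one entry per CSV row.
$finalSamples = [
    [1, 3, 4, 4], // SibSp for the first four rows
    [1, 3, 1, 0], // Parch for the first four rows
];

// Regroup the columns into rows so each sample holds all of its features.
$result = [];
foreach ($finalSamples as $column) {
    foreach ($column as $rowIndex => $value) {
        $result[$rowIndex][] = $value;
    }
}

echo json_encode($result), PHP_EOL; // prints [[1,1],[3,3],[4,1],[4,0]]
```

With plain numbers this matches the [[1, 3], [1, 4], ...] layout of the two-feature example, so the loop itself seems to do what I intend under that assumption.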
The problem now is when I visualise the data, which I do like so:
$newDataFrame = [];
$incorrect = 0;
for($i = 0; $i <= count($inputLabels) -1; $i++):
    $newDataFrame[] = (object) ['input' => $inputLabels[$i], 'output' => $predictedLabels[$i]];
    if($inputLabels[$i] != $predictedLabels[$i]) $incorrect++;
endfor;
$correct = count($inputLabels) - $incorrect;
$score = round((float)Accuracy::score(isset($request->train) ? $randomSplit->getTestLabels() : $inputLabels, $predictedLabels) * 100 );
The result always comes out as 1 correct and 1 incorrect, with a score of 50 (%).
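For reference, this is the arithmetic behind that number, as a minimal sketch in plain PHP. The two label arrays are hypothetical stand-ins for a two-row test set with one hit and one miss, and the score is computed by hand rather than through Accuracy::score.

```php
<?php
// Hypothetical labels for a two-row test set: one hit, one miss.
$inputLabels = ['1', '0'];
$predictedLabels = ['1', '1'];

// Count mismatches between the true and the predicted labels.
$incorrect = 0;
foreach ($inputLabels as $i => $label) {
    if ($label != $predictedLabels[$i]) {
        $incorrect++;
    }
}
$correct = count($inputLabels) - $incorrect;

// Accuracy as a percentage: correct predictions over all test rows.
$score = (int) round($correct / count($inputLabels) * 100);

echo "$correct correct, $incorrect incorrect, score $score", PHP_EOL;
// prints: 1 correct, 1 incorrect, score 50
```

So the 50 only reflects two test rows, which is all the evidence I have to judge the classifier by.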
How can I get this classifier to use multiple features rather than just one? I think the problem is in how I build the ArrayDataset, but I have no clue what is wrong with it.