
I get the following warning when I execute my class: PHP Warning: A non-numeric value encountered in /php-ai/php-ml/src/Regression/LeastSquares.php

What I am trying to do is scrape prices for a product and use a regression approach to make a prediction. This is a first attempt to test the idea; I will refactor the function afterwards.

I cannot work out exactly what causes this warning.

Maybe I made a mistake somewhere.

My function:

use Phpml\Dataset\ArrayDataset;
use Phpml\Regression\LeastSquares;
use Phpml\CrossValidation\RandomSplit;
use Phpml\Metric\Accuracy;
use Phpml\Math\Average;
use Phpml\Math\StandardDeviation;
use Phpml\NeuralNetwork\MultilayerPerceptron;


function predict_price($data, $data_type) {
    try {
        // Validate input parameters
        if (!is_array($data)) {
            throw new InvalidArgumentException('Data parameter must be an array');
        }

        if (!in_array($data_type, ['csv', 'array', 'database'])) {
            throw new InvalidArgumentException('Invalid data type: ' . $data_type . '. Allowed data types: csv, array, database');
        }

        // create a new linear regression model
        $regression = new LeastSquares();

        // load the dataset
        if ($data_type === 'array') {
            if (!isset($data['samples']) || !is_array($data['samples'])) {
                throw new InvalidArgumentException('Data array parameter must contain a "samples" array');
            }

            if (!isset($data['labels']) || !is_array($data['labels'])) {
                throw new InvalidArgumentException('Data array parameter must contain a "labels" array');
            }

            $samples = array_map(function($sample) {
                if (!is_array($sample) || count($sample) != 2) {
                    throw new InvalidArgumentException('Each sample in the "samples" array must be an array with 2 values');
                }
                $timestamp = strtotime($sample[0]);
                if ($timestamp === false) {
                    throw new InvalidArgumentException('Invalid timestamp: ' . $sample[0] . '. Timestamp should be in the format yyyy-mm-dd');
                }
                return [$timestamp, $sample[1]];
            }, $data['samples']);

            $labels = $data['labels'];
            $dataset = new ArrayDataset($samples, $labels);
        } 

        // split the dataset into training and testing sets
        $randomSplit = new RandomSplit($dataset, 0.3);
        $trainingSamples = $randomSplit->getTrainSamples();
        $trainingLabels = $randomSplit->getTrainLabels();
        $testingSamples = $randomSplit->getTestSamples();
        $testingLabels = $randomSplit->getTestLabels();

        // train the model on the training set
        $regression->train($trainingSamples, $trainingLabels);

        // make predictions on the testing set
        $predictions = $regression->predict($testingSamples);

        // evaluate the performance of the model
        $metric = new Accuracy();
        $accuracy = $metric->score($testingLabels, $predictions);

        // calculate the confidence of the prediction
        $confidence = $regression->getCoefficients()[0];

        // return the accuracy of the model, the prediction for a new data point, and the confidence of the prediction
        $newSample = [date('yyyy-mm-dd'), 1];
        $newPrediction = $regression->predict([$newSample]);

        return array('accuracy' => $accuracy, 'prediction' => $newPrediction[0], 'confidence' => $confidence);
    } catch (InvalidArgumentException $e) {
        // handle input validation errors
        return array('error' => 'Invalid input: ' . $e->getMessage());
    } catch (Exception $e) {
        // handle other exceptions and return an error message
        return array('error' => 'Unexpected error: ' . $e->getMessage());
    }
}

And below is the example I use to call it:

$historical_prices = array(
    array('2022-01-01', 10),
    array('2022-01-02', 11),
    array('2022-01-03', 12),
    array('2022-01-04', 13),
    array('2022-01-05', 14),
    array('2022-01-06', 15),
    array('2022-01-07', 16),
    array('2022-01-08', 17),
    array('2022-01-09', 18),
    array('2022-01-10', 19),
);

$data = array(
    'samples' => $historical_prices,
    'labels' => array(11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
);

$result = predict_price($data, 'array');
echo 'Accuracy: ' . $result['accuracy'] .  "<br>";
echo 'Predicted price: ' . $result['prediction']  . "<br>";
echo 'Confidence: ' . $result['confidence']  . "<br>";
echo '-------<br>';

Result:

Accuracy: 0 Predicted price: 2.9258048044576 Confidence: 1.1612428352237E-8
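
One thing worth checking in the code above, offered as a guess rather than a confirmed diagnosis: the training samples are converted with strtotime(), but the new sample is built with date('yyyy-mm-dd'). PHP's date() treats every letter as a format character, so that call yields a string containing hyphens rather than a timestamp, and passing such a string into arithmetic is a plausible source of a "non-numeric value" warning. The minimal sketch below uses only built-in PHP functions (no php-ml calls); the variable names $badFeature and $goodFeature are hypothetical and exist only for this illustration.

```php
<?php
// What the question's "$newSample" line effectively produces:
// 'y' is a two-digit year, 'm' the month, 'd' the day, each repeated as written,
// so the result is a hyphenated string, not a date in Y-m-d form.
$badFeature = date('yyyy-mm-dd');

// Assumption: the new sample should use the same encoding as the training
// samples, i.e. a Unix timestamp obtained through strtotime().
$goodFeature = strtotime(date('Y-m-d'));

var_dump(is_numeric($badFeature));  // bool(false)
var_dump(is_numeric($goodFeature)); // bool(true)
```

If this is indeed the cause, building the new sample as [strtotime(date('Y-m-d')), 1] would keep it consistent with the training data.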

Mark
  • That is an awful lot of code to parse through, and then the error happens in an external library. You should reduce the code to the minimum required to actually reproduce the error and post that instead. – Sammitch Apr 14 '23 at 19:41
  • Also, just from a structural standpoint, that function does _way_ too much. Validating and parsing input in different formats, instantiating utility objects, etc. Consider breaking this out into multiple functions at the least, ideally encapsulated in one or more classes. – Sammitch Apr 14 '23 at 19:45
  • I agree, that's the goal. But I do not understand: the data provided is correct and I do get a result. – Mark Apr 14 '23 at 19:48
  • I removed the csv and database branches to make the code easier to understand. – Mark Apr 14 '23 at 19:50
