
This was my error:

Fatal error: Uncaught TypeError: Phpml\CrossValidation\Split::__construct(): Argument #1 ($dataset) must be of type Phpml\Dataset\Dataset, array given, called in C:\xampp\htdocs\490\testing2.php on line 23 and defined in C:\xampp\htdocs\490\vendor\php-ai\php-ml\src\CrossValidation\Split.php:32 Stack trace: #0 C:\xampp\htdocs\490\testing2.php(23): Phpml\CrossValidation\Split->__construct(Array, Array, 0.3) #1 {main} thrown in C:\xampp\htdocs\490\vendor\php-ai\php-ml\src\CrossValidation\Split.php on line 32

I am trying to use php-ml linear regression to predict price in the NYC 2019 Airbnb dataset. I removed all null values and am left with columns in this order: index, room_type, price, minimum_nights, number_of_reviews, neighbourhood group. My target is price, and for now I am using only minimum_nights and number_of_reviews as predictors. This is my code:

<?php

require 'vendor/autoload.php';

use Phpml\Regression\LeastSquares;
use Phpml\Metric\Regression;
use Phpml\CrossValidation\RandomSplit;
use Phpml\Dataset\CsvDataset;

// Load the data
$dataset = new \Phpml\Dataset\CsvDataset(filepath: "./data/NYC2019datareduced.csv", features: 5, headingRow: true);

$samples = [];
$targets = [];

foreach ($dataset->getSamples() as $sample) {
    // Selecting the features (excluding the target variable 'price')
    $samples[] = [$sample[3], $sample[4]];
    $targets[] = $sample[2];
}

// Split the data into training and testing sets
$randomSplit = new \Phpml\CrossValidation\RandomSplit($samples, $targets, 0.3);

// Train the model
$regression = new \Phpml\Regression\LeastSquares();
$regression->train($randomSplit->getTrainSamples(), $randomSplit->getTrainLabels());

// Test the model
$predicted = $regression->predict($randomSplit->getTestSamples());
$r2 = \Phpml\Metric\Regression::r2Score($randomSplit->getTestLabels(), $predicted);

echo "R2 score: $r2\n";

// Predict the price for a new listing
$newListing = [1,45]; // Replace with your own data
$predictedPrice = $regression->predict([$newListing]);

echo "Predicted price for the new listing: $predictedPrice[0]\n";

Looking at the error message, my focus should be on line 23 with this code: $randomSplit = new \Phpml\CrossValidation\RandomSplit($samples, $targets, 0.3);

Originally it was $randomSplit = new RandomSplit($samples, $targets, 0.3); but I changed it to the fully qualified class name. I also tried replacing $samples and $targets in the parentheses with $dataset, but I'm not having any luck.

  • The error message is quite clear: pass the dataset as the parameter, please. – Ken Lee May 09 '23 at 14:10
  • When I change it to: $randomSplit = new \Phpml\CrossValidation\RandomSplit($dataset, 0.3); I get the following error: – Arshado May 09 '23 at 14:12
  • @KenLee TypeError: Unsupported operand types: int * string in C:\xampp\htdocs\490\vendor\php-ai\php-ml\src\Math\Matrix.php:159 Stack trace: #0 C:\xampp\htdocs\490\vendor\php-ai\php-ml\src\Regression\LeastSquares.php(73): Phpml\Math\Matrix->multiply(Object(Phpml\Math\Matrix)) #1 C:\xampp\htdocs\490\vendor\php-ai\php-ml\src\Regression\LeastSquares.php(39): Phpml\Regression\LeastSquares->computeCoefficients() #2 C:\xampp\htdocs\490\testing2.php(27): Phpml\Regression\LeastSquares->train(Array, Array) #3 {main} thrown in C:\xampp\htdocs\490\vendor\php-ai\php-ml\src\Math\Matrix.php on line 159 – Arshado May 09 '23 at 14:13
  • Please look at the class you're trying to use and see what arguments and types it expects. Comparing the original, `RandomSplit($samples, $targets, 0.3);`, with your updated version, `RandomSplit($dataset, 0.3);`, you've changed more than just the first argument. You are now passing `0.3` as the second argument instead of the third (which is what the new error complains about). – M. Eriksson May 09 '23 at 14:15
  • @M.Eriksson Going into the vendor directory and looking at the included RandomSplit.php file, I assume it expects just two arguments, the dataset and the test size, based on: `class RandomSplit extends Split { protected function splitDataset(Dataset $dataset, float $testSize)`. So now I am a little confused about whether I need two or three arguments and what to do moving forward. – Arshado May 09 '23 at 14:24
  • `protected function splitDataset(...)` has nothing to do with errors you've shared. You get an error when you [instantiate](https://www.php.net/manual/en/language.oop5.basic.php) the class, so it's the class [constructor](https://www.php.net/manual/en/language.oop5.decon.php) you should look at. If you don't have any `__construct(...)` in that class, then look at the class `Split`, since `RandomSplit` [extends](https://www.php.net/manual/en/language.oop5.inheritance.php) that class. – M. Eriksson May 09 '23 at 14:32
  • Please refer to [this link](https://php-ml.readthedocs.io/en/latest/machine-learning/cross-validation/random-split/) for the number of parameters in RandomSplit. Choose the one which suits your case. – Ken Lee May 09 '23 at 14:50
  • This is some info from the `Split` class: `abstract class Split { /** * var array */ protected $trainSamples = []; /** * var array */ protected $testSamples = []; /** * var array */ protected $trainLabels = []; /** * var array */ protected $testLabels = []; public function __construct(Dataset $dataset, float $testSize = 0.3, ?int $seed = null)`. So, looking at the class constructor, I don't necessarily need a seed, and it looks like the syntax is right with the dataset and test size? @M.Eriksson – Arshado May 09 '23 at 15:30
  • I think there is no need to look further at the RandomSplit line (line 23). You have already changed the first parameter of RandomSplit() from $samples to $dataset, and the third parameter seems optional (I have not checked, but if I designed this module it would be). The system now reports a different error. At first glance it is line 27, which concerns train(). Please check whether the two parameters passed to the **train** method contain correct values and are of the proper types. (In short: instantiate the right PHP-ML classes, train correctly, and then predict.) Thanks – Ken Lee May 09 '23 at 15:59
  • @KenLee Thank you. Okay, so now I am convinced. When the least squares regression tries to train the model with my data, it could be that I passed in a categorical variable. In my CSV dataset, the columns are, from left to right: index, room_type, price, minimum_nights, number_of_reviews, neighbourhood group. Room type and neighbourhood group are categorical, and I am focusing only on the numerical data. Am I doing something wrong with my foreach loop? – Arshado May 09 '23 at 17:04
  • In your code there is only one foreach loop. I believe you are trying to populate the arrays ($samples and $targets) with the data in the CSV file. That's fine, provided that the CSV file contains the right type of data (but since I do not have your CSV file, I cannot comment further). There is nothing wrong with focusing on numerical data. In fact, ML (like many things in computing) usually relies on numerical data. – Ken Lee May 09 '23 at 17:15
  • One further point: "TypeError: Unsupported operand types: int * string" **may** occur when you try to do arithmetic with a string where a number is expected. So please also check what type of data you are feeding to train(). – Ken Lee May 09 '23 at 17:19
  • Thank you @KenLee. I was preprocessing the data in a Jupyter notebook. Price (third column, [2]), minimum nights (fourth column, [3]), and number of reviews (fifth column, [4]) are all int64. – Arshado May 09 '23 at 17:38
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/253573/discussion-between-arshado-and-ken-lee). – Arshado May 09 '23 at 18:37
  • The error has been fixed and the code is working with the following changes: `$samples[] = [(int)$sample[3], (int)$sample[4]]; $targets[] = (int)$sample[2];`. After that I made a new dataset with `$dataset = new \Phpml\Dataset\ArrayDataset($samples,$targets);`. Thank you everyone for the support! – Arshado May 09 '23 at 19:08
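
Putting the changes from the last comment together, the working script would look roughly like the sketch below. It is assembled from the code and comments in this thread and has not been run against the actual CSV. The file path and column order are the ones given in the question, and `$split` and `$row` are just illustrative variable names. The key points are that `RandomSplit` is given a `Dataset` object (here an `ArrayDataset` wrapping the selected columns) rather than two plain arrays, and that the categorical columns never reach `train()`.

```php
<?php

require 'vendor/autoload.php';

use Phpml\CrossValidation\RandomSplit;
use Phpml\Dataset\ArrayDataset;
use Phpml\Dataset\CsvDataset;
use Phpml\Metric\Regression;
use Phpml\Regression\LeastSquares;

// Path and column layout are taken from the question.
$csv = new CsvDataset('./data/NYC2019datareduced.csv', 5, true);

$samples = [];
$targets = [];

foreach ($csv->getSamples() as $row) {
    // Columns: 0 index, 1 room_type, 2 price, 3 minimum_nights, 4 number_of_reviews.
    // Only the numeric predictors are kept, cast from CSV strings to integers.
    $samples[] = [(int) $row[3], (int) $row[4]];
    $targets[] = (int) $row[2];
}

// RandomSplit expects a Dataset object, so wrap the plain arrays first.
$dataset = new ArrayDataset($samples, $targets);
$split = new RandomSplit($dataset, 0.3);

// Train on the training portion of the split.
$regression = new LeastSquares();
$regression->train($split->getTrainSamples(), $split->getTrainLabels());

// Score the model on the held-out portion.
$predicted = $regression->predict($split->getTestSamples());
$r2 = Regression::r2Score($split->getTestLabels(), $predicted);
echo "R2 score: $r2\n";

// Predict the price for a new listing: [minimum_nights, number_of_reviews].
$predictedPrice = $regression->predict([[1, 45]]);
echo "Predicted price for the new listing: $predictedPrice[0]\n";
```

Passing the original `$dataset` (the `CsvDataset` with `features: 5`) straight to `RandomSplit` satisfies the constructor, but it hands all five leading columns, including room_type, to the regression, which is a plausible source of the `int * string` error reported in the comments.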

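The `int * string` TypeError discussed in the comments can be reproduced without php-ml. The following is a minimal sketch, assuming PHP 8; the row values and the helper `tryDouble()` are hypothetical and only illustrate the column order from the question. It shows that numeric strings such as those read from a CSV file multiply fine, while a non-numeric string such as a room type throws.

```php
<?php

/**
 * Multiply a value by 2 and describe what happened instead of crashing.
 * (Hypothetical helper, for illustration only.)
 */
function tryDouble(mixed $value): string
{
    try {
        return 'ok: ' . var_export(2 * $value, true);
    } catch (TypeError $e) {
        return 'TypeError: ' . $e->getMessage();
    }
}

// Hypothetical row in the question's column order; CSV readers return strings.
$row = ['0', 'Private room', '149', '1', '9', 'Brooklyn'];

echo tryDouble($row[3]), "\n"; // numeric string: PHP converts it and multiplies
echo tryDouble($row[1]), "\n"; // non-numeric string: PHP 8 throws a TypeError

// is_numeric() flags the bad column before any arithmetic happens.
var_dump(is_numeric($row[1])); // bool(false)
var_dump(is_numeric($row[3])); // bool(true)

// A bare (int) cast never throws; it silently maps non-numeric text to 0.
var_dump((int) $row[1]); // int(0)
```

Because an `(int)` cast turns stray text into 0 rather than failing, an `is_numeric()` check on the selected columns may be a safer guard than the cast alone when the CSV contents are not fully trusted.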