I have a list of operating systems. If someone enters something like "Ubuntu", I would like to correct that to "Linux Ubuntu". I have various other corrections like this, and I'm wondering if there is an efficient way to go through an array and make all of them.

I was thinking of having an associative array of key and value pairs, the key being the "from" field and the value being the "to". Is there a more efficient way to do this?

Sample array:

$os = array('Ubuntu', 'VMWare', 'CentOS', 'Linux Ubuntu');

The above values are just an example of some of the data. Essentially, some of them will be correct and some will not, and the incorrect ones will need to be corrected.

ComputerLocus

2 Answers

What about using preg_grep (signature: array preg_grep ( string $pattern , array $input [, int $flags = 0 ] )) [1] with a more or less sophisticated regular expression? For that you would only need a simple array of the corrected values (such as "Linux Ubuntu").

EDIT: A code example for clarity:

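// Escape the user input so characters such as "." are matched literally.
$regex = '/^[a-zA-Z ]*' . preg_quote($user_input, '/') . '[a-zA-Z ]*$/';
$correct_values = array("Linux Ubuntu", "Linux Debian", "Windows XP" /* , ... */); // const
$corrected_value = preg_grep($regex, $correct_values); // array of matching entries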

[1] http://php.net/manual/en/function.preg-grep.php
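
As a rough sketch of the idea above, the snippet below wraps preg_grep in a small function. The name findCanonicalNames and the sample list are assumptions made for illustration only. It matches the input anywhere inside a canonical name, case-insensitively, which is a slightly looser pattern than the one in the answer.

```php
<?php
// Hypothetical helper (not from the original answer): return every canonical
// name that contains the user's input, ignoring case.
function findCanonicalNames(string $userInput, array $canonicalNames): array
{
    // preg_quote() escapes regex metacharacters such as "." in "12.04".
    $pattern = '/' . preg_quote(trim($userInput), '/') . '/i';

    // preg_grep() preserves the input keys, so reindex the result.
    return array_values(preg_grep($pattern, $canonicalNames));
}

// Sample data, assumed for illustration.
$canonicalNames = array('Linux Ubuntu', 'Linux Debian', 'Windows XP');
$matches = findCanonicalNames('ubuntu', $canonicalNames);
// $matches is array('Linux Ubuntu')
```

Note that preg_grep returns an array, so an input like "linux" yields several candidates and the caller still has to decide between them. An empty input matches every entry.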

Kamiccolo
  • I don't think this will work. I don't want to have a huge pattern. Say I have 100 corrections, well that would be one big regex statement that is hard to understand. – ComputerLocus Aug 12 '13 at 17:45
  • I know that is what you meant, but that does not solve the issue with having a huge regex. – ComputerLocus Aug 12 '13 at 17:53
  • No, I meant something simple like this: $regex = '/^[a-zA-Z ]*' . $user_input . '[a-zA-Z ]*$/'; $correct_values = array("Linux Ubuntu", "Linux Debian", "Windows XP", ...); //const $corrected_value = preg_grep($regex, $correct_values); For readability: http://paste.ubuntu.com/5978191/ – Kamiccolo Aug 12 '13 at 17:56
  • @Kamiccolo This will only work when the input (the "wrong value") is a substring of the correct value. What if the correction is "LNX" => "Linux"? Besides, invoking regex here might not be the best choice. – poncha Aug 12 '13 at 18:01
  • Having an array for ALL possible misspelled operating systems doesn't sound like an efficient choice :) Maybe use something like Levenshtein distance ( http://php.net/manual/en/function.levenshtein.php )... There is also a thread for a similar problem: http://stackoverflow.com/questions/3939994/how-to-find-a-similar-word-for-a-misspelled-one-in-php – Kamiccolo Aug 12 '13 at 18:05
  • @poncha You're correct. I actually already have it checking for values close to other values, so LNX would be corrected to Linux. But before I do these checks, I need to make sure that the values are correctly reformatted. I don't want to run that check on Ubuntu, which gets corrected to Solaris because that is the closest match. That is why I am looking to filter the array first and prevent this. – ComputerLocus Aug 12 '13 at 18:06
  • @Kamiccolo I already am using this, but Ubuntu gets corrected to "Solaris", not "Linux Ubuntu". – ComputerLocus Aug 12 '13 at 18:06
  • @Fogest Look up soundex... that will eliminate some of the false positives that Levenshtein distance produces – poncha Aug 12 '13 at 18:21
  • @poncha They are not really false positives. If something is typed incorrectly, it should be corrected to the closest match; however, if it matches one of the common incorrect values, then it will be corrected using that mapping. – ComputerLocus Aug 12 '13 at 18:49
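
The order of checks described in these comments (known corrections first, closest match second) could be sketched roughly as follows. The function name normalizeOsName and the sample lists are assumptions for illustration, not the asker's actual code.

```php
<?php
// Hypothetical helper: try the correction table first, then fall back to
// the canonical name with the smallest Levenshtein distance.
function normalizeOsName(string $input, array $corrections, array $canonicalNames): string
{
    $needle = strtolower(trim($input));

    // Step 1: known corrections win, so "Ubuntu" never reaches the fuzzy match.
    foreach ($corrections as $from => $to) {
        if (strtolower($from) === $needle) {
            return $to;
        }
    }

    // Step 2: otherwise pick the canonical name with the smallest edit distance.
    $best = $input;
    $bestDistance = PHP_INT_MAX;
    foreach ($canonicalNames as $name) {
        $distance = levenshtein($needle, strtolower($name));
        if ($distance < $bestDistance) {
            $best = $name;
            $bestDistance = $distance;
        }
    }
    return $best;
}

// Sample data, assumed for illustration.
$corrections = array('Ubuntu' => 'Linux Ubuntu');
$canonicalNames = array('Linux Ubuntu', 'Solaris', 'CentOS');

$a = normalizeOsName('Ubuntu', $corrections, $canonicalNames);  // table hit
$b = normalizeOsName('Solarix', $corrections, $canonicalNames); // fuzzy match
```

This sketch always returns the closest canonical name, however far away it is. A real implementation would probably want a maximum distance beyond which the input is left alone.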

I have solved my question by going through the array and checking each value against the keys of an associative array of corrections. The only problem I am experiencing is that the dots in the converted strings are gone, replaced with spaces.

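// Map of common incorrect names ("from") to their corrected form ("to").
$commonCorrections = array("Ubuntu" => "Linux Ubuntu", "Ubuntu-12.04" => "Linux Ubuntu-12.04", "Ubuntu-10.10" => "Linux Ubuntu-10.10");

// $groups holds the operating system names to be checked.
for ($i = 0; $i < count($groups); $i++) {
    foreach ($commonCorrections as $key => $correction) {
        // Compare case-insensitively, ignoring surrounding whitespace.
        if (strtolower($key) === trim(strtolower($groups[$i]))) {
            $groups[$i] = $correction;
            break; // a value matches at most one correction
        }
    }
}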
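
Since the corrections are already keyed by the "from" value, the inner loop can be replaced by a direct array lookup. The following is only a sketch under that assumption. applyCorrections is a hypothetical name, and the sample $groups values are made up for illustration.

```php
<?php
// Hypothetical helper: apply a correction table with one hash lookup per value
// instead of scanning every correction for every value.
function applyCorrections(array $values, array $corrections): array
{
    // Lowercase the keys once so the lookup is case-insensitive.
    $lookup = array_change_key_case($corrections, CASE_LOWER);

    return array_map(function ($value) use ($lookup) {
        $key = strtolower(trim($value));
        // Unknown values pass through unchanged.
        return isset($lookup[$key]) ? $lookup[$key] : $value;
    }, $values);
}

// Sample data, assumed for illustration.
$commonCorrections = array("Ubuntu" => "Linux Ubuntu", "Ubuntu-12.04" => "Linux Ubuntu-12.04");
$groups = array('ubuntu ', 'VMWare', 'Ubuntu-12.04', 'Linux Ubuntu');
$groups = applyCorrections($groups, $commonCorrections);
```

Nothing in this lookup alters the dots in the corrected strings, so if they still turn into spaces, the cause is likely elsewhere in the processing.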
ComputerLocus