
Part of my application requires validating user input. Some of that input must be included in a list of available choices. For example, let's take team names:

$array_one = ["Vikings","Eagles","Jaguars","Patriots"];
$array_two = [
    "Vikings"=>1,
    "Eagles"=>1,
    "Jaguars"=>1,
    "Patriots"=>1
];

//Validate using array 1
if(in_array($user_input,$array_one)) {
   echo "Valid Input";
}

//Validate using array 2
if(array_key_exists($user_input,$array_two)) {
   echo "Valid Input";
}

With the first approach the array "looks" cleaner and less hacky. However, in_array() has to do an O(n) search of the array. I believe array_key_exists() or isset($array[$key]) results in an O(1) lookup. The second way feels hacky because I'm actually using keys as values. Is the preferred method to structure my data sets/arrays like the second example to avoid using in_array(), or does this not matter in practice?

Krypto234
  • That's a perfectly valid, non-hacky, often-used way to do a quick lookup. What do you dislike about it? The quick answer to your question is yes, though if the list is of size 4 it may not matter. Also consider "normalizing" the input (making it lowercase, for example, so ViKinGs is accepted as well). – kabanus Jan 21 '18 at 18:37
  • https://stackoverflow.com/questions/2473989/list-of-big-o-for-php-functions is not a duplicate of this question but you'll see this same question asked in the comments there. I agree 100% with @kabanus – JasonB Jan 21 '18 at 18:39
  • @kabanus - Thanks. If you want to make your response an answer I'll accept. The example has a list of size 4 but I like to design my code to scale as a similar data set could have 10,000 entries! I just think it looks ugly to define the array values as keys and put a fake value (1) in there. But if it's valid it's valid. – Krypto234 Jan 21 '18 at 18:40
  • With 10k records, the answer becomes less obvious, especially if the data doesn't start in the keyed form (like a DB result set). Keep in mind the overhead needed to flip or pull the keys from the values (an O(n) operation) and hash them, as well as the extra memory needed to store this data structure. If you only need to use the lookup once per request, an array search will likely be faster. Consider using Redis to store a large set of lookup data so it doesn't need to be regenerated on each request. – Cy Rossignol Jan 21 '18 at 19:08

1 Answer


The short answer is yes. If the list is of size 4 it won't matter, but if the list may grow it's worth structuring it this way now.

I'd like to note that this is a very widespread and common (and good!) way to implement a lookup, so there is no reason to shy away from it. Hash tables (which is what PHP arrays are) are exactly the right data structure for O(1) lookup by key. This also has the added advantage that you can store some extra data as the value and retrieve it (instead of putting 1 in the array).
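To make the idea concrete, here is a minimal sketch of a keyed lookup built from a plain list. The helper name isValidTeam() and the $teamCities array are made up for illustration; array_fill_keys() and isset() are standard PHP.

```php
<?php
// The plain list from the question.
$teams = ["Vikings", "Eagles", "Jaguars", "Patriots"];

// Build the keyed form once. This step is O(n), so it only pays off
// when the lookup is reused several times.
$lookup = array_fill_keys($teams, true);

// Hypothetical helper: isset() is a hash lookup on the key (average O(1)).
function isValidTeam(string $input, array $lookup): bool
{
    return isset($lookup[$input]);
}

// Instead of a placeholder value, the value can carry real data.
// The cities here are illustrative extra data, not part of the question.
$teamCities = [
    "Vikings"  => "Minnesota",
    "Eagles"   => "Philadelphia",
    "Jaguars"  => "Jacksonville",
    "Patriots" => "New England",
];

if (isValidTeam("Eagles", $lookup)) {
    echo "Valid Input: " . $teamCities["Eagles"] . "\n";
}
```

Note that isset() returns false for a key whose value is null, so a non-null value such as true or 1 is needed; array_key_exists() does not have that caveat.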

As a side note, normalizing the input is a good idea: lowercase it so that inputs like "VikINgs" are accepted (they are usually considered valid). The array keys should then be lowercase as well. Consider also stripping surrounding spaces, etc.
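A rough sketch of that normalization, assuming ASCII team names (for other alphabets mb_strtolower() would be the safer choice). The names normalizeTeam() and findTeam() are hypothetical; the normalized string is used as the key and the original spelling is kept as the value.

```php
<?php
// Hypothetical helper: trim surrounding whitespace and lowercase the input.
function normalizeTeam(string $input): string
{
    return strtolower(trim($input));
}

$teams = ["Vikings", "Eagles", "Jaguars", "Patriots"];

// Key by the normalized name; keep the display name as the value.
$lookup = [];
foreach ($teams as $team) {
    $lookup[normalizeTeam($team)] = $team;
}

// Hypothetical helper: returns the canonical name, or null if not in the list.
function findTeam(string $input, array $lookup): ?string
{
    return $lookup[normalizeTeam($input)] ?? null;
}

$match = findTeam("  VikINgs ", $lookup);
echo $match !== null ? "Valid Input: $match\n" : "Invalid Input\n";
```

Keeping the canonical spelling as the value means the lookup can also return a correctly capitalized name for display.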

kabanus
  • Good point about normalizing. I never thought of that. I guess I'll leave it up to the front end guys to capitalize it once I provide them a dump of the "acceptable values", i.e. to populate a drop down. – Krypto234 Jan 21 '18 at 19:16