
I'm trying to assess a string based on the suffix of the files that it contains.

I need to differentiate between strings that contain only image files (.png, .gif, .jpg, .jpeg, or .bmp) and strings that contain a mixture of image and non-image files.

What am I doing wrong?

if (preg_match('~\.(png\)|gif\)|jpe?g\)|bmp\))~', $data->files)) {
  echo 'image only';
} else {
  echo 'image + other types';
}

Example string containing a mixture:

filename 1 (https://example.com/test.pdf),
filename 2 (https://example.com/cool_image.jpg),
filename 3 (https://example.com/other-file.docx)

Example string containing only images:

filename 1 (https://example.com/another.png),
filename 2 (https://example.com/cool_image.jpg)
mickmackusa

3 Answers


The regular expression is wrong. You have ) after each extension. This will work:

~\.(png|gif|jpe?g|bmp)~i

Complete example:

<?php
if (preg_match('~\.(png|gif|jpe?g|bmp)~i', "https://example.com/test.png")) {
  echo 'image only';
}
else {
  echo 'image + other types';
}
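To see why a second check is needed, here is a minimal sketch that runs the corrected pattern against both example strings from the question. preg_match() returns 1 as soon as it finds one image extension, so the mixed batch and the image-only batch both report a match; the pattern detects the presence of images, not the absence of other files.

```php
<?php
// Sketch: the corrected pattern only reports whether AT LEAST ONE image
// extension appears; it cannot tell you that EVERY file is an image.
$imagePattern = '~\.(png|gif|jpe?g|bmp)~i';

// The two example strings from the question.
$mixed = "filename 1 (https://example.com/test.pdf),
filename 2 (https://example.com/cool_image.jpg),
filename 3 (https://example.com/other-file.docx)";
$imagesOnly = "filename 1 (https://example.com/another.png),
filename 2 (https://example.com/cool_image.jpg)";

foreach (['mixed' => $mixed, 'images only' => $imagesOnly] as $label => $batch) {
    // preg_match() returns 1 on the first match and 0 if there is none.
    echo $label . ': ' . preg_match($imagePattern, $batch) . PHP_EOL;
}
```

Both lines print 1, which is why the answer goes on to add a separate check for non-image files.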


With the corrected regex, you can now check whether a batch of files contains only images, a mix of images and other files, or only non-image files. We already have the first part (checking whether there are any images). With this regex, we can check whether there are any non-images:

/^(?!.*[.](png|gif|jpe?g|bmp))(?:.*$|,)/im

It uses a negative lookahead to assert that none of the image extensions appear anywhere on the line. At the end there's a non-capturing group that checks for the end of the line or a comma (to comply with your format).
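A minimal sketch of that per-line behaviour, using the question's example strings: with the m flag, ^ anchors at the start of every line, and because the s flag is not set, the .* inside the lookahead stops at the newline, so each line is judged on its own. preg_match_all() is used here (instead of preg_match()) only to show which lines count as non-image.

```php
<?php
// Sketch: apply the answer's non-image pattern and list the lines that
// contain no image extension.
$nonImagePattern = '/^(?!.*[.](png|gif|jpe?g|bmp))(?:.*$|,)/im';

$mixed = "filename 1 (https://example.com/test.pdf),
filename 2 (https://example.com/cool_image.jpg),
filename 3 (https://example.com/other-file.docx)";
$imagesOnly = "filename 1 (https://example.com/another.png),
filename 2 (https://example.com/cool_image.jpg)";

// preg_match_all() returns the number of full-pattern matches.
$mixedCount = preg_match_all($nonImagePattern, $mixed, $matches);
$imagesOnlyCount = preg_match_all($nonImagePattern, $imagesOnly);

echo "Non-image lines in mixed batch: $mixedCount" . PHP_EOL;
print_r($matches[0]); // the pdf line and the docx line
echo "Non-image lines in image-only batch: $imagesOnlyCount" . PHP_EOL;
```

A count above zero means the batch contains at least one non-image file; zero means every line carries an image extension.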

So finally, check both regular expressions and see what each batch really contains:

$files=[
    'Non-Images Only'=>'filename 1 (https://example.com/test.exe)',
    'Mixed-Type'=>'filename 1 (https://example.com/test.pdf),
filename 2 (https://example.com/cool_image.jpg),
filename 3 (https://example.com/other-file.docx),
filename 4 (https://example.com/nice_image.png)',
    'Images-Only'=>'filename 1 (https://example.com/another.png),
filename 2 (https://example.com/cool_image.jpg))'];
foreach ($files as $type => $batch) {
    echo "Batch: ".$batch.PHP_EOL;
    echo "Expecting: ".$type.PHP_EOL;
    $images = preg_match('/\.(png|gif|jpe?g|bmp)/im', $batch);
    $nonImages = preg_match('/^(?!.*[.](png|gif|jpe?g|bmp))(?:.*$|,)/im', $batch);
    $result = "";
    if ($images && $nonImages) {
        $result = "Mixed-Type";
    }
    else {
        if ($images) {
            $result = "Images-Only";
        }
        else {
            $result = "Non-Images Only";
        }
    }
    echo "Result: ".$result.PHP_EOL;
    echo PHP_EOL;
}

Note: I used @mickmackusa's list of tests.


ishegg
  • @mickmackusa hadn't seen this. Can you further clarify the question? The regex successfully checks if the string contains an image extension (if file is actually an image is another question...) – ishegg Feb 25 '18 at 14:46
  • @mickmackusa I see. Will give it a go. – ishegg Feb 25 '18 at 14:49
  • Thanks for updating. Now, your answer is twice as slow as my pattern while having 4x as many upvotes as mine. This will surely confuse future researchers. Would you mind upvoting my answer? – mickmackusa Feb 25 '18 at 15:23
  • You need to back up your claim :). [My answer](https://3v4l.org/0ZsQP/perf#output), [your answer](https://3v4l.org/KAjAL/perf#output). They look to be just about the same. And how will the upvotes *confuse* researchers when the accepted answer is yours? I certainly don't mind upvoting your answer, though I don't personally like people asking for upvotes. – ishegg Feb 25 '18 at 15:29
  • I am talking in terms of steps. Click on my non-image regex demo, and add your pattern there. Mine is 88 steps and yours is 167. Keep in mind, your answer earned 4 upvotes while being incorrect; my answer didn't prosper from the Upvote Pixies. I don't think I am out of line here. Future readers care about the green tick and vote tally. – mickmackusa Feb 25 '18 at 15:31
  • Your image file check is not as accurate/trustworthy as mine. If the `filename 1` (mystery component) contains a suffix (or a misinterpreted suffix), your pattern may fail. Do you see the increased accuracy in my pattern? That is the reason that I am matching the end of the string with the extra `)` and comma. This is also what the OP was doing in the question (misinterpreted as wrong) so it is reasonable to assume that it was purposely done. – mickmackusa Feb 25 '18 at 15:35
  • You're assuming a whole lot. I'll wait for OPs clarification to further improve the answer. – ishegg Feb 25 '18 at 15:41
  • What do you mean? I've given you your asked-for upvote and I can't keep modifying this based on *your* interpretation of things. – ishegg Feb 25 '18 at 15:43
  • Only just received it after. Thought everything was sour. – mickmackusa Feb 25 '18 at 15:43
  • It didn't, I share your vision on this site. Just really think OP should clarify (just how many times have you answered a question just to have OP come a few hours after and explain he meant something completely different?) – ishegg Feb 25 '18 at 15:44
  • I've run out of fingers and toes. – mickmackusa Feb 25 '18 at 15:45

You're escaping your closing parentheses, so they're treated as literal ) characters.

The regex you're looking for is simply: ~\.(png|gif|jpe?g|bmp)$~

if (preg_match('~\.(png|gif|jpe?g|bmp)$~', $data->files)) {
  echo 'image only';
}
else {
  echo 'image + other types';
}

Note that the $ anchor, which marks the end of the string, is critical; without it, the extension could match anywhere in the string, so a file such as cool_image.jpg.exe would be considered an 'image'. The dot sits outside the group so that every extension, not just png, must be preceded by a literal dot.

Running the regex \.(png|gif|jpe?g|bmp)$ against the following strings shows that only the final link matches:

https://example.com/test.pdf
https://example.com/other-file.docx
https://example.com/cool_image.jpg.exe
https://example.com/cool_image.jpg

Note that you'll also probably want to add the i modifier to the end of your regex so that uppercase file extensions match as well: ~\.(png|gif|jpe?g|bmp)$~i.
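A small sketch of the anchoring and case-insensitivity points, run against single URLs. It assumes the dot-outside-the-group form \.(png|gif|jpe?g|bmp), so the dot applies to every extension; the uppercase PHOTO.PNG URL is a hypothetical example added only to exercise the i modifier.

```php
<?php
// Sketch: compare an unanchored and an anchored extension check on single URLs.
// Both use the i flag, so uppercase extensions are accepted.
$unanchored = '~\.(png|gif|jpe?g|bmp)~i';
$anchored   = '~\.(png|gif|jpe?g|bmp)$~i';

$urls = [
    'https://example.com/test.pdf',
    'https://example.com/other-file.docx',
    'https://example.com/cool_image.jpg.exe',
    'https://example.com/cool_image.jpg',
    'https://example.com/PHOTO.PNG', // hypothetical uppercase example
];

foreach ($urls as $url) {
    // preg_match() returns 1 for a match and 0 otherwise.
    printf(
        "%-40s unanchored=%d anchored=%d\n",
        $url,
        preg_match($unanchored, $url),
        preg_match($anchored, $url)
    );
}
```

Only the .jpg.exe URL differs between the two columns: the unanchored pattern accepts it, the anchored one does not.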

Obsidian Age
  • "Your regex should be wrapped in forward slashes (/) rather than tildes (~)" why? – ishegg Feb 23 '18 at 02:20
  • I've removed that. I'm used to other languages where the delimiter matters, but according to PCRE it's fine to have a tilde :) – Obsidian Age Feb 23 '18 at 02:22
  • Are you talking about JS? I never noticed one was limited to using `/`. In PCRE you can use almost anything you'd like: "A delimiter can be any non-alphanumeric, non-backslash, non-whitespace character". – ishegg Feb 23 '18 at 02:24
  • @ObsidianAge may I see a case-insensitive pattern modifier please. (just for the sake of it) – mickmackusa Feb 23 '18 at 03:36
  • @mickmackusa - In order to have a case-insensitive modifier, you'd just need to throw an `i` modifier on to the end of the regex. So `~(\.png|gif|jpe?g|bmp)$~i` would work :) – Obsidian Age Feb 23 '18 at 03:43
  • (I know this much) It just felt like a sensible addition to your answer. – mickmackusa Feb 23 '18 at 03:44
  • Awesome, and fair enough. I've added that in :) – Obsidian Age Feb 23 '18 at 03:46
  • I have posted a non-regex solution; I need some suggestions on whether that is the correct way or not. – sumit Feb 23 '18 at 03:47
  • It would appear as though the question has been edited multiple times, and also was a little ambiguous to begin with. After carefully looking over the question again, it appears that the OP has a *single* string that contains **multiple** files (with a bit of fluff). Your (accepted) solution does indeed cover that, though I still think this answer will help those who are looking to check the file extensions in *individual* strings. – Obsidian Age Mar 02 '18 at 02:18

After reading and re-reading your question more than 20 times, I think I know what you are trying to do.

For every string (batch of files), I run two preg_match() checks: one that seeks files with a suffix of png, gif, jpg, jpeg, or bmp, and another that seeks files that DO NOT have a suffix in the aforementioned list.

*Note: (*SKIP)(*FAIL) is a technique used to match characters and immediately disqualify them, so the engine does not retry those characters with another alternative in the pattern.
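A minimal sketch of what (*SKIP) contributes here: it runs the answer's non-image pattern on two single lines in the question's format, next to a hypothetical variant that keeps only (*FAIL). With (*FAIL) alone, the engine backtracks into the second alternative, [^.)]+, which happily matches "jpg" too; with (*SKIP) in front, the failed attempt discards that starting position, so "jpg" is never handed to the second alternative.

```php
<?php
// $failOnly is a hypothetical comparison; $skipFail is the answer's pattern.
$failOnly = '~\.(?:(?:png|gif|jpe?g|bmp)(*FAIL)|[^.)]+)\),?$~im';
$skipFail = '~\.(?:(?:png|gif|jpe?g|bmp)(*SKIP)(*FAIL)|[^.)]+)\),?$~im';

$imageLine    = 'filename 2 (https://example.com/cool_image.jpg),';
$nonImageLine = 'filename 1 (https://example.com/test.pdf),';

// (*FAIL) alone: the image line is wrongly reported as non-image.
echo preg_match($failOnly, $imageLine), PHP_EOL;    // 1
// (*SKIP)(*FAIL): the image line no longer matches...
echo preg_match($skipFail, $imageLine), PHP_EOL;    // 0
// ...while a genuine non-image line still does.
echo preg_match($skipFail, $nonImageLine), PHP_EOL; // 1
```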

Code:

$tests=[
    'Non-Images Only'=>'filename 1 (https://example.com/test.exe)',
    'Mixed-Type'=>'filename 1 (https://example.com/test.pdf),
filename 2 (https://example.com/cool_image.jpg),
filename 3 (https://example.com/other-file.docx),
filename 4 (https://example.com/nice_image.png)',
    'No Files'=>'filename 1 (),
filename 2 ()',
    'Images-Only'=>'filename 1 (https://example.com/another.png),
filename 2 (https://example.com/cool_image.jpg))'];

$image_pattern='~\.(?:png|gif|jpe?g|bmp)\),?$~im';
$non_image_pattern='~\.(?:(?:png|gif|jpe?g|bmp)(*SKIP)(*FAIL)|[^.)]+)\),?$~im';

foreach($tests as $type=>$string){
    echo "\t\tAssessing:\n---\n";
    echo "$string\n---\n";
    echo "Expecting: $type\n";
    echo "Assessed as: ";
    $has_image=preg_match($image_pattern,$string);
    $has_non_image=preg_match($non_image_pattern,$string);
    if($has_image){
        if($has_non_image){
            echo "Mix of image and non-image files";
        }else{
            echo "Purely image files";
        }
    }else{
        if($has_non_image){
            echo "Purely non-image files";
        }else{
            echo "No files recognized";
        }
    }
    echo "\n----------------------------------------------------\n";
}

Output:

        Assessing:
---
filename 1 (https://example.com/test.exe)
---
Expecting: Non-Images Only
Assessed as: Purely non-image files
----------------------------------------------------
        Assessing:
---
filename 1 (https://example.com/test.pdf),
filename 2 (https://example.com/cool_image.jpg),
filename 3 (https://example.com/other-file.docx),
filename 4 (https://example.com/nice_image.png)
---
Expecting: Mixed-Type
Assessed as: Mix of image and non-image files
----------------------------------------------------
        Assessing:
---
filename 1 (),
filename 2 ()
---
Expecting: No Files
Assessed as: No files recognized
----------------------------------------------------
        Assessing:
---
filename 1 (https://example.com/another.png),
filename 2 (https://example.com/cool_image.jpg))
---
Expecting: Images-Only
Assessed as: Purely image files
----------------------------------------------------
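As an alternative to the two-pattern approach (a different technique, not part of this answer, and not tested against real Airtable data), here is a sketch that extracts each URL and checks its extension individually with parse_url() and pathinfo(). It assumes every URL is wrapped in parentheses with no whitespace or parentheses inside it, as in the test strings above; classifyBatch() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: classify a batch by inspecting each URL's extension.
function classifyBatch(string $batch): string
{
    $imageExtensions = ['png', 'gif', 'jpg', 'jpeg', 'bmp'];

    // Capture whatever sits between "(" and ")"; empty "()" is not captured.
    preg_match_all('~\(([^()\s]+)\)~', $batch, $matches);

    $images = 0;
    $others = 0;
    foreach ($matches[1] as $url) {
        // Read only the path component, so a query string is ignored.
        $path = (string) parse_url($url, PHP_URL_PATH);
        $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if (in_array($extension, $imageExtensions, true)) {
            $images++;
        } else {
            $others++;
        }
    }

    if ($images > 0 && $others > 0) {
        return 'Mixed-Type';
    }
    if ($images > 0) {
        return 'Images-Only';
    }
    return $others > 0 ? 'Non-Images Only' : 'No Files';
}

echo classifyBatch("filename 1 (https://example.com/test.pdf),
filename 2 (https://example.com/cool_image.jpg)"), PHP_EOL; // Mixed-Type
```

Counting per URL avoids any assumption about line endings or trailing commas, at the cost of relying on the parentheses around each URL.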
mickmackusa
  • Thank you so much. I’m so sorry that I couldn’t explain it well; English is not my main language. I’m getting the data via Airtable’s API. Somehow I have to differentiate whether the field contains non-image links. –  Feb 23 '18 at 10:20
  • No worries. I'm happy to help. – mickmackusa Feb 23 '18 at 10:21