
I'm wondering how I can extract or match a specific file type from a list that contains a lot of malformed URLs and directories.

I need a regex that matches only the real file names.

http://domain.com/1/image.jpg <- match .jpg
http://domain.com/1/image_1.jpg/.gif <- match first .jpg
http://domain.com/1/image_1.jpg/image.png <- match first .jpg
http://domain.com/1/image_1.jpg <- match .jpg
http://domain.com/1/image.jpg.jpeg <- match only the first .jpg
http://domain.com/1/.jpg <- no match
http://domain.com/.jpg.jpg <- no match
/1/.jpg <- no match
/.jpg.png <- match the first .jpg
/image.jpg.png <- match the first .jpg

This is what I have tried so far:

preg_match_all('([a-zA-Z0-9.-_](jpg))i', $url, $matches);

Any ideas?

Peter O.
greenbandit

2 Answers

preg_match('(^(http://domain\.com/\w.*?\.jpg))i', $url, $matches);

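This matches everything from the start of the string up to the first .jpg. The \w requires the first character after domain.com/ to be a letter, digit, or underscore. It does not constrain the character just before .jpg, so http://domain.com/1/.jpg still matches.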
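If the goal is to reject names that consist only of an extension, one possible variation is to require a word character directly before .jpg and to drop the fixed host so relative paths work too. This is only a sketch: firstJpg is a hypothetical helper name, and the rule "a word character must precede .jpg" is my assumption about what the question means by a real file.

```php
<?php
// Hypothetical helper: returns the input up to and including the first ".jpg"
// that is directly preceded by a letter, digit, or underscore, or null if
// there is no such occurrence.
function firstJpg(string $url): ?string
{
    // ^.*?     consume as little as possible from the start of the string
    // \w\.jpg  a word character followed by the extension
    if (preg_match('~^.*?\w\.jpg~i', $url, $m) === 1) {
        return $m[0];
    }
    return null;
}

var_dump(firstJpg('http://domain.com/1/image_1.jpg/image.png'));
var_dump(firstJpg('/1/.jpg'));
```

This does not reproduce every case in the question. http://domain.com/.jpg.jpg still matches here, because the second .jpg is preceded by the g of the first one, and /.jpg.png does not match, although the question asks for a match there.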

Explosion Pills

Parsing URLs with regular expressions is usually a bad idea. See Getting parts of a URL (Regex) for a related question and its answers. parse_url is a better starting point: take $result['path'] and pass it to a file-name parsing function such as pathinfo to extract the extension.
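A minimal sketch of that approach, assuming the URL is absolute. urlExtension is a hypothetical helper name, and lowercasing the result is my own choice:

```php
<?php
// Hypothetical helper: returns the lowercased extension of the last path
// segment, or null when the URL has no path or the segment has no extension.
function urlExtension(string $url): ?string
{
    // parse_url returns null when there is no path and false when the URL
    // is too malformed to parse.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return null;
    }
    // The 'extension' key is absent when the name has no extension.
    $info = pathinfo($path);
    $ext = $info['extension'] ?? '';
    return $ext === '' ? null : strtolower($ext);
}

var_dump(urlExtension('http://domain.com/1/image.jpg'));
var_dump(urlExtension('http://domain.com/1/image.jpg.jpeg'));
```

Note that this reports the last extension of the last path segment (jpeg in the second call), which is not the same thing as the first .jpg anywhere in the URL.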

I'm not sure exactly what you are asking for though.

http://domain.com/1/image_1.jpg/.gif <-match first .jpg
http://domain.com/1/image_1.jpg/image.png <-match first .jpg

In both of these cases image_1.jpg is a perfectly valid directory name. You could split the path on '/' and check each one for "validity".
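Splitting and checking each segment could look roughly like this. firstJpgSegment is a hypothetical helper, and the validity rule (a non-empty name of letters, digits, "_" or "-" followed by .jpg) is an assumption on my part, since the question does not define one:

```php
<?php
// Hypothetical helper: walks the "/"-separated segments from left to right
// and returns the first one that starts with "<name>.jpg", or null.
function firstJpgSegment(string $path): ?string
{
    foreach (explode('/', $path) as $segment) {
        // [\w-]+   non-empty name, so a bare ".jpg" is rejected
        // \.jpg    the extension, case-insensitive
        // (?!\w)   the extension must end there (".jpg.png" is still fine)
        if (preg_match('~^([\w-]+\.jpg)(?!\w)~i', $segment, $m) === 1) {
            return $m[1];
        }
    }
    return null;
}

var_dump(firstJpgSegment('/1/image_1.jpg/image.png'));
var_dump(firstJpgSegment('/.jpg.jpg'));
```

Because it only splits on '/', this works on relative paths as well as full URLs. Under this rule /.jpg.png gives no match, which differs from what the question lists for that case.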

Edit: I just noticed that you need this to work with relative URLs as well. parse_url does not work well in that case.

Community
D.Shawley