I'm using a regex to grab a few prices from an HTML page. I have working patterns for both £ and $, but as soon as I change the pattern for euros and place the currency symbol at the end of the regex, it stops matching.

Here's my code:

preg_match('/([0-9]+[\.]*[0-9]*)\€/', $totalprice, $value);

Yet $value ends up as an empty array.

Thanks!

Mike Bell
  • Do you save the file **UTF-8 encoded (without BOM)**? There should be a setting for this in your IDE. The _plain_ **€** symbol is saved correctly only with the right encoding. Otherwise use `&euro;` as suggested. – Markus Hofmann Aug 08 '13 at 14:58
  • **BTW:** Do spaces occur before the € sign in the source you're crawling? If so, add `\s` to the regex, e.g. `/(\d+\.*\d*)\s?[€]/siu`. – Markus Hofmann Aug 08 '13 at 15:10

3 Answers

This looks like an encoding issue. If matching any currency sign is acceptable, use the Unicode currency-symbol property `\p{Sc}` together with the `u` modifier instead of a literal €:

$totalprice = "595,95€";
preg_match('/((?:[0-9]*[.,])?[0-9]+)\p{Sc}/u', $totalprice, $value);
print_r($value);

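
As a rough check of the `\p{Sc}` approach, the sketch below runs the same pattern over a few made-up price strings (the `$samples` array is hypothetical, not from the question). The non-ASCII signs are written as UTF-8 byte escapes so the result does not depend on how the script file is saved. Note that `\p{Sc}` accepts any currency symbol, so this pattern is not euro-specific.

```php
<?php
// Hypothetical sample inputs; non-ASCII signs are given as UTF-8 bytes.
$samples = [
    "595,95\xE2\x82\xAC", // 595,95€
    "12.50\xC2\xA3",      // 12.50£
    '7$',
];

$amounts = [];
foreach ($samples as $totalprice) {
    // \p{Sc} matches any Unicode currency symbol; it needs the u modifier.
    if (preg_match('/((?:[0-9]*[.,])?[0-9]+)\p{Sc}/u', $totalprice, $value)) {
        $amounts[] = $value[1]; // the captured number without the symbol
    }
}

print_r($amounts); // 595,95 / 12.50 / 7
```

If only euros should match, the literal sign (or `\x{20AC}`) can take the place of `\p{Sc}`.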

Daniel Gimenez

Add the `u` modifier after the closing delimiter of your regex so that the pattern and the subject are treated as UTF-8 and the € is handled as a single Unicode character.

preg_match('/([0-9]+[\.]*[0-9]*)\€/u', $totalprice, $value);
                                   ^
                                add this
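
To illustrate what the `u` modifier changes, here is a minimal sketch that assumes the subject string is UTF-8. Without `u` the engine works byte by byte, and the euro sign is three bytes (E2 82 AC), so a single `.` cannot cover it. With `u`, `.` consumes the whole character. The variable names are illustrative only.

```php
<?php
// The euro sign as UTF-8 bytes, independent of this file's encoding.
$euro = "\xE2\x82\xAC";

// Byte mode: "." consumes one byte, so "^.$" cannot span the 3-byte sign.
$byteMode = preg_match('/^.$/', $euro);

// UTF-8 mode: "." consumes one whole character.
$utf8Mode = preg_match('/^.$/u', $euro);

// The question's pattern with u added, applied to a sample price.
preg_match('/([0-9]+[\.]*[0-9]*)' . $euro . '/u', '595.95' . $euro, $value);

var_dump($byteMode, $utf8Mode, $value[1]); // int(0), int(1), "595.95"
```

If the pattern still fails with `u`, the page may be in a different encoding than the script, or it may contain the entity rather than the raw sign.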
Spudley

I'd use this regex:

'#(\d+[\.\,]\d*?)\s?[€]#su'

I replaced / with # for readability.

 
Parts of the regex explained:

  • \d       Matches digits (equivalent to [0-9], just shorter)
  • [\.\,]   Matches either . or , as the decimal separator
  • *?       Makes the * lazy, so the engine first attempts to skip the previous item before trying permutations with ever-increasing matches of the preceding item[1]
  • \s?      Matches a single whitespace character (the ? makes it optional)

 
The modifiers mean:

  • s   Makes . match all characters, including newlines
  • i   Match caseless (case-insensitive)
  • u   Treat the pattern and subject strings as UTF-8 (for the € sign)
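
The pattern above can be exercised with a short sketch like the following. The inputs are invented for illustration, and the € is again spelled out as UTF-8 bytes. One thing the run shows is that the pattern requires a `.` or `,` after the first digits, so a whole-number price such as 595€ does not match.

```php
<?php
$euro = "\xE2\x82\xAC"; // € as UTF-8 bytes
$pattern = '#(\d+[\.\,]\d*?)\s?[' . $euro . ']#su';

// Hypothetical inputs: comma separator, dot plus space, no separator.
$inputs = ['595,95' . $euro, '595.95 ' . $euro, '595' . $euro];

$results = [];
foreach ($inputs as $input) {
    // Keep the captured amount, or null when the pattern does not match.
    $results[] = preg_match($pattern, $input, $m) ? $m[1] : null;
}

var_dump($results); // "595,95", "595.95", NULL
```

Making the decimal part optional, for example `(\d+(?:[.,]\d+)?)`, would be one way to accept whole-number prices too. That variant is a suggestion, not part of the answer above.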
Markus Hofmann
  • Why would he want the `i` modifier if there are no alpha characters in the pattern? – Spudley Aug 08 '13 at 16:06
  • If the euro sign `€` is encoded as an entity such as `&euro;`, it could appear in different cases in the source, e.g. `&Euro;` or `&euro;`. That's why I included it. – Markus Hofmann Aug 08 '13 at 18:22
  • Well, fair enough, but your pattern doesn't look for the entity string, just the symbol. (Oh, and entities are case sensitive: `&Ouml;` is different to `&ouml;`. I think the euro entity has to be all lower case, `&euro;`.) – Spudley Aug 08 '13 at 18:40
  • @Spudley You're right. I've created a [jsfiddle](http://jsfiddle.net/B3n5g/) to check it out with the `&euro;` version. – Markus Hofmann Aug 08 '13 at 19:39
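
If the crawled page carries the entity rather than the raw sign, one possible approach (not taken from any of the answers) is to decode entities first and then match the euro sign by code point. The `$html` snippet and the pattern below are assumptions made for illustration.

```php
<?php
// Hypothetical crawled markup that uses the entity instead of the raw sign.
$html = '<span class="price">595,95&euro;</span>';

// Decode entities to UTF-8 text before matching.
$text = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// \x{20AC} is the euro sign by code point, so this file's encoding is irrelevant.
$found = preg_match('/(\d+(?:[.,]\d+)?)\s?\x{20AC}/u', $text, $m);
$price = $found ? $m[1] : null;

// Entity names are case sensitive, so a capitalised variant is left untouched.
$unchanged = html_entity_decode('&Euro;', ENT_QUOTES | ENT_HTML5, 'UTF-8');

var_dump($price, $unchanged); // "595,95", "&Euro;"
```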