2

I'm working in PHP. I'd like to find numbers in a sentence that are preceded by a currency symbol, and return the number. For example, searching "I spent €100 on shoes" should return "100".

I've got this working for $ and £:

'/[$£]([0-9.]{1,})/'

But adding the € euro symbol doesn't work. (The sentences come from parsed emails, so I don't need to find &euro;.)

preg_match_all('/[€]([0-9.]{1,})/', $sentence, $match);

I've found the following on SO: "regex for currency (euro)". But it doesn't encode the euro symbol.

To encode the euro symbol, I've tried:

/[\x{20ac}]([0-9.]{1,})/u
"[^-a-zA-Z0-9.:,!+£$ \\ ". chr(164) ."]"

But can't figure it out. Any help?

Corey
  • I think you want the `u` modifier, to enable UTF-8 patterns. See http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php – tjm Jul 05 '11 at 16:34
  • You may want to also look for entities representing the euro symbol: `&euro;` and `&#8364;` – dev-null-dweller Jul 05 '11 at 16:37
  • @tjm adding /u breaks the regex I have: "Compilation failed: invalid UTF-8 string at offset 2" @dev-null-dweller Edited. I only have € because the text comes from emails and user input. I'm not changing € to &euro; – Corey Jul 05 '11 at 16:43

2 Answers

1

When I put this in:

 echo preg_match("#€[0-9]{1,}#", "€1" )?1:0;

I get 1, so you might not need Unicode handling at all. But if you would like to use UTF-8 nevertheless, I found this in a comment under the PHP docs:

function unichr($u) {
    return mb_convert_encoding('&#' . intval($u) . ';', 'UTF-8', 'HTML-ENTITIES');
}

To get the €, call unichr(8364) and use the result in place of the euro sign above. I should note that I tested both approaches; this is the Unicode version:

preg_match("#".unichr(8364)."\s*([0-9]{1,})#u", unichr(8364). "1" )?1:0;

You might want to do str_replace('&euro;', unichr(8364), $str); first...

PS. You probably also want to allow for spaces and optional decimals: #€\s*([0-9]{1,}(\.[0-9]{2})?)#
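For reference, here is a minimal sketch that combines the ideas above into one pattern covering $, £ and €. It assumes the input is already valid UTF-8, and the function name extractAmounts() is made up for illustration; it is not from the question. The symbols are written as \x{...} code points so the pattern does not depend on how the source file itself is saved.

```php
<?php
// Hypothetical helper: return every number that directly follows $, £ or €.
// ASSUMPTION: $sentence is valid UTF-8.
function extractAmounts(string $sentence): array
{
    // \x{00A3} is the pound sign, \x{20AC} is the euro sign.
    // The /u modifier makes PCRE treat pattern and subject as UTF-8,
    // so each symbol is matched as one character rather than as raw bytes.
    $pattern = '/[$\x{00A3}\x{20AC}]\s*([0-9]+(?:\.[0-9]+)?)/u';

    // With /u, preg_match_all() returns false if the subject is not valid UTF-8.
    if (preg_match_all($pattern, $sentence, $matches) === false) {
        return [];
    }

    return $matches[1]; // contents of the first capture group only
}

// "\u{20AC}" (PHP 7.0+) yields the euro sign regardless of file encoding.
print_r(extractAmounts("I spent \u{20AC}100 on shoes"));
```

If this returns an empty array for text that visibly contains a euro sign, the subject is probably not UTF-8, which is the same symptom as the "invalid UTF-8 string" error mentioned in the comments.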

cwallenpoole
  • This worked once I got everything into UTF-8. Turned out the problems were further upstream. Thanks! – Corey Jul 06 '11 at 18:14
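Getting "everything into UTF-8" before matching might look roughly like the sketch below. The thread does not say what the upstream encoding actually was, so the Windows-1252 default here is purely an assumption (it is common in email), and toUtf8() is a hypothetical helper name. It relies on the mbstring extension being available.

```php
<?php
// Hypothetical helper: normalise incoming text to UTF-8 before running a /u regex.
// ASSUMPTION: any input that is not valid UTF-8 is Windows-1252.
function toUtf8(string $text, string $assumedSource = 'Windows-1252'): string
{
    // Only convert when the bytes are not already valid UTF-8,
    // otherwise correctly encoded text would be double-encoded.
    if (!mb_check_encoding($text, 'UTF-8')) {
        $text = mb_convert_encoding($text, 'UTF-8', $assumedSource);
    }

    // Turn entities such as &euro; or &#8364; into the actual character.
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// 0x80 is the euro sign in Windows-1252.
$clean = toUtf8("I spent \x80" . "100 on shoes");
var_dump(mb_check_encoding($clean, 'UTF-8'));
```

Running the regex only on the normalised string avoids the "invalid UTF-8 string" compilation and matching errors that the /u modifier raises on mixed input.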
-1

How about you replace the euro symbol with something else? E.g:

$str = 'I spent €100 on shoes.';
$tempStr = str_replace('€', '$', $str);
//$tempStr now contains: I spent $100 on shoes.

preg_match_all('/[$£]([0-9.]{1,})/', $tempStr, $match);
Ali