The regex below works for all four currencies when I test it on a regex test page.

On my own page, however, it only works for the $ currency.

I changed the meta tag to different character sets, with no luck.

What could be wrong?

if (preg_match_all('([£€$¥]([ 0-9]([ 0-9,])*)(\.\d{2})?|([0-9]([0-9,]))(\.\d{2})?([pcm]|bn| [mb]illion))', $tout, $matches))

I want to extract any amount that follows (with or without a space) any of the four currency signs.

Sergelie
  • Can you show some example values of `$tout` that you're working on? – Amal Murali Jan 03 '14 at 20:40
  • Absent further info, I'd assume it's an encoding issue. Your expression probably isn't matching because the other currency symbols in the text you're trying to match are using different character codes than they do in your script. If the input is UTF-8, try adding the `u` modifier to your pattern and ensuring that your script is in UTF-8 as well. – cHao Jan 03 '14 at 20:45
  • This would not be extracted: ¥ 18.00 Chicken with Pineapple – Sergelie Jan 03 '14 at 20:46
  • This would be extracted: $ 18.00 Chicken with Pineapple – Sergelie Jan 03 '14 at 20:47
  • Try escaping the dollar sign (a reserved character) like `\$`. Otherwise it is potentially a character encoding issue with the other currency symbols (try UTF-8 values for them?). – Sam Jan 03 '14 at 20:48
  • @cHao: my page uses `content="text/html; charset=iso-8859-1"`. I tested utf-8 and got the same issue. – Sergelie Jan 03 '14 at 20:49
  • @SergeSf: Doesn't really matter what encoding your page is using, unless that's where the input is coming from. What matters is the encoding of the script itself, and of the input. (BTW, if you have a choice, you might want to prefer UTF-8, for a number of reasons just like this.) – cHao Jan 03 '14 at 20:50
  • @cHao: I understand. The input is coming from the user page (HTML). I do not know how to set the encoding of my .php script. – Sergelie Jan 03 '14 at 20:51
  • @SergeSf: That'd be something you set in whatever you use to edit your scripts. Any decent editor should have an option to set the encoding. (You may have to edit the page after you switch the encoding, depending on how your editor handles the change. But any ASCII characters will normally be fine; it's just stuff like your other currency symbols that'd need editing.) – cHao Jan 03 '14 at 20:52
  • Okay cHao, I will work on that, thanks. – Sergelie Jan 03 '14 at 20:55

1 Answer

The expression you are using is valid and seems to match the output as expected. That is, if the file is saved as UTF-8.

If the input is in a different encoding, it will not work. Also, if the input is indeed UTF-8 but the source file is not, the regex would seem to fail, because it would try to match different characters (the non-UTF-8 equivalents).
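To make the encoding point concrete, here is a minimal sketch. It assumes the script file is saved as UTF-8 and uses a simplified, illustrative pattern (not the one from the question) with the `u` modifier; the variable names are hypothetical. The same visible text is supplied once as UTF-8 and once as ISO-8859-1 bytes.

```php
<?php
// Assumption: this file itself is saved as UTF-8, so the symbols below are UTF-8 bytes.
// Simplified illustrative pattern (not the asker's); `u` switches PCRE to UTF-8 mode.
$pattern = '/[£€$¥]\s?\d[\d,]*(?:\.\d{2})?/u';

// Same visible text in two encodings:
// UTF-8 (taken from this file) and ISO-8859-1, where ¥ is the single byte 0xA5.
$utf8Input   = '¥ 18.00 Chicken with Pineapple';
$latin1Input = "\xA5 18.00 Chicken with Pineapple";

$utf8Result   = preg_match($pattern, $utf8Input, $utf8Match);
$latin1Result = preg_match($pattern, $latin1Input);
$latin1Error  = preg_last_error();

// The UTF-8 input matches; the ISO-8859-1 input is rejected as malformed UTF-8.
var_dump($utf8Result, $utf8Match[0]);
var_dump($latin1Result, $latin1Error === PREG_BAD_UTF8_ERROR);

// Inspecting the raw bytes shows why the two inputs differ.
echo bin2hex(substr($utf8Input, 0, 2)), ' vs ', bin2hex($latin1Input[0]), PHP_EOL;
```

If the input turns out to be ISO-8859-1, one option is to convert it to UTF-8 before matching (for example with `mb_convert_encoding`, which requires the mbstring extension).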

Mikulas Dite
  • Thanks, I am working on it. This is my source: `€ 69.00 Repas complet hyper` <-- UTF-8 database (from the HTML page). Then, from the source of the UTF-8 HTML page: `69.00 Repas complet`. The euro sign does not show up, and I wonder why (I think this is the issue). I am investigating. – Sergelie Jan 03 '14 at 21:32
  • @SergeSf yeah, it really works http://ideone.com/2gFaJn :) Check in what encoding the input reaches your script AND what encoding your source file is in. – Mikulas Dite Jan 03 '14 at 21:36
  • Hey Mikulas, thanks a lot. I solved my issue, and it was not related to encoding. It was my mistake: I was doing calculations on the extracted numbers and forgot to strip the other currency symbols. Sorry for the trouble. Here is the part I corrected: `{ $tot1 = $matches[0]; $tot1 = preg_replace("/\\\$/", '', $tot1); $tot1 = preg_replace("/\€/", '', $tot1); $tot1 = preg_replace("/\¥/", '', $tot1); $tot1 = preg_replace("/\£/", '', $tot1);` – Sergelie Jan 03 '14 at 23:37
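
The four `preg_replace` calls in the last comment can likely be collapsed into a single `str_replace`, since the symbols are literal strings and no regex is needed. This is only a sketch: `$amounts` is a hypothetical stand-in for `$matches[0]`, and removing commas and spaces is an added assumption so that the values can be summed.

```php
<?php
// Hypothetical sample standing in for $matches[0] from preg_match_all.
$amounts = ['$ 18.00', '€ 69.00', '¥1,200', '£5.50'];

// str_replace accepts an array of search strings and an array subject,
// so every symbol is stripped from every element in one call.
// Assumption: commas are thousands separators and spaces are padding.
$stripped = str_replace(['$', '€', '¥', '£', ',', ' '], '', $amounts);

// Convert the cleaned strings to floats before doing arithmetic.
$numbers = array_map('floatval', $stripped);
$total   = array_sum($numbers);

var_dump($stripped, $total);
```

Note that summing amounts in different currencies is only meaningful after conversion to a common currency; the sum here just shows that the cleaned values are numeric.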