(Yes, I know there are relevant regex questions that ask how to capture information between two characters. I tried, they didn't work for me. I also read the regex tutorials as deep as possible.)

I have this code that uses BeautifulSoup to scrape some information from a website in this form: Exchange rate: 1 USD = 60.50 INR

This string is stored in a variable called 'data'. I have to capture '60.50' from this string. I have this code for that:

data = _funct()
rate = re.search("?<=\=)(.*?)(?=\I" , data)
print rate

It doesn't work. Where am I going wrong?

Alex K.
learnerX

2 Answers

You can use a simple regex like this:

(\w+\.\w+)

Working demo

As you can see the idea behind the regex is:

( ... ) Use parentheses to capture the content
\w+\.\w+  one or more word characters (letters, digits or underscore), then a literal dot, then one or more word characters.

If you only want to capture digits you could use:

\d+\.\d+
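
To make the difference between the two patterns concrete, here is a minimal sketch. The second sample string is made up for illustration and is not from the question; it only shows that \w also accepts letters where \d does not.

```python
import re

# String from the question.
text = "Exchange rate: 1 USD = 60.50 INR"
# Hypothetical string (an assumption, not from the question) with a
# non-numeric token that also contains a dot.
noisy = "see v1.x, rate 60.50"

# \w matches letters, digits and underscores on both sides of the dot.
word_hit = re.search(r"(\w+\.\w+)", text).group(1)
word_hit_noisy = re.search(r"(\w+\.\w+)", noisy).group(1)

# \d only accepts digits, so "v1.x" is skipped.
digit_hit = re.search(r"\d+\.\d+", text).group()
digit_hit_noisy = re.search(r"\d+\.\d+", noisy).group()

print(word_hit, digit_hit)              # 60.50 60.50
print(word_hit_noisy, digit_hit_noisy)  # v1.x 60.50
```

On the string from the question both patterns return the same text, so the choice only matters if the scraped data can contain other dotted tokens.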

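The Code Generator for Python on the demo page gives code along these lines (adjusted here to keep the match and to use a plain raw string, since the ur'' prefix only exists in Python 2):

import re

# Compile the pattern once; the raw string keeps the backslashes intact.
p = re.compile(r'(\w+\.\w+)')
test_str = "Exchange rate: 1 USD = 60.50 INR"

match = p.search(test_str)
rate = match.group(1)  # the captured text, '60.50'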
Federico Piazza

I believe your regex isn't working because it is missing an opening parenthesis at the beginning and a closing parenthesis at the end. Also, the backslash \ before I is not necessary: \I isn't a metacharacter, so some engines simply treat it as a literal I, although Python 3.6 and later reject unknown letter escapes like this one. So you could do the following:

(?<=\=)(.*?)(?=I)

Please see Regex 101 Demo here.
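
As a rough sketch of how the corrected pattern could be used from Python (the sample string is the one from the question; the variable names are just illustrative):

```python
import re

# Sample string from the question.
data = "Exchange rate: 1 USD = 60.50 INR"

# Lookbehind for "=", lazy capture, lookahead for "I".
found = re.search(r"(?<=\=)(.*?)(?=I)", data)

if found:
    raw = found.group(1)       # includes the spaces around the number
    rate_text = raw.strip()    # drop the surrounding whitespace
    print(repr(raw), repr(rate_text))
else:
    print("no match")
```

Note that the lazy group also captures the spaces on either side of the number, so the result needs a strip() before it can be used.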

I think, however, as others have mentioned, that there are better ways of going about this, namely to look for digits and a decimal point preceded by spaces. There is a difficulty with what was suggested, however: the exchange rate could be missing a leading digit (it could start with a decimal point), or the decimal point may not be present at all. With that in mind, I would suggest the following:

(?<=\=)(?:\s*)(\d+(?:\.\d*)?|\.\d+)

See Regex Demo here.
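
Here is a small sketch that runs the pattern above against the edge cases just described. Only the first sample comes from the question; the other strings and the helper name extract_rate are made up for illustration.

```python
import re

# Pattern from the answer: "=" behind, optional spaces, then a number
# with an optional fractional part or a bare leading decimal point.
RATE_RE = re.compile(r"(?<=\=)(?:\s*)(\d+(?:\.\d*)?|\.\d+)")

def extract_rate(text):
    """Return the rate as a string, or None when nothing matches."""
    m = RATE_RE.search(text)
    return m.group(1) if m else None

# First sample is from the question; the rest are hypothetical.
samples = [
    "Exchange rate: 1 USD = 60.50 INR",  # usual form
    "Exchange rate: 1 USD = .75 GBP",    # no leading digit
    "Exchange rate: 1 USD = 60 INR",     # no decimal point
    "no rate here",                      # no "=" at all
]

results = [extract_rate(s) for s in samples]
print(results)  # ['60.50', '.75', '60', None]
```

Because the lookbehind anchors the match right after the "=", the leading "1" in "1 USD" is never picked up by mistake.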

David Faber
  • Thanks. Another slight issue: I now have grabbed 60.50 and stored it in rate = found.group(). Now, I try to multiply this rate with a float value (1000.00) and it tells me: Error: can't multiply sequence by non-int of type 'float'. Any ideas? – learnerX Jan 29 '15 at 13:31
  • You want to convert found.group() to a float before trying to multiply - http://stackoverflow.com/questions/485789/why-do-i-get-typeerror-cant-multiply-sequence-by-non-int-of-type-float – David Faber Jan 29 '15 at 13:35
  • Thanks. Done! -> rate=float(found.group()) – learnerX Jan 29 '15 at 13:43
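
For completeness, a minimal sketch of the conversion discussed in the comments above. It assumes the digits-only pattern was used for the match; the names amount and total are illustrative.

```python
import re

data = "Exchange rate: 1 USD = 60.50 INR"
found = re.search(r"\d+\.\d+", data)

# group() returns a str, and multiplying a str by a float raises TypeError.
try:
    found.group() * 1000.00
    failed = False
except TypeError:
    failed = True

# Converting to float first makes the arithmetic work.
rate = float(found.group())
amount = 1000.00
total = rate * amount

print(failed, total)  # True 60500.0
```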