
My workplace receives forms via Asana, and I am trying to extract the Recipient Email: and Recipient Contact Number: entries from the desc variable. (Note: the full text contains other email addresses and phone numbers that I don't want to capture.)

I'm learning Python after years of PHP and JS.

The result I want is

i_want@this.com

but I'm getting

nah bro

Can anyone nudge me the right way?

import re

desc = '''Recipient Email:
i_want@this.com

Recipient Contact Number:
+44(0)0000 000000
'''

email = re.search('Recipient Email:(.*)Recipient Contact Number:', desc)
if email:
    found = email.group(1)
else:
    print('nah bro')
user3840170
Brad Sullivan
  • Is it always written like this? I mean, why not just try to find the email and phone with re itself? – sudden_appearance Feb 03 '22 at 22:06
  • Fair, I tried that and it worked. However, the full result is bigger and contains other email addresses and phone numbers which I do not need. It's specifically these two, which will always be under those headings. – Brad Sullivan Feb 03 '22 at 22:10

4 Answers


You need re.DOTALL even when using .search()

re.DOTALL
[..] without this flag, '.' will match anything except a newline. [..]

>>> re.search('Recipient Email:(.*)Recipient Contact Number:', desc, re.DOTALL)
<re.Match object; span=(0, 59), match='Recipient Email:\ni_want@this.com\n\nRecipient Co
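Note that with re.DOTALL the captured group also contains the newlines around the address, so it still needs trimming. A minimal sketch of the full extraction, assuming the same desc as in the question (the lazy .*? and the .strip() call are my additions, not part of the answer above):

```python
import re

desc = '''Recipient Email:
i_want@this.com

Recipient Contact Number:
+44(0)0000 000000
'''

# re.DOTALL lets '.' match newlines; the lazy '.*?' stops at the first
# occurrence of the second heading instead of the last one.
match = re.search(r'Recipient Email:(.*?)Recipient Contact Number:', desc, re.DOTALL)

# group(1) is '\ni_want@this.com\n\n', so strip the surrounding whitespace.
email = match.group(1).strip() if match else None
print(email)  # i_want@this.com
```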
ti7

Your regex doesn't match because . does not match newline characters by default, and the email address sits on its own line below the heading. Try this:

email = re.search('Recipient Email:[\r\n]+([^\r\n]+)', desc)

Check out this post for an explanation of how this regular expression works.

Essentially, we skip the line break(s) after 'Recipient Email:' and then capture every character up to, but not including, the next newline character.
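The same pattern can be reused for any heading. Here is a rough sketch, assuming each value sits on the line directly below its heading; field_after is a hypothetical helper name introduced only for illustration:

```python
import re

def field_after(heading, text):
    """Return the first non-empty line that follows `heading`, or None."""
    # re.escape keeps characters in the heading from being read as regex syntax.
    match = re.search(re.escape(heading) + r'[\r\n]+([^\r\n]+)', text)
    return match.group(1).strip() if match else None

desc = '''Recipient Email:
i_want@this.com

Recipient Contact Number:
+44(0)0000 000000
'''

email = field_after('Recipient Email:', desc)
phone = field_after('Recipient Contact Number:', desc)
print(email)  # i_want@this.com
print(phone)  # +44(0)0000 000000
```

Because the search is tied to the heading text, other email addresses or phone numbers elsewhere in the description are ignored.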

  • this is brilliant and also works great, it made it simple to repeat a search for the Contact Number on the next line. thanks so much for the tip and link to that post. i'm always learning!! – Brad Sullivan Feb 03 '22 at 22:24

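The problem is that your string contains newlines (\n), which . does not match by default, and your regular expression does not account for them. You could change your expression to

email = re.search(r'Recipient Email:\s*(.*)\s*Recipient Contact Number:', desc)

\s matches any whitespace character: a space or any of \t\n\r\f\v. The r prefix makes the pattern a raw string, so Python passes \s to the regex engine unchanged.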

Take a look at Regex101 when making regular expressions.
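As a quick check of the behaviour described above, here is a small sketch using the desc from the question. The second capture group for the contact number is my extension of the pattern, not part of the original expression:

```python
import re

desc = '''Recipient Email:
i_want@this.com

Recipient Contact Number:
+44(0)0000 000000
'''

# \s matches a space as well as tab, newline, carriage return,
# form feed and vertical tab.
whitespace_ok = all(re.fullmatch(r'\s', ch) for ch in ' \t\n\r\f\v')

# '.' stops at each newline, so every (.*) captures exactly one line;
# \s* absorbs the line breaks between heading and value.
pattern = r'Recipient Email:\s*(.*)\s*Recipient Contact Number:\s*(.*)'
match = re.search(pattern, desc)
if match:
    email, phone = match.group(1), match.group(2)
    print(email)  # i_want@this.com
    print(phone)  # +44(0)0000 000000
```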

Casper Kuethe

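You can use this regex: [A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}

import re

desc = '''Recipient Email:
i_want@this.com

Recipient Contact Number:
+44(0)0000 000000
'''
# Inside a character class, | is a literal pipe, so the top-level
# domain part is written [A-Za-z] rather than [A-Z|a-z].
regex = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"

# re.search returns the first match in the string, or None.
email = re.search(regex, desc)

if email:
    found = email.group(0)
    print(found)
else:
    print('nah bro')

The output I got: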

i_want@this.com
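Since the question mentions that the full text contains other addresses, keep in mind that re.search returns only the first match. One possible way around that, sketched here with a made-up larger desc (the Sender Email: section is hypothetical), is to anchor an email pattern like the one above to the heading:

```python
import re

EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"

# Hypothetical larger text: an unrelated address appears before the wanted one.
desc = '''Sender Email:
not_this@other.com

Recipient Email:
i_want@this.com
'''

# Unanchored: the first address in the text wins.
first = re.search(EMAIL, desc).group(0)

# Anchored: only an address directly after the heading is accepted.
anchored = re.search(r'Recipient Email:\s*(' + EMAIL + r')', desc)
recipient = anchored.group(1) if anchored else None

print(first)      # not_this@other.com
print(recipient)  # i_want@this.com
```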
Nabil