
I'm extracting data from emails. I have pieces of text like this:

Eg. 1: some standard text.   Bugs Bunny bugs@gmail.com 0411111111 more standard text 
Eg. 2: some standard text.   Bugs The Bunny bugs@gmail.com 0411111111 more standard text
Eg. 3: some standard text.   Bugs Bunny bugs.bunny@gmail.com 0411111111 more standard text
Eg. 4: some standard text.   Bugs bugs.bunny@gmail.com +6141 111 111 more standard text

As you can see, there is a name, an email address and a phone number that I want to extract. The email should be easy enough, and I'm sure I can work out the phone options, but how could I get the name?

I know the logic is: get the text after `some standard text.` and before the first whitespace-delimited string that contains the @, but how?

This is my starting point: `(?<=some standard text. )(.*?)(?=@)`

Adding a non-capturing group for the part of the email before the @ gives me a result with the name in a capture group: `(?<=some standard text. )(.*?)(?:[\w-\.]+)@`, so I think I'm on the right path.

I'm using php.

Warren
  • 1. What do you mean by `a full match`? 2. Is `some standard text.` always the same, and does it always end with a dot? – Evandro Coan Feb 22 '17 at 00:32
  • Here is a quick version/example I came up with: `(?<=some standard text. )(.*?) ([^\s]+@[^\s]+) (\+?\d+(?:\s\d+)*)` (https://regex101.com/r/Wjz66g/1). It's not perfect, but it does follow along the same lines as what you were doing and might work enough. – Jonathan Kuhn Feb 22 '17 at 00:33
  • @addons_zz - I've just educated myself on groups, so I'm going to edit the question slightly. – Warren Feb 22 '17 at 00:35
  • @JonathanKuhn - love it! Please post that as an answer and I will accept it. – Warren Feb 22 '17 at 00:36

2 Answers


Here is a quick version/example I came up with:

(?<=some standard text. )(.*?) ([^\s]+@[^\s]+) (\+?\d+(?:\s\d+)*) 

regex101.com/r/Wjz66g/1

It's not perfect, but it follows the same lines as what you were doing and may work well enough.
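Since the question mentions PHP, here is a minimal sketch of how the pattern might be used with preg_match. The function name extractContact, the array keys and the trim() call are assumptions for illustration, not part of the original pattern; trim() is there because the lazy `(.*?)` group can pick up extra leading spaces when more than one space follows the standard text.

```php
<?php
// The pattern from above, wrapped in "/" delimiters for preg_match.
$pattern = '/(?<=some standard text. )(.*?) ([^\s]+@[^\s]+) (\+?\d+(?:\s\d+)*)/';

// Hypothetical helper: returns name, email and phone, or null if nothing matches.
function extractContact(string $text, string $pattern): ?array
{
    if (preg_match($pattern, $text, $m) !== 1) {
        return null;
    }
    return [
        'name'  => trim($m[1]), // group 1 may carry leading spaces
        'email' => $m[2],
        'phone' => $m[3],
    ];
}

// Sample strings taken from the question.
$samples = [
    'some standard text.   Bugs Bunny bugs@gmail.com 0411111111 more standard text',
    'some standard text.   Bugs The Bunny bugs@gmail.com 0411111111 more standard text',
    'some standard text.   Bugs Bunny bugs.bunny@gmail.com 0411111111 more standard text',
    'some standard text.   Bugs bugs.bunny@gmail.com +6141 111 111 more standard text',
];

foreach ($samples as $sample) {
    $c = extractContact($sample, $pattern);
    if ($c !== null) {
        printf("%s | %s | %s\n", $c['name'], $c['email'], $c['phone']);
    }
}
```

The lookbehind has to be fixed-width in PCRE, so this only works while the standard text has a known, constant ending.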

Jonathan Kuhn

I wrote this; you can test it at https://regex101.com/r/A29hjE/8

(?x) # Enter free-spacing mode, which allows whitespace and comments in the pattern

# Match the period and whitespace that end the standard text,
# so the leading spaces are not captured by the `[\w ]+` group
(?:\.\s+)

# Match the person's name, which comes before the email address
([\w ]+(?:\w+))\s+

# Match the email address
(\w[^\s]+@[^\s]+)\s+

# Match the telephone number
(\+?[\d ]+)(?!\w)
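
A rough sketch of how this free-spacing pattern could be dropped into PHP. The nowdoc wrapper, the "/" delimiters and the function name parseContactVerbose are assumptions for illustration; the pattern itself is unchanged apart from shortened comments. A nowdoc is used so that backslashes reach PCRE untouched.

```php
<?php
// Free-spacing pattern kept readable inside a nowdoc; "/" is the delimiter.
$verbosePattern = <<<'REGEX'
/(?x)
    (?:\.\s+)               # period and whitespace ending the standard text
    ([\w ]+(?:\w+))\s+      # group 1: name
    (\w[^\s]+@[^\s]+)\s+    # group 2: email
    (\+?[\d ]+)(?!\w)       # group 3: phone
/
REGEX;

// Hypothetical helper: returns [name, email, phone], or null if nothing matches.
function parseContactVerbose(string $text, string $pattern): ?array
{
    if (preg_match($pattern, $text, $m) !== 1) {
        return null;
    }
    return [$m[1], $m[2], $m[3]];
}

// Two of the sample strings from the question.
$samples = [
    'some standard text.   Bugs The Bunny bugs@gmail.com 0411111111 more standard text',
    'some standard text.   Bugs bugs.bunny@gmail.com +6141 111 111 more standard text',
];

foreach ($samples as $sample) {
    $parts = parseContactVerbose($sample, $verbosePattern);
    if ($parts !== null) {
        echo implode(' | ', $parts), "\n";
    }
}
```

Because this version anchors on a period followed by whitespace rather than on the literal standard text, it would presumably also match after any earlier sentence ending in the input, so it is best suited to text with the shape shown in the question.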
Evandro Coan