
I have a 5MB text file where I need to find all email addresses and remove everything else.

The text file contains items like the snippet below:

<snip>
To: (Address)
/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLTM)/cn=Recipients/cn=49ee46a4d9da8492a8d0583f9b13225d5-Claire D
/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLTM)/cn=Recipients/cn=a1525d3se9057487d9cacdec1562b7281-Big Tang;/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLTM)/cn=Recipients/cn=92414e086e5540d890bg1372316f15222-Matt Perry
/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLTM)/cn=Recipients/cn=3c776ca5d813948559a705db141bf0100-Vijay Boy;/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLTM)/cn=Recipients/cn=49ee4s6a49da8492a8d0583f9b13225d5-Claire N
/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLTM)/cn=Recipients/cn=6e799gd02635149138e4c9d152ab0357e-Becky G
/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLTM)/cn=Recipients/cn=f65ed21e081g54effad7c9b4f0778f2b8-Ham Ly
/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLTM)/cn=Recipients/cn=d875920114ga748e99f045dbac3e34372-Brad King
/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLTM)/cn=Recipients/cn=8d945fcc838gb49af822e17b6a3f641b7-Bharat Mass
/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLTM)/cn=Recipients/cn=8514631915374ef88g3b382f4b7d2d4b2-Pratboss;/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLTM)/cn=Recipients/cn=a1525d3e9057487d9cacgdec1562b7281-Huy Tang;/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLTM)/cn=Recipients/cn=8bc63496da41481fb02fbgcf359c029b1-Dolly Age
sales@trol.com
Joey.Boss@BCape.com
/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLTM)/cn=Recipients/cn=cddab36g026d64df993ca28a445354c0a-Dilshad A.
Joey.Boss@BCape.com
/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLTM)/cn=Recipients/cn=9843f7566d374cb7ac634637098gc3633-Orewell Dme;/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLTM)/cn=Recipients/cn=2198f33e85a24ebab276g2ea14g2415216-Mind God;/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLTM)/cn=Recipients/cn=7ea70e47dc7841a7ag007bfdba21feaf4-Prabhu Dist;/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLTM)/cn=Recipients/cn=8d945fcc838b49afg822e17b6a3f641b7-Bharat Mass

</snip>

I found out how to search for email addresses in EditPlus using a regular expression. I just can't figure out the find/replace command that removes everything except the email addresses and puts each address on its own line.

The following pattern, entered in EditPlus's Find dialog, matches the email addresses:

[a-zA-Z0-9\.\-_]+@[a-zA-Z0-9\.\-_]+\.[a-zA-Z0-9\.\-_]+
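To see what this pattern picks up outside EditPlus, here is a minimal Python sketch, assuming the file content is already loaded into a string. The function name extract_emails and the shortened sample string are illustrative only, not part of the question. It collects every match and joins them with newlines, which is the desired end result:

```python
import re

# The pattern from the question; inside a character class, "\." and "\-" are a literal "." and "-".
EMAIL_RE = re.compile(r"[a-zA-Z0-9\.\-_]+@[a-zA-Z0-9\.\-_]+\.[a-zA-Z0-9\.\-_]+")


def extract_emails(text: str) -> str:
    """Return every match of EMAIL_RE in text, one address per line (hypothetical helper)."""
    return "\n".join(EMAIL_RE.findall(text))


# A shortened, made-up excerpt in the shape of the question's data.
sample = (
    "To: (Address)\n"
    "/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLTM)/cn=Recipients/cn=49ee46a4d9da8492a8d0583f9b13225d5-Claire D\n"
    "sales@trol.com\n"
    "Joey.Boss@BCape.com\n"
    "/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLTM)/cn=Recipients/cn=cddab36g026d64df993ca28a445354c0a-Dilshad A.\n"
    "Joey.Boss@BCape.com\n"
)

print(extract_emails(sample))
```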

I would appreciate help removing everything except the email addresses.

Michael Benjamin

2 Answers


Description

([a-zA-Z0-9\.\-_]+@[a-zA-Z0-9\.\-_]+\.[a-zA-Z0-9\.\-_]+)|.

Replace With: $1


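I would wrap your expression as ( ..your expression.. )|. and replace every match with $1. When a match is an email address that fits your expression, it is captured into $1 and written back unchanged; otherwise the . alternative matches a single character, which is replaced with nothing, i.e. deleted. Because . does not match a newline, the line breaks themselves are kept.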
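A minimal Python sketch of the same capture-or-delete replacement, assuming a small made-up sample string. Python's re.sub writes the backreference as \1 instead of $1. The sketch also shows the side effect of . not matching a newline: every line without an address is left behind as an empty line, so a final filtering step (an addition, not part of the answer above) drops those empty lines:

```python
import re

# Made-up sample in the shape of the question's data.
sample = (
    "To: (Address)\n"
    "/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLTM)/cn=Recipients/cn=6e799gd02635149138e4c9d152ab0357e-Becky G\n"
    "sales@trol.com\n"
    "Joey.Boss@BCape.com\n"
)

# Alternative 1 captures a whole address; alternative 2 (".") consumes any
# other single character except "\n", so line breaks survive the replacement.
PATTERN = re.compile(r"([a-zA-Z0-9\.\-_]+@[a-zA-Z0-9\.\-_]+\.[a-zA-Z0-9\.\-_]+)|.")

# Since Python 3.5 an unmatched group is substituted as an empty string,
# so every non-email character is deleted and every address is kept.
kept = PATTERN.sub(r"\1", sample)

# Lines that held no address are now empty; drop them to get one address per line.
emails = "\n".join(line for line in kept.splitlines() if line)
print(emails)
```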

Example

Live Demo

https://regex101.com/r/kY5dU8/1

Sample text: the same <snip> block as in the question.

After Replacement (the empty lines left where whole lines were deleted are omitted here)

sales@trol.com
Joey.Boss@BCape.com
Joey.Boss@BCape.com

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [a-zA-Z0-9\.\-_]+        any character of: 'a' to 'z', 'A' to
                             'Z', '0' to '9', '.', '-', '_' (1 or
                             more times, matching as many as
                             possible)
----------------------------------------------------------------------
    @                        '@'
----------------------------------------------------------------------
    [a-zA-Z0-9\.\-_]+        any character of: 'a' to 'z', 'A' to
                             'Z', '0' to '9', '.', '-', '_' (1 or
                             more times, matching as many as
                             possible)
----------------------------------------------------------------------
    \.                       '.'
----------------------------------------------------------------------
    [a-zA-Z0-9\.\-_]+        any character of: 'a' to 'z', 'A' to
                             'Z', '0' to '9', '.', '-', '_' (1 or
                             more times, matching as many as
                             possible)
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
  .                        any character except \n
----------------------------------------------------------------------
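The same pattern can be written in Python's verbose mode so that each node from the table sits next to its explanation. This is a sketch for readability only; the name EMAIL_OR_ANY is illustrative, and re.VERBOSE ignores the whitespace and # comments outside the character classes:

```python
import re

# The answer's pattern, annotated node by node with re.VERBOSE.
EMAIL_OR_ANY = re.compile(
    r"""
    (                       # group and capture to \1:
      [a-zA-Z0-9\.\-_]+     #   one or more letters, digits, '.', '-', '_'
      @                     #   a literal '@'
      [a-zA-Z0-9\.\-_]+     #   one or more letters, digits, '.', '-', '_'
      \.                    #   a literal '.'
      [a-zA-Z0-9\.\-_]+     #   one or more letters, digits, '.', '-', '_'
    )                       # end of \1
    |                       # OR
    .                       # any single character except a newline
    """,
    re.VERBOSE,
)

# An address is captured into group 1 ...
m = EMAIL_OR_ANY.match("Joey.Boss@BCape.com")
print(m.group(1))

# ... while any other character is matched by "." and group 1 stays empty (None).
other = EMAIL_OR_ANY.match("/o=ExchangeLabs")
print(other.group(0), other.group(1))
```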
Ro Yo Mi

This really isn't complicated, especially if you break the task down.

According to your regex, an email address must contain an @ sign. So I did a global replace of this regex with nothing (multiline and global modes enabled), which empties every line that contains no @:

^[^@]+$

The result is:

sales@trol.com
Joey.Boss@BCape.com

Joey.Boss@BCape.com

Now you just need to replace each run of whitespace (including the leftover blank lines) with a single newline, by replacing matches of this regex:

\s+
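A minimal Python sketch of this two-step approach, assuming a short made-up sample. The final strip() is an addition that removes the line break left at the very start and end of the text. Note that [^@] also matches a newline, so a single match in step 1 can swallow several consecutive lines without an @; that is why the result above has only one blank line:

```python
import re

# Made-up sample in the shape of the question's data.
sample = (
    "To: (Address)\n"
    "/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLTM)/cn=Recipients/cn=f65ed21e081g54effad7c9b4f0778f2b8-Ham Ly\n"
    "sales@trol.com\n"
    "Joey.Boss@BCape.com\n"
    "/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLTM)/cn=Recipients/cn=d875920114ga748e99f045dbac3e34372-Brad King\n"
    "Joey.Boss@BCape.com\n"
)

# Step 1: with re.MULTILINE, ^ and $ match at every line boundary.
# [^@]+ can cross line breaks, so one match may cover several @-free lines.
step1 = re.sub(r"^[^@]+$", "", sample, flags=re.MULTILINE)

# Step 2: collapse each run of whitespace (the leftover line breaks) into one "\n".
step2 = re.sub(r"\s+", "\n", step1)

print(step2.strip())
```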

For your data, I would suggest using a simple regex or two to keep things readable. This approach is also much faster: the other answer takes more than 10,000 steps to finish, compared to 60 for mine, i.e. fewer than 1% as many steps.

Laurel