
I need a regex to obfuscate emails in a database dump file I have. I'd like to replace all domains with a set domain like @fake.com so I don't risk sending out emails to real people during development. The emails do have to be unique to match database constraints, so I only want to replace the domain and keep the usernames.

I currently have this regex for finding emails:

\b[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b

How do I convert this search regex into a regex I can use in a find-and-replace operation in Sublime Text, sed, or Vim?

EDIT:

Note: I realized I could simply replace every match of @[A-Z0-9.-]+\.[A-Z]{2,4}\b in this case, but academically I am still interested in how to treat each part of the email regex as a token and replace the username and domain independently.
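One way to treat each part of the address as its own token is to give each part a named group and pass a function as the replacement, so the username and domain can be transformed independently. The sketch below uses Python's `re` module rather than any of the editors mentioned; the helper names are hypothetical, and it uses `example.com` (a domain reserved for documentation by RFC 2606) instead of `fake.com`.

```python
import re

# The question's pattern with the username and domain split into named groups.
# re.IGNORECASE lets [A-Z] match lowercase letters too.
EMAIL_RE = re.compile(
    r"\b(?P<user>[A-Z0-9._%-]+)@(?P<domain>[A-Z0-9.-]+\.[A-Z]{2,4})\b",
    re.IGNORECASE,
)


def obfuscate(match: re.Match) -> str:
    # Each token is available separately; here the username is kept (so
    # addresses stay unique) and the domain is discarded.
    return f"{match.group('user')}@example.com"


def obfuscate_text(text: str) -> str:
    return EMAIL_RE.sub(obfuscate, text)


print(obfuscate_text("alice@corp.org, Bob.Smith@Mail.CO.UK"))
# alice@example.com, Bob.Smith@example.com
```

Because the replacement is a function, either token could just as easily be hashed, shortened, or left alone.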

James McMahon
  • There is no difference between a search and a find-and-replace regex, is there? If you want to do the job properly you might want to have a look [here](http://www.regular-expressions.info/email.html) though. – Martin Ender Apr 17 '13 at 22:38
  • @m.buettner, isn't there, though? Don't I need to separate the email address into tokens and replace a specific token, so I am not replacing the entire email address? – James McMahon Apr 17 '13 at 22:40
  • You can search for only the domain (`@....`) and replace it, if you can assume that `@` doesn't appear in any other context. You can also use a capturing group and a backreference. – nhahtdh Apr 17 '13 at 22:43
  • @JamesMcMahon oh I see what you mean. my bad. – Martin Ender Apr 17 '13 at 23:48
  • fake.com is an owned domain name. – jdigaetano Apr 19 '23 at 11:42

2 Answers


SublimeText

SublimeText uses Boost syntax, which supports quite a large subset of features in Perl regex. But for this task, you don't need all those advanced constructs.

Below are 2 possible approaches:

  1. If you can assume that @ doesn't appear in any other context (which is quite a fair assumption for normal text), then you can just search for the domain part @[A-Z0-9.-]+\.[A-Z]{2,4}\b and replace it.

  2. Use a capturing group (pattern) and a backreference in the replacement string:

    Find what

    \b([A-Z0-9._%-]+)@[A-Z0-9.-]+\.[A-Z]{2,4}\b
    

    ([A-Z0-9._%-]+) is the first (and only) capturing group in the regex.

    Replace with

    $1@fake.com
    

    $1 refers to the text captured by the first capturing group.

Note that for both methods above, you need to turn off case sensitivity (the second button in the lower-left corner), unless you specifically want to replace only emails written in ALL CAPS.
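For comparison, here is a sketch of both approaches using Python's `re.sub` (not Sublime Text itself), run on a made-up SQL line with `example.com` as the replacement domain. Python writes the backreference as `\1` rather than `$1`, and `re.IGNORECASE` plays the role of turning case sensitivity off.

```python
import re

LINE = "INSERT INTO users VALUES (1, 'jdoe@corp.com'), (2, 'ANN@SHOP.NET');"

# Approach 1: match only the domain part, assuming "@" appears nowhere else.
domain_only = re.sub(
    r"@[A-Z0-9.-]+\.[A-Z]{2,4}\b", "@example.com", LINE, flags=re.IGNORECASE
)

# Approach 2: capture the username and reinsert it with the backreference \1.
with_group = re.sub(
    r"\b([A-Z0-9._%-]+)@[A-Z0-9.-]+\.[A-Z]{2,4}\b",
    r"\1@example.com",
    LINE,
    flags=re.IGNORECASE,
)

# Same as approach 2 but case-sensitive: [A-Z] matches capitals only,
# so only the ALL-CAPS address is rewritten.
caps_only = re.sub(
    r"\b([A-Z0-9._%-]+)@[A-Z0-9.-]+\.[A-Z]{2,4}\b", r"\1@example.com", LINE
)

print(domain_only)
print(with_group)
print(caps_only)
```

The last result leaves `jdoe@corp.com` untouched, which is exactly the ALL-CAPS caveat above.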

nhahtdh
  • An alternative way of doing it would be to use *positive lookbehind* to avoid capturing the first half of the email and only target the domain. – plalx Apr 17 '13 at 22:56
  • @plalx: Look-behind may not always work, since the pattern is variable-width. – nhahtdh Apr 17 '13 at 22:57
  • Good to know. I had implemented the same solution as you, but was fooling around with lookbehind and could not make it work. I guess that explains why... I'll read more about it ;) – plalx Apr 17 '13 at 23:05
  • fake.com is actually an owned domain name. Don't use that. – jdigaetano Apr 19 '23 at 11:41
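Python's built-in `re` module is one engine that enforces the fixed-width restriction nhahtdh mentions. The sketch below (the helper name is hypothetical) checks which look-behind patterns it accepts: a look-behind covering the whole username is rejected, while one that only looks for the `@` compiles.

```python
import re


def lookbehind_compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern, re.IGNORECASE)
        return True
    except re.error as exc:
        # Python reports that look-behind requires a fixed-width pattern.
        print("rejected:", exc)
        return False


# "+" lets the look-behind span any number of characters: variable width.
whole_username = r"(?<=\b[A-Z0-9._%-]+@)[A-Z0-9.-]+\.[A-Z]{2,4}\b"
# A single "@" is always exactly one character: fixed width.
at_sign_only = r"(?<=@)[A-Z0-9.-]+\.[A-Z]{2,4}\b"

print(lookbehind_compiles(whole_username))  # False
print(lookbehind_compiles(at_sign_only))    # True
```

Other engines differ here; Vim, for example, accepts the variable-width form used in the other answer.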

You may use the following command for Vim:

:%s/\(\<[A-Za-z0-9._%-]\+@\)[A-Za-z0-9.-]\+\.[A-Za-z]\{2,4}\>/\1fake.com/g

Everything between \( and \) becomes a capturing group, which the replacement part refers to by a backslash and the group number (\1 in this case). I've also modified the regexp to match lowercase letters and to use Vim-compatible syntax.
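A rough Python translation of the same command may help map the Vim escapes onto more common syntax. It is a sketch, not an exact equivalent: Vim's `\<` and `\>` word anchors are approximated with `\b`, `replace_domains` is a hypothetical helper name, and the replacement domain is `example.com`.

```python
import re

# Vim's \( \) become ( ), \+ becomes +, and \{2,4} becomes {2,4}.
# As in the Vim command, the captured group includes the "@".
pattern = re.compile(r"\b([A-Za-z0-9._%-]+@)[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b")


def replace_domains(line: str) -> str:
    # \1 puts the captured "username@" back in front of the new domain.
    return pattern.sub(r"\1example.com", line)


print(replace_domains("Contact: Jane.Doe@Mail.Example.org"))
# Contact: Jane.Doe@example.com
```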

You can also turn off case sensitivity by putting \c anywhere in the regexp, like this:

:%s/\c\(\<[A-Z0-9._%-]\+@\)[A-Z0-9.-]\+\.[A-Z]\{2,4}\>/\1fake.com/g
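Python has a similar inline switch, `(?i)`. A minimal sketch, with `example.com` as the replacement domain; note that unlike Vim's `\c`, which may appear anywhere in the pattern, Python expects a global inline flag at the very start (placing it elsewhere is an error as of Python 3.11).

```python
import re

# (?i) at the start of the pattern makes [A-Z] match lowercase letters too,
# much like \c in the Vim command.
pattern = re.compile(r"(?i)\b([A-Z0-9._%-]+@)[A-Z0-9.-]+\.[A-Z]{2,4}\b")

for address in ["bob@corp.net", "BOB@CORP.NET"]:
    print(pattern.sub(r"\1example.com", address))
# bob@example.com
# BOB@example.com
```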

Note that % at the beginning of the command tells Vim to do the replacement on every line of the file, and g at the end makes it replace every match on a line rather than only the first.
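Outside an editor, the same whole-file pass can be scripted. The sketch below is one way to do it in Python; the function name, file paths, and `example.com` default are assumptions. Iterating over every line plays the role of `%`, and `re.sub` already replaces every match on a line by default, like the `g` flag. A function is used as the replacement so that a domain beginning with a digit can't be misread as part of a group reference such as `\12`.

```python
import re

EMAIL_RE = re.compile(
    r"\b([A-Z0-9._%-]+@)[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE
)


def obfuscate_dump(src_path: str, dst_path: str, domain: str = "example.com") -> None:
    """Copy src_path to dst_path, replacing the domain of every email address."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:  # every line, like Vim's %
            # count defaults to 0, so every match on the line is replaced, like /g.
            dst.write(EMAIL_RE.sub(lambda m: m.group(1) + domain, line))


# Hypothetical paths:
# obfuscate_dump("dump.sql", "dump_obfuscated.sql")
```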

Another approach uses zero-width matching with a lookbehind (\@<=), so that only the domain is matched and no backreference is needed:

:%s/\c\(\<[A-Z0-9._%-]\+@\)\@<=[A-Z0-9.-]\+\.[A-Z]\{2,4}\>/fake.com/g
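Python's `re` cannot express this exact lookbehind, because it only allows fixed-width lookbehind. A minimal sketch of a compromise (an approximation, not an equivalent of the Vim command) is to look behind for exactly two characters: one username character followed by `@`. That keeps the replacement free of backreferences and still skips an `@` with no username in front of it, but it no longer checks where the username starts.

```python
import re

# Fixed-width lookbehind: exactly one username character, then "@".
DOMAIN_RE = re.compile(
    r"(?<=[A-Z0-9._%-]@)[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE
)

text = "jdoe@corp.com and @handle.io"
print(DOMAIN_RE.sub("example.com", text))
# jdoe@example.com and @handle.io
```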
Aleksei Zyrianov