How can I filter email addresses that belong in a particular domain using Perl?

Question

How can I scan through a file that contains email addresses, one per line, and remove those that belong to a certain domain, e.g. hacker@bad.com? I want to get rid of all email addresses that are @bad.com.

score 8 · Accepted Answer · edited Dec 29 '09 at 08:22

Use grep instead of Perl

grep -v '@bad\.com' inputfile > outputfile

On Windows

findstr /v "@bad\.com" inputfile > outputfile
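
A minimal sketch of a stricter grep variant, assuming one address per line: anchoring the pattern with `$` keeps addresses such as me@bad.com.au, and `-i` also catches upper-case spellings of the domain. The sample addresses are made up for illustration; the file names follow the command above.

```shell
# Hypothetical sample input, one address per line
printf '%s\n' 'hacker@bad.com' 'me@bad.com.au' 'Admin@BAD.COM' 'ok@good.com' > inputfile

# -v inverts the match, -i ignores case; the trailing $ ties the domain to the end of the line
grep -v -i '@bad\.com$' inputfile > outputfile

cat outputfile
```

With this sample, only me@bad.com.au and ok@good.com remain in outputfile. Whether subdomains or look-alike domains should also be dropped depends on the use case, so treat the anchor as one option rather than the required form.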

edited Dec 29 '09 at 08:22 by brian d foy
answered Dec 28 '09 at 20:27 by Jim Garrison

What about "me@bad.com.*"? Are those also to be filtered out? – Leonardo Herrera Dec 29 '09 at 15:53

score 1 · Answer 2 · edited Jul 12 '17 at 15:41

Email::Address is a nice module for dealing with email addresses.

Here is an example which may whet your appetite:

use strict;
use warnings;
use feature 'say';   # say() is not enabled by default
use Email::Address;

my $data = 'this person email is hacker@bad.com
blah blah hacker@good.com blah blah
another@bad.com
';

my @emails      = Email::Address->parse( $data );
my @good_emails = grep { $_->host ne 'bad.com' } @emails;

say "@emails";       # => hacker@bad.com hacker@good.com another@bad.com
say "@good_emails";  # => hacker@good.com

score 0 · Answer 3 · answered Dec 28 '09 at 20:41

This should do:

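my $badDomain = "bad.com";
while(<>)
{
        s{\s+$}{};                               # strip trailing whitespace, including the newline
        print "$_\n" if(!/\@\Q$badDomain\E$/);   # \Q...\E keeps the dot in the domain literal
}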

answered Dec 28 '09 at 20:41 by codaddict

Since we never `chomp()`-ed the line, it will already have a newline at the end by default. You don't need to print it with another one (unless of course you want blank lines between your output lines). – Chris Lutz Dec 28 '09 at 21:21
@Chris: If you look closely at line 4, I'm removing all trailing whitespace. That will remove the trailing \n as well. So a \n in the print is needed. – codaddict Dec 29 '09 at 02:46
Ah. In that case, why not `s/\s+$/\n/;` so the newline is kept, then just `print if /regex/` ? – Chris Lutz Dec 29 '09 at 09:46

score 0 · Answer 4 · answered Dec 29 '09 at 00:49

The following gives you a script that you can extend over time. Instead of simply filtering out @bad.com (which you can do with a simple grep), you can write your script so that it is easy to refine which domains are unwanted.

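use strict;
use warnings;

# Lowercased domains to reject; add more keys as needed.
my $bad_addresses = {'bad.com'=>1};

while (my $s = <>) {
    print $s unless (is_bad_address($s));
}

# Returns true for addresses in a listed domain, and also for lines
# that do not look like a single user@domain address.
sub is_bad_address {
    my ($addr) = @_;
    if ($addr=~/^([^@]+)\@([^@\n\r]+)$/) {
        my $domain = lc($2);
        return 0 unless (defined $bad_addresses->{$domain});
        return $bad_addresses->{$domain};
    }
    return 1;
}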
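
The same idea of a growable list of unwanted domains can also be sketched at the shell level, assuming a POSIX grep: keep one anchored pattern per line in a separate file and pass it with `-f`. The file names `blocklist.txt` and `addresses.txt` and the sample data are hypothetical.

```shell
# Hypothetical blocklist: one anchored pattern per unwanted domain
printf '%s\n' '@bad\.com$' '@mean\.com$' > blocklist.txt

# Hypothetical input, one address per line
printf '%s\n' 'good@good.com' 'bad@bad.com' 'not@mean.com' 'good@reallymean.com' > addresses.txt

# -f reads the patterns from a file; -v drops every line that matches any of them
grep -v -i -f blocklist.txt addresses.txt
```

With this sample, good@good.com and good@reallymean.com are printed. Unlike the Perl script, this version passes through lines that are not addresses at all, so it is a different trade-off, not a drop-in replacement.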

score 0 · Answer 5 · answered Dec 29 '09 at 16:02

Not too different from what others have done.

use strict;
use warnings;

my @re = map { qr/@(.*\.)*\Q$_\E$/ } qw(bad.com mean.com);

while (my $line = <DATA>) {
    chomp $line;
    if (grep { $line =~ /$_/ } @re) {
        print "Rejected: $line\n";
    } else {
        print "Allowed: $line\n";
    }
}

__DATA__
good@good.com
bad@bad.com
notbad@bad.comm.com
alsobad@bad.com
othergood@good.com
not@mean.com
good@reallymean.com
bad@really.mean.com

score -1 · Answer 6 · edited Dec 29 '09 at 09:31

Perl

perl -ne 'print if !/\@bad\.com/' file

awk

awk '!/@bad\.com/' file
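
As a sketch of a stricter comparison, assuming one address per line, awk can split each line on `@` and compare the last field with the domain exactly, which avoids substring matches such as bad.com.au. The file name `file` follows the commands above; the sample contents are invented.

```shell
# Hypothetical sample input, one address per line
printf '%s\n' 'hacker@bad.com' 'me@bad.com.au' 'Admin@BAD.COM' 'ok@good.com' > file

# -F'@' splits on the at sign; $NF is the last field, i.e. the domain part
awk -F'@' 'tolower($NF) != "bad.com"' file
```

With this sample, me@bad.com.au and ok@good.com are printed. Lines with trailing whitespace or Windows line endings would need trimming first, which this sketch does not attempt.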

edited Dec 29 '09 at 09:31
answered Dec 28 '09 at 23:48 by ghostdog74

That's not the right pattern. It also excludes notbad.com, etc. – brian d foy Dec 29 '09 at 08:20

score -3 · Answer 7 · edited Dec 29 '09 at 08:20

This code should filter all the @bad.com addresses from the input files.

 my @array = <>;

 foreach(@array) {
   if(!/\@bad\.com$/) {
     print $_;
   }
 }

edited Dec 29 '09 at 08:20 by brian d foy
answered Dec 28 '09 at 21:04 by dan

That's awful. Why would you slurp in `<>` when you could just iterate over it for the same effect, with almost no memory impact? – Chris Lutz Dec 28 '09 at 21:17
