How can I scan a file of newline-separated email addresses and remove those that belong to a certain domain, e.g. hacker@bad.com?
I want to get rid of all email addresses that are @bad.com.
Question (score 1), viewed 346 times

Edited by brian d foy; asked by John.
7 Answers
Answer (score 8)
Use grep instead of Perl:
grep -v '@bad\.com' inputfile > outputfile
On Windows:
findstr /v "@bad\.com" inputfile > outputfile

Edited by brian d foy; answered by Jim Garrison.
What about "me@bad.com.*"? Are those also to be filtered out? – Leonardo Herrera Dec 29 '09 at 15:53
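A minimal sketch (not from any of the answers) of one way to handle that case: anchor the pattern at the end of the line so only addresses whose domain is exactly bad.com match, and addresses such as me@bad.com.au are kept. The helper name `is_bad_domain` is hypothetical. With grep, the same idea is to append `$` to the pattern, e.g. `grep -v '@bad\.com$' inputfile`.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the address's domain is exactly bad.com, else 0.
# \s*\z anchors the match at the end (allowing a trailing newline or "\r\n"),
# so "me@bad.com.au" and "me@bad.company.org" are kept, unlike with an
# unanchored /\@bad\.com/. The /i flag makes the match case-insensitive.
sub is_bad_domain {
    my ($addr) = @_;
    return $addr =~ /\@bad\.com\s*\z/i ? 1 : 0;
}

for my $addr ('hacker@bad.com', 'me@bad.com.au', 'me@bad.company.org') {
    printf "%-20s %s\n", $addr, is_bad_domain($addr) ? 'dropped' : 'kept';
}
```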
Answer (score 1)
Email::Address is a nice module for dealing with email addresses.
Here is an example which may whet your appetite:
use strict;
use warnings;
use feature 'say';    # say() must be enabled explicitly
use Email::Address;

my $data = 'this person email is hacker@bad.com
blah blah hacker@good.com blah blah
another@bad.com
';
my @emails = Email::Address->parse( $data );
my @good_emails = grep { $_->host ne 'bad.com' } @emails;
say "@emails"; # => hacker@bad.com hacker@good.com another@bad.com
say "@good_emails"; # => hacker@good.com
Answer (score 0)
This should do:
my $badDomain = "bad.com";
while (<>)
{
    s{\s+$}{};
    print "$_\n" if (!/\@\Q$badDomain\E$/);
}

codaddict
Since we never `chomp()`-ed the line, it will already have a newline at the end. You don't need to print it with another one (unless, of course, you want blank lines between your output lines). – Chris Lutz Dec 28 '09 at 21:21
@Chris: If you look closely at line 4, I'm removing all trailing whitespace. That removes the trailing \n as well, so a \n in the print is needed. – codaddict Dec 29 '09 at 02:46
Ah. In that case, why not `s/\s+$/\n/;` so the newline is kept, then just `print if /regex/`? – Chris Lutz Dec 29 '09 at 09:46
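A sketch of Chris Lutz's suggestion, assuming the loop is wrapped in a helper (the name `filter_addresses` is hypothetical) that takes input and output filehandles so it can be run on an in-memory string. Trailing whitespace, including a Windows `\r\n`, is collapsed to a single newline, and the line is printed unchanged unless it ends in @bad.com. `\Q...\E` keeps the dot in `bad.com` from acting as a regex wildcard.

```perl
use strict;
use warnings;

my $bad_domain = 'bad.com';

# Hypothetical helper: copy lines from $in to $out, skipping $bad_domain addresses.
sub filter_addresses {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        # Collapse trailing whitespace (e.g. "\r\n" or stray spaces) into a
        # single "\n", so the line can be printed without adding one.
        $line =~ s/\s+$/\n/;
        print {$out} $line unless $line =~ /\@\Q$bad_domain\E$/;
    }
}

# Demo on an in-memory "file".
my $input = "hacker\@good.com\r\nhacker\@bad.com  \nanother\@bad.com\n";
open my $in,  '<', \$input     or die "open: $!";
open my $out, '>', \my $output or die "open: $!";
filter_addresses($in, $out);
close $out;
print $output;    # prints "hacker@good.com" followed by one newline
```

In a real script, `filter_addresses(\*STDIN, \*STDOUT)` would filter standard input instead.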
Answer (score 0)
The following lets you have a script that you can enhance over time. Instead of simply filtering out @bad.com (which you can do with a simple grep), you can write your script so that you can easily refine which domains are unwanted:
use strict;
use warnings;

# Domains to reject; add more entries as needed.
my $bad_addresses = {'bad.com' => 1};
while (my $s = <>) {
    print $s unless is_bad_address($s);
}
sub is_bad_address {
    my ($addr) = @_;
    if ($addr =~ /^([^@]+)\@([^@\n\r]+)$/) {
        my $domain = lc($2);
        return 0 unless defined $bad_addresses->{$domain};
        return $bad_addresses->{$domain};
    }
    # Lines that don't look like an address are treated as bad.
    return 1;
}
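One way the hash-based approach above could be extended, as a sketch: also reject subdomains of a listed domain by stripping labels off the left of the domain until a listed domain is found or nothing is left. The domain list and this version of `is_bad_address` are illustrative assumptions, not Zoran Simic's code.

```perl
use strict;
use warnings;

# Hypothetical list of unwanted domains.
my %bad_domains = map { $_ => 1 } qw(bad.com mean.com);

# Return 1 if the address's domain, or any parent domain of it, is listed.
# E.g. "x@mail.bad.com" is caught via "bad.com", but "x@notbad.com" is not.
sub is_bad_address {
    my ($addr) = @_;
    # Split "local@domain"; anything else is treated as bad, as in the answer above.
    my ($domain) = $addr =~ /^[^\@\s]+\@([^\@\s]+)\s*$/ or return 1;
    my @labels = split /\./, lc $domain;
    while (@labels) {
        return 1 if $bad_domains{ join '.', @labels };
        shift @labels;    # drop the leftmost label: mail.bad.com -> bad.com
    }
    return 0;
}
```

Walking up the labels keeps each check a plain hash lookup, so the list of domains can grow without adding more regexes.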

Zoran Simic
Answer (score 0)
Not too different from what others have done:
use strict;
use warnings;
my @re = map { qr/@(.*\.)*\Q$_\E$/ } qw(bad.com mean.com);
while (my $line = <DATA>) {
    chomp $line;
    if (grep { $line =~ /$_/ } @re) {
        print "Rejected: $line\n";
    } else {
        print "Allowed: $line\n";
    }
}
__DATA__
good@good.com
bad@bad.com
notbad@bad.comm.com
alsobad@bad.com
othergood@good.com
not@mean.com
good@reallymean.com
bad@really.mean.com

Leonardo Herrera
Answer (score -1)
Perl:
perl -ne 'print if !/\@bad\.com/' file
awk:
awk '!/@bad\.com/' file

ghostdog74
Answer (score -3)
This code should filter all the @bad.com addresses out of the input files:
my @array = <>;
foreach (@array) {
    if (!/\@bad\.com$/) {
        print $_;
    }
}

Edited by brian d foy; answered by dan.
That's awful. Why would you slurp in `<>` when you could just iterate over it for the same effect, with almost no memory impact? – Chris Lutz Dec 28 '09 at 21:17
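A sketch of the iterating version that comment suggests: the same test as dan's answer (with the dot escaped), but reading one line at a time, so only the current line is held in memory rather than the whole file. The helper `filter_bad_com` is a hypothetical name; it takes filehandles so the demo can run on an in-memory string. In a standalone script the loop would simply read `while (my $line = <>)`.

```perl
use strict;
use warnings;

# Hypothetical helper: print every line except @bad.com addresses and
# return how many lines were kept. Lines are read one at a time.
sub filter_bad_com {
    my ($in, $out) = @_;
    my $kept = 0;
    while (my $line = <$in>) {
        next if $line =~ /\@bad\.com$/;
        print {$out} $line;
        $kept++;
    }
    return $kept;
}

my $input = "good\@good.com\nbad\@bad.com\nalsobad\@bad.com\n";
open my $in, '<', \$input or die "open: $!";
my $kept = filter_bad_com($in, \*STDOUT);    # prints good@good.com
```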