I am analyzing log files with various domain names. I want to exclude/ignore from the output report any domain that has the word "macys". Here is an example output:

l.macys.com        87516
www.google.com     3016
search.yahoo.com   584
www.bing.com       166
macys-L0135874392.htm   1

I would like to have an output file where I do not see any domains containing the word "macys".

kaya3
cevallos.valtira

1 Answer

This sounds like the perfect use case for a Cascading Filter.

You would set this up with a RegexFilter applied through an Each pipe:

// "UrlColumn" stands for whatever field holds the domain name.
Pipe pipe = new Each(incomingPipe, new Fields("UrlColumn"),
     new RegexFilter(".*macys.*", true));

Tailor the regex to your matching use case. With the second argument (removeMatch) set to true, the filter above removes every tuple (row) whose UrlColumn value contains the word "macys".
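To check what a given regex would drop before wiring it into a flow, you can try it on a few report lines in plain Java. This is only a sketch that uses java.util.regex directly, not Cascading; the class name DomainFilterSketch and the helper dropMatching are made up for illustration, and it assumes the domain is the first whitespace-separated column of each line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Cascading.
class DomainFilterSketch {
    // Same pattern string as in the RegexFilter above.
    static final Pattern MACYS = Pattern.compile(".*macys.*");

    // Keeps only the lines whose first column (assumed to be the domain)
    // does NOT match the pattern.
    static List<String> dropMatching(List<String> reportLines, Pattern pattern) {
        List<String> kept = new ArrayList<>();
        for (String line : reportLines) {
            String domain = line.trim().split("\\s+")[0];
            if (!pattern.matcher(domain).matches()) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample lines taken from the question.
        List<String> report = Arrays.asList(
            "l.macys.com        87516",
            "www.google.com     3016",
            "search.yahoo.com   584",
            "www.bing.com       166",
            "macys-L0135874392.htm   1");
        for (String line : dropMatching(report, MACYS)) {
            System.out.println(line);
        }
    }
}
```

Run against the sample from the question, this keeps only the google, yahoo and bing lines. The match is case-sensitive, so if your logs may contain "Macys" or "MACYS", a pattern such as "(?i).*macys.*" is one way to cover those as well.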

Engineiro