
How can I remove every leading 'www.' from my output file with awk?

For example, my output file lists multiple sites:

abc.com 
www.def.com 
blabla.org 
www.zxc.net 

I would like to remove every www. so that the file becomes:

abc.com 
def.com 
blabla.org 
zxc.net 
James Brown
John Smith

2 Answers

1

Probably better done in sed:

 sed -i 's/^www\.//g' outputFile

In awk:

 awk '{gsub(/^www\./,"",$0)}1' outputFile 
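
As a minimal sketch of the awk approach on the question's sample input: because the pattern is anchored with `^`, it can match at most once per line, so `sub()` with its default target is enough. The function name `strip_www` is a hypothetical helper introduced here for illustration, and the extra `thisisawww.net` line is added only to show that a non-leading `www.` is left alone.

```shell
# strip_www is a hypothetical helper name, not part of the original answer.
# sub() replaces at most one match per line; ^ restricts it to a leading "www.".
# The trailing 1 is an always-true pattern that prints every line.
strip_www() {
  awk '{ sub(/^www\./, "") } 1' "$@"
}

printf '%s\n' abc.com www.def.com blabla.org www.zxc.net thisisawww.net | strip_www
```

Unlike `sed -i`, this writes to standard output rather than editing the file in place, so redirect to a new file if you need to keep the result.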
JNevill
  • Probably better to use `sed -i 's/^www\.//g' outputFile` to avoid cases like `thisisawww.net`. – Defenestrator Oct 20 '17 at 16:46
  • @Defenestrator Good idea. I've updated both `sed` and `awk` with the `^` – JNevill Oct 20 '17 at 16:50
  • Get rid of the `g` in both commands: an anchored match can only be removed from the front of a string once. You also don't need the `,$0` in the awk command, as that's the default target. – Ed Morton Oct 21 '17 at 02:54
0

This is probably what you're looking for:

$ cat file
abc.com
www.def.com
blabla.org
www.zxc.net
www.org
www.acl.lanl.gov

$ sed -E 's/^www\.(([^.]+(\.|$)){2,})/\1/' file
abc.com
def.com
blabla.org
zxc.net
www.org
acl.lanl.gov

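The above requires a sed that supports -E for EREs, e.g. GNU or OSX sed. The regex strips a leading www. only when at least two dot-separated labels follow it, which is why www.org is left unchanged while www.acl.lanl.gov becomes acl.lanl.gov. Note the need for a more comprehensive input file to test whether a proposed solution really works.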
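
Since the question asked for awk, the same idea can be sketched there too. This is not the command from the answer: it swaps the `{2,}` interval for an explicit "label, dot, label" pattern, on the assumption that some awk implementations do not support interval expressions. The function name `strip_www_host` is hypothetical.

```shell
# strip_www_host is a hypothetical helper name used only for this sketch.
# Strip "www." only when at least two non-empty, dot-separated labels follow,
# so a bare "www.org" is left untouched.
strip_www_host() {
  awk '/^www\.[^.]+\.[^.]+/ { sub(/^www\./, "") } 1' "$@"
}

printf '%s\n' abc.com www.def.com blabla.org www.zxc.net www.org www.acl.lanl.gov | strip_www_host
```

The guard pattern decides whether a line qualifies, and the `sub()` then removes only the prefix, which keeps the two concerns separate and avoids needing capture groups.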

Ed Morton