
I have a log with millions of lines that look like this:

1482364800 bunch of stuff 172.169.49.138 252377 + many other things
1482364808 bunch of stuff 128.169.49.111 131177 + many other things 
1482364810 bunch of stuff 2001:db8:0:0:0:0:2:1 124322 + many other things
1482364900 bunch of stuff 128.169.49.112 849231 + many other things
1482364940 bunch of stuff 128.169.49.218 623423 + many other things

It's so big that I can't really read it into memory for Python to parse, so I want to zgrep only the items I need into another, smaller file, but I'm not very good with grep. In Python I would normally open it with gzip.open('log.gz') and then pull out data[0], data[4] and data[5] into a new file, so the new file only has the epoch, the IP and the date (the IP can be IPv6 or IPv4).

Expected result in the new file:

1482364800 172.169.49.138 252377
1482364808 128.169.49.111 131177  
1482364810 2001:db8:0:0:0:0:2:1 124322 
1482364900 128.169.49.112 849231 
1482364940 128.169.49.218 623423 

How do I do this with zgrep?

Thanks

chowpay

2 Answers


zgrep and grep select lines, not columns. To select columns you have to use the cut command, like this:

cut -d' ' -f1,2,4

In this example I get columns 1, 2 and 4, using a space (' ') as the column delimiter. The -f option specifies the column numbers and the -d option specifies the delimiter.
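As a minimal sketch of how cut fits the question, assuming a POSIX shell with gzip and cut available, the script below builds a tiny hypothetical sample.log.gz from two of the question's lines, decompresses it to standard output with gzip -dc, and lets cut keep the wanted columns. With the sample lines exactly as shown in the question, "bunch of stuff" is three space-separated words, so the epoch, IP and number are fields 1, 5 and 6; the right field list depends on the real log's layout.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory and file name, used only for this demo.
workdir=$(mktemp -d)
log="$workdir/sample.log.gz"

# Two lines copied from the question. "bunch of stuff" is three words here,
# so the wanted columns are fields 1, 5 and 6.
printf '%s\n' \
  '1482364800 bunch of stuff 172.169.49.138 252377 + many other things' \
  '1482364810 bunch of stuff 2001:db8:0:0:0:0:2:1 124322 + many other things' \
  | gzip > "$log"

# grep/zgrep choose lines; cut chooses columns. Decompress to stdout and
# keep fields 1, 5 and 6, splitting on single spaces.
gzip -dc "$log" | cut -d' ' -f1,5,6 > "$workdir/columns.txt"

cat "$workdir/columns.txt"
```

One caveat: cut -d' ' treats every single space as a separator, so two consecutive spaces produce an empty field and shift the numbering. If the log's spacing is irregular, awk, which splits on runs of blanks by default, is the safer choice.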

I hope that I have answered your question

Mohamed Amine Ouali
  • Hi, I've been trying to run it but it seems to be hanging; perhaps it's my syntax: `zgrep logFile.gz | cut -d' ' -f1,3,4 > file.txt`. Am I doing something incorrectly? – chowpay Jan 29 '17 at 23:09
  • I think zgrep needs another parameter, a pattern matching the lines you want, so don't use it if you don't want to filter lines. This should work: `zcat logFile.gz | cut -d' ' -f1,3,4 > file.txt` (zcat works like cat, but for .gz files). – Mohamed Amine Ouali Jan 29 '17 at 23:41
  • Here is the command I ran : `zcat logFile.gz|cut -d' ' -f1,3,4 >> file.txt` I just get this error `can't stat: logFile.gz (logFile.gz.Z): No such file or directory` – chowpay Jan 30 '17 at 04:25
  • Try using `gunzip -c` or `gzcat` instead of `zcat`: `gunzip -c logFile.gz | cut -d' ' -f1,3,4 >> file.txt` or `gzcat logFile.gz | cut -d' ' -f1,3,4 >> file.txt` – Mohamed Amine Ouali Jan 30 '17 at 14:40
  • I think this is a common problem for OS X users; have a look at this link: http://serverfault.com/questions/570024/zcat-gzcat-works-in-linux-not-on-osx-general-linux-osx-compatibility – Mohamed Amine Ouali Jan 31 '17 at 17:03
  • Here is another link you should have a look at: http://magnatecha.com/zcat-adds-z-in-mac-os/ – Mohamed Amine Ouali Jan 31 '17 at 17:06
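The hang in the first comment is most likely because zgrep, like grep, treats its first non-option argument as the pattern: `zgrep logFile.gz` searches standard input for the text "logFile.gz" and waits for input that never arrives. Below is a minimal sketch, assuming a POSIX shell with gzip, zgrep and cut installed and a hypothetical demo file, of the two cases from the comments: zgrep with a real pattern when you do want to filter lines, and `gunzip -c` when you only want columns. `gunzip -c` reads .gz files on both Linux and macOS, whereas the macOS zcat in the error above looked for a .Z file. Fields 1, 5 and 6 fit the question's sample lines; the comments use 1, 3 and 4 for the real log.

```shell
#!/bin/sh
set -eu

# Hypothetical demo file built from three of the question's sample lines.
workdir=$(mktemp -d)
log="$workdir/logFile.gz"
printf '%s\n' \
  '1482364800 bunch of stuff 172.169.49.138 252377 + many other things' \
  '1482364808 bunch of stuff 128.169.49.111 131177 + many other things' \
  '1482364900 bunch of stuff 128.169.49.112 849231 + many other things' \
  | gzip > "$log"

# Filtering lines: the first argument is the pattern (-F = fixed string),
# the second is the file. Only lines containing "128.169.49.1" are kept,
# then cut picks the columns.
zgrep -F '128.169.49.1' "$log" | cut -d' ' -f1,5,6 > "$workdir/filtered.txt"

# No line filtering wanted: skip zgrep and just decompress to stdout.
gunzip -c "$log" | cut -d' ' -f1,5,6 > "$workdir/all.txt"

cat "$workdir/filtered.txt"
```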

I'm on OS X, and maybe that is the issue, but I couldn't get zgrep to filter out columns, and zcat kept adding a .Z to the end of the .gz file name. Here's what I ended up doing:

awk '{print $1,$3,$4}' <(gzip -dc /path/to/source/Largefile.log.gz) | gzip > /path/to/output/Smallfile.log.gz

This let me extract the 3 columns I needed from Largefile into Smallfile while keeping both the source and the destination compressed.
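The same idea can also be written as a plain pipeline, which avoids `<( )` process substitution (a bash/zsh feature that plain POSIX sh lacks). The sketch below, assuming hypothetical file names under a temporary directory in place of the real /path/to/... paths, decompresses the source, keeps three fields with awk, recompresses the result, and reads it back. The file is streamed, never loaded into memory as a whole. Fields 1, 5 and 6 fit the question's sample lines; the command above uses $1,$3,$4 for the real log's layout.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-ins for /path/to/source/Largefile.log.gz and
# /path/to/output/Smallfile.log.gz.
workdir=$(mktemp -d)
src="$workdir/Largefile.log.gz"
dst="$workdir/Smallfile.log.gz"

printf '%s\n' \
  '1482364800 bunch of stuff 172.169.49.138 252377 + many other things' \
  '1482364810 bunch of stuff 2001:db8:0:0:0:0:2:1 124322 + many other things' \
  | gzip > "$src"

# Decompress -> keep fields 1, 5 and 6 -> recompress. awk splits on runs of
# blanks by default, and print with commas joins the fields with one space.
gzip -dc "$src" | awk '{ print $1, $5, $6 }' | gzip > "$dst"

# Read the compressed result back to check it.
gzip -dc "$dst"
```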

chowpay