Questions tagged [awk]

AWK is an interpreted programming language (AWK stands for Aho, Weinberger, and Kernighan) designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems.

Source: Wikipedia.

An awk program is a series of pattern-action pairs, written as:

condition { action }
condition { action }
...

where condition is typically an expression and action is a series of one or more commands separated by semicolons (;). The input is split into records, and each record is split into fields; by default, records are separated by newlines and fields by runs of horizontal whitespace. For each record, every condition is checked in turn and, if it is true, the commands in its action block are executed. Within the action block, fields are accessed by a 1-based index, e.g. $2 for the second field. If the condition is missing, the action block is executed for every record. If the condition is present but the action block is absent, the default action is print $0, which prints the current record, including any modifications made to it so far. Since a non-zero number is treated as true, awk '1' file instructs awk to perform the default action (print) for every line.
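
As a small illustration of the rules above, here is a sketch that exercises a condition with an action, a condition alone, an action alone, and the awk '1' idiom. The sample data and the shell variable names (data, over30, devs, counts, all) are made up for this example.

```shell
# Hypothetical sample data: name, age, role (whitespace-separated).
data='john 32 developer
jack 41 manager
jim 27 developer'

# Condition and action: print field 1 of records whose field 2 exceeds 30.
over30=$(printf '%s\n' "$data" | awk '$2 > 30 { print $1 }')
printf '%s\n' "$over30"

# Condition only: the default action prints the whole record ($0).
devs=$(printf '%s\n' "$data" | awk '$3 == "developer"')
printf '%s\n' "$devs"

# Action only: runs for every record; NF is the number of fields.
counts=$(printf '%s\n' "$data" | awk '{ print NF }')
printf '%s\n' "$counts"

# A constant true condition with no action: every record is printed.
all=$(printf '%s\n' "$data" | awk '1')
printf '%s\n' "$all"
```

The first command prints john and jack, the second prints the two developer lines, the third prints 3 three times, and the last reproduces the input unchanged.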

An awk program can also have optional BEGIN and END blocks: the BEGIN action is run before any input is read, and the END action is run after all input has been read:

BEGIN     { action } 
condition { action }
condition { action }
...
END       { action }
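
A minimal sketch of the BEGIN / main / END layout, assuming a made-up comma-separated input and an illustrative variable named sum: BEGIN sets the field separator, the unconditioned action accumulates the second field, and END prints the result.

```shell
# Hypothetical comma-separated input: item,amount
data='apples,3
pears,5
plums,4'

total=$(printf '%s\n' "$data" | awk '
BEGIN { FS = ","; sum = 0 }    # runs once, before any input is read
      { sum += $2 }            # runs for every record
END   { print "total:", sum }  # runs once, after the last record
')
printf '%s\n' "$total"
```

With this input the script prints total: 12. The same separator could be set on the command line with -F, instead of assigning FS in BEGIN.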

Awk was originally developed by Alfred Aho, Brian Kernighan and Peter Weinberger in 1977 and updated in 1985. Since then, various versions and dialects of awk have emerged. The most common are:

  • awk - the most common implementation, found on most Unix-like systems. Its behavior is specified by the POSIX standard (IEEE Std 1003.1).
  • mawk - a fast AWK implementation whose code base is built around a byte-code interpreter.
  • nawk - during the development of AWK, the developers released a new version ("new awk") under a separate name to avoid confusion, but it is itself now very old and lacks functionality present in all POSIX awks.
  • gawk - also known as GNU awk. The only version in which the developers attempted to add i18n support. It also allows users to write their own C shared libraries to extend it with their own "plug-ins". This version is the default awk on many Linux distributions.

When asking questions about data processing using awk, please include complete input and desired output.

Related tags:

  • gawk (GNU's version of awk)
  • nawk (A very old, pre-POSIX version also from AT&T)
  • mawk (A different interpreter written by Mike Brennan)
  • (A kindred tool often mentioned in the same breath)
32722 questions
53 votes, 11 answers

Randomly Pick Lines From a File Without Slurping It With Unix

I have a file of 10^7 lines, from which I want to choose 1/100 of the lines at random. This is the AWK code I have, but it slurps all the file content beforehand. My PC's memory cannot handle such slurps. Is there another approach? awk…
neversaint
53 votes, 2 answers

How to Sum a column in AWK?

My file is comma-delimited and has 64 columns. I extracted the field as shown below: awk '{split($0,a,","); print a[57]}' How can I compute the sum of the values in column 57 with my command?
Ajo
53 votes, 7 answers

Print rest of the fields in awk

Suppose we have this data file. john 32 maketing executive jack 41 chief technical officer jim 27 developer dela 33 assistant risk management officer I want to print using awk john maketing executive jack chief technical officer jim …
Shiplu Mokaddim
53 votes, 4 answers

AWK to print field $2 first, then field $1

Here is the input(sample): name1@gmail.com|com.emailclient.account name2@msn.com|com.socialsite.auth.account I'm trying to achieve this: Emailclient name1@gmail.com Socialsite name2@msn.com If I use AWK like this: cat foo | awk 'BEGIN{FS="|"}…
Sazzy
53 votes, 8 answers

delete a column with awk or sed

I have a file with three columns. I would like to delete the 3rd column(in-place editing). How can I do this with awk or sed? 123 abc 22.3 453 abg 56.7 1236 hjg 2.3 Desired output 123 abc 453 abg 1236 hjg
user2160995
53 votes, 5 answers

How to find/replace and increment a matched number with sed/awk?

Straight to the point, I'm wondering how to use grep/find/sed/awk to match a certain string (that ends with a number) and increment that number by 1. The closest I've come is to concatenate a 1 to the end (which works well enough) because the main…
Ian
52 votes, 6 answers

How to get second last field from a cut command

I have a set of data as input and need the second last field based on a delimiter. The lines may have different numbers of delimiters. How can I get the second last field? example input text,blah,blaah,foo this,is,another,text,line expected…
Archit Jain
51 votes, 4 answers

Grep Regex: List all lines except

I'm trying to automagically remove all lines from a text file that contain a letter "T" that is not immediately followed by a "H". I've been using grep and sending the output to another file, but I can't come up with the magic regex that will…
Matt Parkins
50 votes, 5 answers

Filter log file entries based on date range

My server is having unusually high CPU usage, and I can see Apache is using way too much memory. I have a feeling, I'm being DOS'd by a single IP - maybe you can help me find the attacker? I've used the following line, to find the 10 most "active"…
sqren
50 votes, 4 answers

Parsing the first column of a csv file to a new file

Operating System: OSX Method: From the command line, so using sed, cut, gawk, although preferably no installing modules. Essentially I am trying to take the first column of a csv file and parse it to a new file. Example input…
S1syphus
50 votes, 15 answers

Linux bash script to extract IP address

I want to make a big script on my Debian 7.3 (something like a translated and much more new-user-friendly environment). I have a problem. I want to use only some of the information that commands give me. For example my ifconfig looks like: eth0 …
user3232381
49 votes, 3 answers

How can I get awk to print without white space?

When I run the following awk -F\, '{print $2,":",$1}' It prints "First : Second" How can I get "First:Second"
Hoa
49 votes, 8 answers

Extract specific columns from delimited file using Awk

Sorry if this is too basic. I have a csv file where the columns have a header row (v1, v2, etc.). I understand that to extract columns 1 and 2, I have to do: awk -F "," '{print $1 "," $2}' infile.csv > outfile.csv. But what if I have to extract,…
user702432
49 votes, 6 answers

Remove odd or even lines from a text file

I need to remove odd lines in a text file to make a down-sampling. I've found this command, awk 'NR%2==0' file but it only prints the odd lines in the terminal. How to really remove them? I don't really care for even or odd, I want them removed…
SamuelNLP
49 votes, 4 answers

Saving awk output to variable

Can anyone help me out with this problem? I'm trying to save the awk output into a variable. variable = `ps -ef | grep "port 10 -" | grep -v "grep port 10 -"| awk '{printf "%s", $12}'` printf "$variable" EDIT: $12 corresponds to a parameter running…
Jeremy