Questions tagged [awk]

AWK is an interpreted programming language designed for text processing and typically used as a data extraction and reporting tool. AWK is used largely with Unix systems.

AWK is an interpreted programming language (AWK stands for Aho, Weinberger, and Kernighan) designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems.

Source: Wikipedia.

An awk program is a series of pattern-action pairs, written as:

condition { action }
condition { action }
...

where condition is typically an expression and action is a series of one or more commands separated by semicolons (;) or newlines. The input is split into records, and each record is split into fields; by default, records are separated by newlines and fields by runs of horizontal whitespace. For each record, every condition is checked in turn and, if it is true, the commands in its action block are executed. Within the action block, fields are accessed by a 1-based index, e.g. $2 for the second field. If the condition is missing, the action block is executed for every record. If the condition is present but the action block is absent, the default action is print $0, which prints the current record, including any modifications made to it so far. Since any non-zero number is true, awk '1' file instructs awk to perform the default action (print) for every line.
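As a minimal sketch of the condition/action forms described above, the commands below run a few one-line programs over made-up input. The sample() helper and its data are hypothetical and exist only for illustration; the awk features used are the standard ones.

```shell
# Hypothetical sample data: a name and a score per line.
sample() { printf 'alice 42\nbob 7\ncarol 19\n'; }

# Condition and action: print field 1 of records whose field 2 exceeds 10.
sample | awk '$2 > 10 { print $1 }'        # prints alice, then carol

# Condition only: the default action prints the whole record.
sample | awk '$2 > 10'                     # prints "alice 42", then "carol 19"

# Action only: runs for every record; here it swaps the two fields.
sample | awk '{ print $2, $1 }'

# A constant non-zero condition copies the input unchanged.
sample | awk '1'

# -F sets the field separator; here fields are split on colons.
printf 'a:b:c\n' | awk -F: '{ print $2 }'  # prints b
```

The input is piped in rather than read from a file so that the example leaves nothing behind on disk; passing a file name as the last argument works the same way.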

An awk program can also have optional BEGIN and END blocks, where the BEGIN action is run before any input is read and the END action is run after all input has been read:

BEGIN     { action } 
condition { action }
condition { action }
...
END       { action }
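A small sketch of how BEGIN, a per-record condition, and END fit together: the program prints a header, echoes and totals the records whose second field exceeds 10, then prints a summary. The sample() and report() helpers, the data, and the variable names sum and n are all hypothetical.

```shell
# Hypothetical sample data: a name and a score per line.
sample() { printf 'alice 42\nbob 7\ncarol 19\n'; }

# Hypothetical helper wrapping the awk program so it can be reused.
report() {
  awk '
    BEGIN   { print "name score" }                  # before any input is read
    $2 > 10 { sum += $2; n++; print }               # for each matching record
    END     { print "matched:", n, "total:", sum }  # after all input is read
  '
}

sample | report
# name score
# alice 42
# carol 19
# matched: 2 total: 61
```

Variables such as sum and n need no declaration; awk treats an unset variable as 0 or the empty string, depending on context.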

Awk was originally developed by Alfred Aho, Brian Kernighan and Peter Weinberger in 1977 and updated in 1985. Since then, various versions and dialects of awk have emerged. The most common are:

awk - the most common implementation, found on most Unix-like systems. It also has a well-defined IEEE standard.
mawk - a fast AWK implementation whose code base is built around a byte-code interpreter.
nawk - during the development of AWK, the developers released a new version (new awk) to avoid confusion, but it is itself now very old and lacks functionality present in all POSIX awks.
gawk - also known as GNU awk. The only version in which the developers attempted to add i18n support. It allows users to write their own C shared libraries to extend it with their own "plug-ins". This version is the default awk on many Linux distributions.

When asking questions about data processing using awk, please include complete input and desired output.

Some frequently occurring themes:

Books:

The AWK Programming Language by Aho, Kernighan & Weinberger (archive.org link)
Effective AWK, 4th edition by Robbins (see The GNU AWK Users Guide below for latest online version)
Effective AWK, 3rd edition by Robbins
Sed & Awk, 2nd edition by Dougherty & Robbins
Sed & Awk Pocket Reference, 2nd Edition by Arnold Robbins
AWK Language Programming - free book
Awk One-Liners Explained
GNU AWK one-liners by Sundeep Agarwal (includes a chapter on regular expressions)

Resources:

Awk.Info (archive.org link)
The GNU Awk User's Guide
POSIX specification of awk
Idiomatic awk
The awk programming language tutorial site
Awk one-liners
Awk one-liners explained

Other StackExchange Resources:

Related tags:

gawk (GNU's version of awk)
nawk (A very old, pre-POSIX version also from AT&T)
mawk (A different interpreter written by Mike Brennan)
sed (A kindred tool often mentioned in the same breath)

32722 questions

7 answers

What's the most robust way to efficiently parse CSV using awk?

The intent of this question is to provide a canonical answer. Given a CSV as might be generated by Excel or other tools with embedded newlines and/or double quotes and/or commas in fields, and empty fields like: $ cat file.csv "rec1,…

csv awk

asked Jul 31 '17 at 16:02

Ed Morton

7 answers

Parse a csv using awk and ignoring commas inside a field

I have a csv file where each row defines a room in a given building. Along with room, each row has a floor field. What I want to extract is all floors in all buildings. My file looks like this... "u_floor","u_room","name" 0,"00BDF","AIRPORT…

csv awk

asked Nov 17 '10 at 14:35

Chris

8 answers

Convert milliseconds timestamp to date from unix command line

I know about date -d @ and awk '{print strftime("%c", )}' but what if I have milliseconds. Is there trivial way to do this without dropping the final three characters of the millisecond-timestamp (not…

unix date awk

asked Sep 11 '12 at 03:58

jonderry

4 answers

How to merge two files using AWK?

File 1 has 5 fields A B C D E, with field A is an integer-valued File 2 has 3 fields A F G The number of rows in File 1 is much bigger than that of File 2 (20^6 to 5000) All the entries of A in File 1 appeared in field A in File 2 I like to…

linux bash unix awk

asked Mar 29 '11 at 03:48

Tony

4 answers

How to make awk ignore the field delimiter inside double quotes?

I need to delete 2 columns in a comma separated values file. Consider the following line in the csv file: "abc@xyz.com,www.example.com",field2,field3,field4 "def@xyz.com",field2,field3,field4 Now, the result I want at the…

bash shell awk

asked Apr 15 '15 at 05:17

Deepak K M

3 answers

Reverse sort order of a multicolumn file in BASH

I've the following file: 1 2 3 1 4 5 1 6 7 2 3 5 5 2 1 and I want that the file be sorted for the second column but from the largest number (in this case 6) to the smallest. I've tried with sort +1 -2 file.dat but it sorts in ascending order…

linux bash unix sorting awk

asked Jan 02 '13 at 10:58

Valerio D. Ciotti

5 answers

How to use awk variables in regular expressions?

I have a file called domain which contains some domains. For example: google.com facebook.com ... yahoo.com And I have another file called site which contains some sites URLs and numbers. For example: image.google.com 10 map.google.com …

regex awk

asked Jul 18 '12 at 04:09

Hancy

2 answers

Forcing the order of output fields from cut command

I want to do something like this: cat abcd.txt | cut -f 2,1 and I want the order to be 2 and then 1 in the output. On the machine I am testing (FreeBSD 6), this is not happening (its printing in 1,2 order). Can you tell me how to do this? I know I…

unix shell awk freebsd gnu-coreutils

asked Jun 24 '09 at 08:51

Shreeni

6 answers

How can I trim white space from a variable in awk?

Suppose $2 is my variable. I have tried going from awk -F\, '{print $2 ":"}' to awk -F\, '{print gsub(/[ \t]+$/, "", $2) ":"}' But it goes from printing something to printing nothing at all.

linux bash shell awk

asked Apr 03 '12 at 00:15

Hoa

7 answers

how to write finding output to same file using awk command

awk '/^nameserver/ && !modif { printf("nameserver 127.0.0.1\n"); modif=1 } {print}' testfile.txt It is displaying output but I want to write the output to same file. In my example testfile.txt.

bash shell awk

asked Nov 05 '11 at 10:36

Venkat

4 answers

grep a large list against a large file

I am currently trying to grep a large list of ids (~5000) against an even larger csv file (3.000.000 lines). I want all the csv lines, that contain an id from the id file. My naive approach was: cat the_ids.txt | while read line do cat huge.csv |…

linux shell unix awk grep

asked Oct 15 '13 at 12:12

leifg

19 answers

How to print a file, excluding comments and blank lines, using grep/sed?

I'd like to print out a file containing a series of comments like: ErrorLog ${APACHE_LOG_DIR}/error.log # Possible values include: debug, info, notice, warn, error, crit, # alert, emerg. LogLevel warn CustomLog…

regex awk sed grep

asked Jun 30 '13 at 17:16

Joel G Mathew

7 answers

Use Awk to extract substring

Given a hostname in format of aaa0.bbb.ccc, I want to extract the first substring before ., that is, aaa0 in this case. I use following awk script to do so, echo aaa0.bbb.ccc | awk '{if (match($0, /\./)) {print substr($0, 0, RSTART - 1)}}' While…

bash awk

asked Apr 16 '13 at 15:07

Richard

3 answers

Printing only the first field in a string

I have a date as 12/12/2013 14:32 I want to convert it into only 12/12/2013. The string can be 1/1/2013 12:32 or 1/10/2013 23:41 I need only the date part.

unix sed awk field cut

asked Feb 22 '13 at 12:44

user2099444

6 answers

Difference between two lists using Bash

Ok, I have two related lists on my linux box in text files: /tmp/oldList /tmp/newList I need to compare these lists to see what lines got added and what lines got removed. I then need to loop over these lines and perform actions on them based on…

bash sorting sed awk grep

asked Jun 22 '12 at 22:56

exvance
