Questions tagged [awk]

AWK is an interpreted programming language designed for text processing and typically used as a data extraction and reporting tool. AWK is used largely with Unix systems.

AWK (named after its authors Aho, Weinberger, and Kernighan) is an interpreted programming language designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems.

Source: Wikipedia.

An awk program is a series of pattern-action pairs, written as:

condition { action }
condition { action }
...

where condition is typically an expression and action is a series of one or more commands separated by semicolons (;) or newlines. The input is split into records, and each record is split into fields; by default, records are separated by newlines and fields by runs of whitespace. For each record, every condition is checked in order and, if it is true, the commands in its action block are executed. Within the action block, fields are accessed by a 1-based index, e.g. $2 for the second field, while $0 is the whole record. If the condition is missing, the action block is executed for every record. If the condition is present but the action block is absent, the default action is print $0, which prints the current record, including any modifications made to it. Since a non-zero number is treated as true, awk '1' file instructs awk to perform the default action (print) for every line.
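The rules above can be sketched with a few one-liners. This is only an illustration: the sample data and the shell variable names (data, eng_names, salaries, high, all) are made up for this example and do not come from any particular question.

```shell
# Hypothetical sample data: name, department, salary (whitespace-separated).
data='alice eng 120
bob ops 95
carol eng 130'

# Condition and action: print field 1 for records whose field 2 is "eng".
eng_names=$(printf '%s\n' "$data" | awk '$2 == "eng" { print $1 }')

# Action with no condition: the action runs for every record.
salaries=$(printf '%s\n' "$data" | awk '{ print $3 }')

# Condition with no action: the default action (print $0) is applied.
high=$(printf '%s\n' "$data" | awk '$3 > 100')

# A constant true condition prints every record unchanged.
all=$(printf '%s\n' "$data" | awk '1')

printf '%s\n' "$eng_names" "$salaries" "$high" "$all"
```

Each command feeds the same three records to awk, so the differences in output come only from which part of the condition { action } pair is present.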

An awk program can also have optional BEGIN and END blocks: the BEGIN action runs before any input is read, and the END action runs after all input has been read:

BEGIN     { action } 
condition { action }
condition { action }
...
END       { action }
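As a minimal sketch of this layout, the following program prints a header in BEGIN, accumulates a sum in the main rule, and reports it in END. The sample data and the names data, report and total are assumptions chosen for illustration.

```shell
# Hypothetical sample data: name, department, salary (whitespace-separated).
data='alice eng 120
bob ops 95
carol eng 130'

report=$(printf '%s\n' "$data" | awk '
BEGIN       { print "name salary" }          # runs once, before any input is read
$2 == "eng" { print $1, $3; total += $3 }    # runs for each matching record
END         { print "total", total }         # runs once, after all input is read
')

printf '%s\n' "$report"
```

The variable total needs no initialisation because awk treats an unset variable as 0 in a numeric context.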

Awk was originally developed by Alfred Aho, Brian Kernighan and Peter Weinberger in 1977 and updated in 1985. Since then, various versions and dialects of awk have emerged. The most common are:

awk - the most common; found on most Unix-like systems. It is also specified by a well-defined IEEE (POSIX) standard.
mawk - a fast AWK implementation built around a byte-code interpreter.
nawk - during the development of AWK, the developers released a new version (new awk) under a separate name to avoid confusion, but it is itself now very old and lacks functionality present in all POSIX awks.
gawk - also known as GNU awk. The only version whose developers have attempted to add i18n support. It also allows users to write their own C shared libraries to extend it with "plug-ins". It is the default awk on many Linux distributions.

When asking questions about data processing using awk, please include complete input and desired output.

Some frequently occurring themes:

Books:

The AWK Programming Language by Aho, Kernighan & Weinberger (archive.org link)
Effective awk Programming, 4th edition by Arnold Robbins (see The GNU Awk User's Guide below for the latest online version)
Effective awk Programming, 3rd edition by Arnold Robbins
Sed & Awk, 2nd edition by Dougherty & Robbins
Sed & Awk Pocket Reference, 2nd Edition by Arnold Robbins
AWK Language Programming - free book
Awk One-Liners Explained
GNU AWK one-liners by Sundeep Agarwal (includes a chapter on regular expressions)

Resources:

Awk.Info (archive.org link)
The GNU Awk User's Guide
POSIX specification of awk
Idiomatic awk
The awk programming language tutorial site
Awk one-liners
Awk one-liners explained

Other StackExchange Resources:

Related tags:

gawk (GNU's version of awk)
nawk (A very old, pre-POSIX version also from AT&T)
mawk (A different interpreter written by Mike Brennan)
sed (A kindred tool often mentioned in the same breath)

32722 questions

2 answers

Matching last occurrence before a previous match

How do I match the last occurrence of foo before the match of some number? foo: A 1 2 foo: B 1 foo: C 2 A search for pattern 2 should return: foo: A foo: C

regex bash awk grep

asked Feb 05 '14 at 19:31

Christopher Markieta

2 answers

Using the first field in AWK as file name

The dataset is one big file with three columns: An ID of a section, something irrelevant and a line of text. An example could look like the following: A01 001 This is a simple test. A01 002 Just for exemplary purpose. A01 003 A02 001 This is another…

bash awk corpus

asked Feb 04 '14 at 14:48

beyeran

2 answers

How to awk or grep a variable, without using echo?

I'm working with large variables, and to speed up my script I'd like to awk or grep a variable without using echo/printf. What I've tried: awk "/test/ {for(i=1; i<=100; i++) {getline; print}}" "$var" awk: fatal: cannot open file `<<$var content>>' for…

performance bash awk grep

asked Feb 02 '14 at 14:44

Orlo

2 answers

Is there a way to delete duplicate headers in a file in Unix?

How can I delete multiple headers from a file? I tried to use the below code after finding it from How can I delete duplicate lines in a file in Unix?. awk '!x[$0]++' file.txt It is deleting all the duplicate records in the file. But in my case, I…

linux csv sed awk duplicates

asked Jan 30 '14 at 17:24

Dhruuv

3 answers

Move column to last in awk

I would like to move a specified column (the 2nd) to the last column position. I have multiple large tab-delimited files containing variable numbers of columns and rows. But, column 2 in all needs to be last. Another way to put it is that I want…

bash awk cut

asked Jan 20 '14 at 15:35

user3212388

2 answers

Convert massive MySQL dump file to CSV

I tried something like this: awk -F " " '{if($1=="INSERT"){print $5}}' input.sql | \ sed -e "s/^(//g" -e "s/),(/\n/g" -e "s/['\"]//g" \ -e "s/);$//g" -e "s/,/;/g" > output.txt But I find it slow and unoptimized. A MySQL dump file looks…

mysql bash sed awk

asked Dec 20 '13 at 09:44

Syffys

4 answers

Bash: how to find and break up long lines by inserting continuation character and newline?

I know how to find long lines in a file, using awk or sed: $ awk 'length<=5' foo.txt will print only lines of length <= 5. sed -i '/^.\{5,\}$/d' FILE would delete all lines with more than 5 characters. But how to find long lines and then break…

bash sed awk

asked Dec 14 '13 at 17:28

mort

6 answers

Replace line after match

Given this file $ cat foo.txt AAA 111 BBB 222 CCC 333 I would like to replace the first line after BBB with 999. I came up with this command awk '/BBB/ {f=1; print; next} f {$1=999; f=0} 1' foo.txt but I am curious to any shorter commands with…

bash sed awk

asked Dec 09 '13 at 06:46

Zombo

4 answers

How to count the number of instances of entries in column 1 and print the value to a new column

I have a tab delimited file that looks like the following: cluster.1 Adult.1 cluster.2 Comp.1 cluster.3 Adult.2 cluster.3 Pre.3 cluster.4 Pre.1 cluster.4 Juv.2 cluster.4 Comp.4 cluster.4 Adult.3 cluster.5 Adult.2 cluster.6 …

bash awk

asked Dec 06 '13 at 08:53

acalcino

3 answers

Search replace string in a file based on column in other file

If we have the first file like below: (a.txt) 1 asm 2 assert 3 bio 4 Bootasm 5 bootmain 6 buf 7 cat 8 console 9 defs 10 echo and the second like: (b.txt) bio cat BIO bootasm bio defs cat Bio console bio BiO bIo assert …

bash shell sed awk

asked Dec 02 '13 at 11:26

user3057111

5 answers

How to output counts for list of active/inactive inputs?

I have this input file (1=active, 0=inactive) a 1 a 0 b 1 b 1 b 0 c 0 c 0 c 0 c 0 . . . And want output like this: X repeats active count inactive count a 2 times …

arrays linux if-statement awk

asked Nov 27 '13 at 07:57

mahmoud

2 answers

Unix: Count occurrences of similar entries in first column, sum the second column

I have a file with two columns of data, I would like to count the occurrence of similarities in the first column. When two similar entries in the first column are matched, I would like to also sum the value of the second column of the two matched…

bash awk

asked Nov 14 '13 at 18:37

Joseph Ivan Hayhoe

3 answers

Getting average per line

I have a large data set in this format HF TLLL A T 0.999 NA 0.666 NA 0.566 NA NA 0.87 HF TLLM A T 0.500 0.500 0.666 0.566 NA NA 0.87 I want to calculate an average for each line, starting from column 5 until end of line, and ignoring the string NA.…

awk average

asked Nov 14 '13 at 17:12

user1308144

3 answers

how to print tail of path filename using awk

I've searched for this with no success. I have a file with paths. I want to print the tail of all the paths. For example (for every line in the file): /homes/work/abc.txt --> abc.txt Does anyone know how to do it? Thanks

linux awk

asked Nov 10 '13 at 14:17

Noam Mizrachi

8 answers

use grep to extract multiple values from one line

file: timestamp1 KKIE ABC=123 [5454] GHI=547 JKL=877 MNO=878 timestamp2 GGHI ABC=544 [ 24548] GHI=883 JKL=587 MNO=874 timestamp3 GGGIO ABC=877 [3487] GHI=77422 JKL=877 MNO=877 timestamp4 GGDI ABC=269 [ 1896] GHI=887 JKL=877…

regex bash shell awk

asked Nov 01 '13 at 17:39

blue_xylo
