Questions tagged [awk]

AWK is an interpreted programming language designed for text processing and typically used as a data extraction and reporting tool. The name stands for Aho, Weinberger, and Kernighan. It is a standard feature of most Unix-like operating systems.

Source: Wikipedia.

An awk program is a series of pattern-action pairs, written as:

condition { action }
condition { action }
...

where condition is typically an expression and action is a series of one or more commands separated by semicolons (;). The input is split into records, and each record is split into fields; by default, records are separated by newlines and fields by runs of horizontal whitespace. For each record, every condition is checked in turn and, if it is true, the commands in its action block are executed. Within the action block, fields are accessed by a 1-based index, e.g. $2 for the second field. If the condition is missing, the action block is executed for every record. If the condition is present but the action block is absent, the default action is print $0, which prints the current record after any modifications made to it. Since a non-zero number is treated as true, awk '1' file instructs awk to perform the default action (print) for every line.
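The rules above can be sketched with a few one-liners. This is only an illustration: the sample data and the shell variable names (data, high, eng, upper) are made up for the example, and any POSIX awk should behave the same way.

```sh
# Hypothetical sample input: name, department, salary (whitespace-separated)
data='alice eng 120
bob ops 95
carol eng 130'

# Condition and action: print field 1 of records whose third field exceeds 100
high=$(printf '%s\n' "$data" | awk '$3 > 100 { print $1 }')

# Condition only: the default action prints the whole record ($0)
eng=$(printf '%s\n' "$data" | awk '$2 == "eng"')

# Action without a condition runs for every record; the trailing 1 is an
# always-true condition, so the modified record is then printed
upper=$(printf '%s\n' "$data" | awk '{ $2 = toupper($2) } 1')

printf '%s\n' "$high" "$eng" "$upper"
```

The last command shows why the default action is described as printing the record "after any modifications": assigning to $2 changes $0 before the 1 rule prints it.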

An awk program can also have optional BEGIN and END blocks. The BEGIN action is run before any input is read, and the END action is run after all input has been read:

BEGIN     { action } 
condition { action }
condition { action }
...
END       { action }
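As a minimal sketch of that layout, the following program sets the field separator and prints a header in BEGIN, updates counters per record, and reports totals in END. The comma-separated sample data and the names data, summary, big and total are assumptions chosen for illustration.

```sh
# Hypothetical sample input: item,amount
data='Item1,200
Item2,500
Item1,600'

summary=$(printf '%s\n' "$data" | awk '
BEGIN    { FS = ","; print "item,amount" }   # runs once, before any input is read
$2 > 300 { big++ }                           # runs only for records where the condition holds
         { total += $2 }                     # no condition: runs for every record
END      { print "records=" NR, "total=" total, "over300=" big }  # runs once, after all input
')

printf '%s\n' "$summary"
```

In the END block, NR still holds the number of records read, which is why it can be used there as a record count.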

Awk was originally developed by Alfred Aho, Brian Kernighan and Peter Weinberger in 1977 and updated in 1985. Since then, various versions and dialects of awk have emerged. The most common are:

awk - the most common; found on most Unix-like systems. Its behavior is defined by an IEEE standard.
mawk - a fast AWK implementation built around a byte-code interpreter.
nawk - "new awk", the name under which the original developers released an updated version to avoid confusion with the older awk. It is itself now very old and lacks functionality present in all POSIX awks.
gawk - also known as GNU awk. It is the only version whose developers have attempted to add i18n support, and it allows users to write their own C shared libraries to extend it with "plug-ins". It is the default awk on many Linux distributions.

When asking questions about data processing using awk, please include complete input and desired output.

Some frequently occurring themes:

Books:

The AWK Programming Language by Aho, Kernighan & Weinberger (archive.org link)
Effective AWK, 4th edition by Robbins (see The GNU Awk User's Guide below for the latest online version)
Effective AWK, 3rd edition by Robbins
Sed & Awk, 2nd edition by Dougherty & Robbins
Sed & Awk Pocket Reference, 2nd Edition by Arnold Robbins
AWK Language Programming - free book
Awk One-Liners Explained
GNU AWK one-liners by Sundeep Agarwal (includes a chapter on regular expressions)

Resources:

Awk.Info (archive.org link)
The GNU Awk User's Guide
POSIX specification of awk
Idiomatic awk
The awk programming language tutorial site
Awk one-liners
Awk one-liners explained

Other StackExchange Resources:

Related tags:

gawk (GNU's version of awk)
nawk (A very old, pre-POSIX version also from AT&T)
mawk (A different interpreter written by Mike Brennan)
sed (A kindred tool often mentioned in the same breath)

32722 questions

5 answers

Can you colorize specific lines that are grepped from a file?

I run a weekly CRONTAB that collects hardware info from 40+ remote servers and creates a weekly log file on our report server at the home office. I have a script that I run against this weekly file to output only specific status lines to my…

bash perl awk sed

asked Nov 27 '17 at 20:54

PCnetMD

3 answers

pattern match and replace the string with if else loop

I have a file containing multiple lines starting with "1ECLI H--- 12.345 .....". I want to remove a space between I and H and add R/S/T upon iteration of the H pattern. for eg. H810 if repeated in consecutive three lines, it should get added with a…

python r awk sed

asked Nov 06 '17 at 08:17

amruta

2 answers

List Gitlab runners by name only

I am trying to get a list of only the names of gitlab runners. So the output of gitlab-runner list 2>&1 is: Listing configured runners ConfigFile=/etc/gitlab-runner/config.toml default_runner …

bash awk gitlab-ci-runner

asked Oct 25 '17 at 14:58

Rickkwa

2 answers

String conversion from numeric to string in awk

I have below awk command which give mentioned result as well. echo '1600000.00000000' '1600000.0000' | awk '{ print ($1 != $2) ? "true" : "false" }' Result is : - false As per numeric value the result given by the command is correct. But I want…

bash unix awk

asked Oct 23 '17 at 09:17

Amol Murkute

8 answers

Take every nth row from a file with groups and n is a given in a column

I have seen here and here on how to return every nth row; but my problem is different. A separate column in the file provides specifics about which nth element to return; which are different depending on the group. Here is a sample of the dataset…

python r bash shell awk

asked Oct 17 '17 at 07:29

deepseefan

2 answers

awk match multiple pattern in column

What is the proper awk syntax to match multiple patterns in one column? Having a columnar file like this: c11 c21 c31 c12 c22 c32 c13 c23 c33 how to exclude lines that match c21 and c22 in the second column. With grep, one can do something like…

awk

asked Sep 10 '17 at 21:46

PedroA

3 answers

Grab an IdentityFile from an ssh config based on a variable hostname via shell script

I'm writing a shell script where I need to obtain an IdentityFile from an ssh config file. The ssh config file looks like this: Host AAAA User aaaa IdentityFile /home/aaaa/.ssh/aaaaFILE IdentitiesOnly yes PubkeyAuthentication=yes …

shell parsing awk ssh ssh-config

asked Aug 14 '17 at 20:42

Wimateeka

3 answers

How to get the group by count in unix

I have a list of records as following Item1,200 Item1,200 Item3,900 Item2,500 Item2,800 Item1,600 Item4, Item5, Item4,100 Item5, Item5,444 My output should be "Please check the file as Item1 is greater than 2" With my awk command the output is…

bash shell unix awk

asked Aug 14 '17 at 08:51

Bobby

2 answers

awk gsub using variable for pattern matching

gsub(pattern, replacement, target): allows a variable to be used for pattern, but does not let me do regular expression. gsub(/pattern/, replacement, target): lets me do regular expression, but I cannot use a variable for the pattern. Is there a way…

linux awk

asked Aug 11 '17 at 20:11

ddjen11

3 answers

Using awk to split line with multiple string delimiters

I have a file called pet_owners.txt that looks like: petOwner:Jane,petName:Fluffy,petType:cat petOwner:John,petName:Oreo,petType:dog ... petOwner:Jake,petName:Lucky,petType:dog I'd like to use awk to split the file using the delimiters: 'petOwner',…

bash awk

asked Aug 07 '17 at 17:49

Brinley

1 answer

awk: printing a tab-delimited instead of a space-delimited file

I am using this awk command to print out certain columns of a file awk '{print $1,$2,$3,$5,log($11),$12}' inputfile > outputfile The inputfile is tab-delimited, and the outputfile is space-delimited. I need the outputfile to be tab-delimited as…

awk

asked Jul 28 '17 at 09:35

Abdel

5 answers

How can I use awk or Perl to increment a number in a large XML file?

I have an XML file with the following line: I would like to increment this value by .04 and keep the format of the XML in place. I know this is possible with a Perl or awk script, but…

perl awk

asked Jan 15 '09 at 20:15

DC.

3 answers

Pick up lines from a file based on line numbers in another file

I have two files - one contains the addresses (line numbers) and the other one data, like this: address file: 2 4 6 7 1 3 5 data file 1.000451451 2.000589214 3.117892278 4.479511994 5.484514874 6.784499874 7.021239396 I want to randomize the data…

python bash awk

asked Jun 09 '17 at 04:05

hassan

7 answers

How to delete all the lines after the last occurence of pattern?

i want to delete all the lines after the last occurence of pattern except the pattern itself file.txt honor apple redmi nokia apple samsung lg htc file.txt what i want honor apple redmi nokia apple what i have tried sed -i '/apple/q'…

bash awk sed

asked Jun 01 '17 at 13:15

j.doe

4 answers

Duplicate a CSV column in Bash

I have an issue where a client needs to duplicate a column in a CSV file. The values are always going to be identical and unfortunately our API doesn't allow for duplicate columns to be specified in the JSON. For example I have the following column…

bash csv awk

asked May 25 '17 at 09:39

Predator
