Questions tagged [awk]

AWK is an interpreted programming language designed for text processing and typically used as a data extraction and reporting tool. AWK is used largely with Unix systems.

AWK (named after its authors Aho, Weinberger, and Kernighan) is an interpreted programming language designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems.

Source: Wikipedia.

An awk program is a series of pattern-action pairs, written as:

condition { action }
condition { action }
...

where condition is typically an expression and action is a series of one or more commands separated by semicolons (;). The input is split into records, and each record is split into fields; by default, records are separated by newlines and fields by runs of blanks (spaces and tabs). For each record, every condition is checked in order and, if it is true, the commands in its action block are executed. Within the action block, fields are accessed by a 1-based index, e.g. $2 for the second field, while $0 is the whole record. If the condition is missing, the action block is executed for every record. If the condition is present but the action block is absent, the default action is print $0, which prints the current record, including any modifications made to it so far. Since a non-zero number is treated as true, awk '1' file instructs awk to perform the default action (print) for every record.
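The rules above can be sketched with a few one-liners. This is only an illustration: the sample() helper and its three lines of data are made up for the example, and the commands are assumed to run in a POSIX shell with any standard awk.

```shell
# Hypothetical sample data: name and score, separated by a space.
sample() { printf 'alice 30\nbob 25\ncarol 41\n'; }

# Condition and action: print field 1 of records whose field 2 exceeds 26.
sample | awk '$2 > 26 { print $1 }'

# No condition: the action runs for every record (here, swap the two fields).
sample | awk '{ print $2, $1 }'

# No action: the default action, print $0, runs when the condition is true.
sample | awk '$2 > 26'

# A constant true condition with no action copies the input unchanged.
sample | awk '1'
```

The first command prints alice and carol, and the last one reproduces the input as-is, which is why awk '1' is often used as the final rule of a program that modifies records.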

An awk program can also have optional BEGIN and END blocks: the BEGIN action is invoked before any input is read, and the END action is invoked after all input has been read:

BEGIN     { action } 
condition { action }
condition { action }
...
END       { action }
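As a minimal sketch of this layout, the following sums one column of a small CSV-like input. The sum_second function name and the sample data are hypothetical; FS (the field separator) and NR (the current record number) are standard awk variables that the text above does not introduce.

```shell
# Hypothetical helper: sum the second comma-separated field of its input.
sum_second() {
  awk '
    BEGIN  { FS = "," }               # runs once, before any input is read
    NR > 1 { total += $2 }            # runs per record; NR > 1 skips the header
    END    { print "total:", total }  # runs once, after the last record
  '
}

printf 'item,qty\napples,3\npears,4\n' | sum_second
```

With this input the command prints total: 7. Setting FS in BEGIN matters because the separator has to be in place before the first record is split into fields.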

Awk was originally developed by Alfred Aho, Brian Kernighan and Peter Weinberger in 1977 and updated in 1985. Since then, various versions and dialects of awk have emerged. The most common are:

awk - the most common name, found on most Unix-like systems. The language also has a well-defined IEEE (POSIX) standard.
mawk - a fast AWK implementation built around a byte-code interpreter.
nawk - during the development of AWK, the developers released a new version (new awk) under a separate name to avoid confusion, but it is itself now very old and lacks functionality present in all POSIX awks.
gawk - also known as GNU awk. The only version whose developers have attempted to add i18n support. It also allows users to write their own C shared libraries to extend it with "plug-ins". It is the default awk on many Linux distributions.

When asking questions about data processing using awk, please include complete input and desired output.

Books:

The AWK Programming Language by Aho, Kernighan & Weinberger (archive.org link)
Effective awk Programming, 4th edition by Arnold Robbins (see The GNU Awk User's Guide below for the latest online version)
Effective awk Programming, 3rd edition by Arnold Robbins
sed & awk, 2nd edition by Dale Dougherty & Arnold Robbins
sed & awk Pocket Reference, 2nd edition by Arnold Robbins
AWK Language Programming - free book
Awk One-Liners Explained
GNU AWK one-liners by Sundeep Agarwal (includes a chapter on regular expressions)

Resources:

Awk.Info (archive.org link)
The GNU Awk User's Guide
POSIX specification of awk
Idiomatic awk
The awk programming language tutorial site
Awk one-liners
Awk one-liners explained

Related tags:

gawk (GNU's version of awk)
nawk (A very old, pre-POSIX version also from AT&T)
mawk (A different interpreter written by Mike Brennan)
sed (A kindred tool often mentioned in the same breath)

32722 questions

2 answers

Remove quotes in awk command

I have a text file that needs to be processed using…

regex awk

asked Oct 20 '13 at 07:13

shantanuo

7 answers

Finding gaps in sequential numbers

I don’t do this stuff for a living so forgive me if it’s a simple question (or more complicated than I think). I’ve been digging through the archives and found a lot of tips that are close but being a novice I’m not sure how to tweak for my needs…

bash awk

asked Apr 07 '13 at 20:42

Shaun

3 answers

Integer division in awk

I want to divide two numbers in awk, using integer division, i.e truncating the result. For example k = 3 / 2 print k should print 1 According to the manual, Division; because all numbers in awk are floating-point numbers, the result is not…

awk integer division

asked Feb 13 '13 at 16:27

user000001

5 answers

Removing Windows newlines on Linux (sed vs. awk)

Have some delimited files with improperly placed newline characters in the middle of fields (not line ends), appearing as ^M in Vim. They originate from freebcp (on Centos 6) exports of a MSSQL database. Dumping the data in hex shows \r\n…

linux sed awk

asked Jul 27 '12 at 02:51

kermatt

3 answers

Escaping separator within double quotes, in awk

I am using awk to parse my data with "," as separator as the input is a csv file. However, there are "," within the data which is escaped by double quotes ("..."). Example filed1,filed2,field3,"field4,FOO,BAR",field5 How can i ignore the comma ","…

awk delimiter double-quotes separator

asked Oct 18 '11 at 08:52

joomanji

7 answers

How to grep the last occurrence of a line pattern

I have a file with contents x a x b x c I want to grep the last occurrence, x c when I try sed -n "/x/,/b/p" file it lists all the lines, beginning x to c.

bash shell awk sed grep

asked Jun 03 '14 at 11:37

user3702858

6 answers

Better way of getting a GIT commit message by short hash?

I am currently getting my commit message for a certain commit hash by using this below: hash='b55da97' git log --pretty=oneline ${hash} | grep "${hash}" | awk '{ print $2 }' These seems extremely inefficient though. Is there a smarter or cheaper…

git bash awk grep git-log

asked Nov 05 '13 at 20:33

ehime

11 answers

remove ^M characters from file using sed

I have this line inside a file: ULNET-PA,client_sgcib,broker_keplersecurities ,KEPLER I try to get rid of that ^M (carriage return) character so I used: sed 's/^M//g' However this does remove everything after ^M: [root@localhost tmp]# vi…

sed awk

asked Oct 16 '13 at 14:40

SoSed

3 answers

What is the easiest way to remove 1st and last line from file with awk?

I am learning awk/gawk. So recently I just try to solve any problem with it to gain more practice opportunities. My coworker asked a question yesterday, "how to remove first and last line from file" . I know that sed '1d;$d' file would work. …

awk

asked Apr 06 '13 at 22:26

Imagination

1 answer

How to add a character at the end of each line with awk?

I would like to add character A at the end of each line in a text file. How can I do this with awk? 1AAB VBNM JHTF 2SDA Desired output 1AABA VBNMA JHTFA 2SDAA

awk

asked Sep 17 '12 at 07:52

user1676953

5 answers

'grep +A': print everything after a match

I have a file that contains a list of URLs. It looks like below: file1: http://www.google.com http://www.bing.com http://www.yahoo.com http://www.baidu.com http://www.yandex.com .... I want to get all the records after: http://www.yahoo.com,…

bash sed awk grep

asked Aug 10 '13 at 21:31

B.Mr.W.

7 answers

Parsing variables from config file in Bash

Having the following content in a file: VARIABLE1="Value1" VARIABLE2="Value2" VARIABLE3="Value3" I need a script that outputs the following: Content of VARIABLE1 is Value1 Content of VARIABLE2 is Value2 Content of VARIABLE3 is Value3 Any ideas?

bash parsing variables awk

asked May 15 '13 at 17:48

KillDash9

9 answers

mysqldump with db in a separate file

I'm writing a single line command that backups all databases into their respective names instead using of dumping all in one sql. Eg: db1 get saved to db1.sql and db2 gets saved to db2.sql So far, I'd gathered the following commands to retrieve…

bash awk backup mysql

asked Jun 03 '12 at 02:46

resting

4 answers

Sort logs by date field in bash

let's have 126 Mar 8 07:45:09 nod1 /sbin/ccccilio[12712]: INFO: sadasdasdas 2 Mar 9 08:16:22 nod1 /sbin/zzzzo[12712]: sadsdasdas 1 Mar 8 17:20:01 nod1 /usr/sbin/cron[1826]: asdasdas 4 Mar 9 06:24:01 nod1 /USR/SBIN/CRON[27199]: aaaasdsd …

linux bash sorting scripting awk

asked Mar 09 '11 at 08:14

Mejmo

12 answers

Can awk deal with CSV file that contains comma inside a quoted field?

I am using awk to perform counting the sum of one column in the csv file. The data format is something like: id, name, value 1, foo, 17 2, bar, 76 3, "I am the, question", 99 I was using this awk script to count the sum: awk -F, '{sum+=$3} END…

csv awk field text-parsing quoting

asked Jun 29 '10 at 06:35

maguschen
