Questions tagged [awk]

AWK is an interpreted programming language designed for text processing and typically used as a data extraction and reporting tool. AWK is used largely with Unix systems.

AWK is an interpreted programming language (the name stands for Aho, Weinberger, and Kernighan) designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems.

Source: Wikipedia.

An awk program is a series of pattern-action pairs, written as:

condition { action }
condition { action }
...

where condition is typically an expression and action is a series of one or more commands separated by semicolons (;). The input is split into records, and each record is split into fields; by default, records are separated by the newline character and fields by runs of horizontal whitespace. For each record, every condition is checked in turn and, if it is true, the commands in its action block are executed. Within the action block, fields are accessed by a 1-based index, e.g. $2 for the second field.

If the condition is missing, the action block is executed for every record. If the condition is present but the action block is absent, the default action is print $0, which prints the current record, including any modifications made to it. Since a non-zero number counts as true, awk '1' file instructs awk to perform the default action (print) for every line.
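As a rough illustration of the rules above, the following sketch runs the forms of a pattern-action pair against a small made-up data set. The sample data and the function names (sample_scores, high_scores, and so on) are hypothetical and exist only for this example.

```shell
# Hypothetical sample data: a name and a score separated by whitespace.
sample_scores() {
    printf '%s\n' 'alice 90' 'bob 72' 'carol 85'
}

# Condition and action: print the first field when the second exceeds 80.
high_scores() {
    sample_scores | awk '$2 > 80 { print $1 }'
}

# Condition only: the default action (print $0) prints the whole record.
low_scores() {
    sample_scores | awk '$2 < 80'
}

# Action only: the block runs for every record.
swap_fields() {
    sample_scores | awk '{ print $2, $1 }'
}

# A constant true condition: every record is printed unchanged.
print_all() {
    sample_scores | awk '1'
}

high_scores
low_scores
swap_fields
print_all
```

Only POSIX awk features are used here, so any of the implementations listed further down should behave the same way.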

An awk program can also have optional BEGIN and END blocks: the BEGIN action runs before any input is read, and the END action runs after all input has been read:

BEGIN     { action } 
condition { action }
condition { action }
...
END       { action }
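A minimal sketch of this layout, assuming input with one value per line; the function name sum_column and the sample input are made up for illustration:

```shell
# Sum the numeric lines read from standard input.
sum_column() {
    awk '
        BEGIN { total = 0; print "summing" }   # runs once, before any input is read
        $1 ~ /^[0-9]+$/ { total += $1 }        # runs for each record that is a whole number
        END   { print "total:", total }        # runs once, after all input has been read
    '
}

printf '%s\n' 10 20 x 30 | sum_column
```

The non-numeric line x fails the condition and is skipped, so only 10, 20 and 30 contribute to the total.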

Awk was originally developed by Alfred Aho, Brian Kernighan and Peter Weinberger in 1977 and updated in 1985. Since then, various versions and dialects of awk have emerged. The most common are:

awk - the most common implementation, found on most Unix-like systems. Its behavior is specified by an IEEE (POSIX) standard.
mawk - a fast AWK implementation whose code base is built around a byte-code interpreter.
nawk - during the development of AWK, the developers released a new version under the name "new awk" to avoid confusion with the original. It is itself now very old and lacks functionality present in all POSIX awks.
gawk - also known as GNU awk. It is the only version whose developers have attempted to add i18n support, and it allows users to write their own C shared libraries to extend it with "plug-ins". This version is the default implementation on many Linux distributions.

When asking questions about data processing using awk, please include complete input and desired output.

Some frequently occurring themes:

Books:

The AWK Programming Language by Aho, Kernighan & Weinberger (archive.org link)
Effective awk Programming, 4th edition by Robbins (see The GNU Awk User's Guide below for the latest online version)
Effective awk Programming, 3rd edition by Robbins
Sed & Awk, 2nd edition by Dougherty & Robbins
Sed & Awk Pocket Reference, 2nd Edition by Arnold Robbins
AWK Language Programming - free book
Awk One-Liners Explained
GNU AWK one-liners by Sundeep Agarwal (includes a chapter on regular expressions)

Resources:

Awk.Info (archive.org link)
The GNU Awk User's Guide
POSIX specification of awk
Idiomatic awk
The awk programming language tutorial site
Awk one-liners
Awk one-liners explained

Other StackExchange Resources:

Related tags:

gawk (GNU's version of awk)
nawk (A very old, pre-POSIX version also from AT&T)
mawk (A different interpreter written by Mike Brennan)
sed (A kindred tool often mentioned in the same breath)

32722 questions

4 answers

Reverse Geocoding in Bash using GPS Position from exiftool

I am writing a bash script that renames JPG files based on their EXIF tags. My original files are named like this: IMG_2110.JPG IMG_2112.JPG IMG_2113.JPG IMG_2114.JPG I need to rename them like…

bash google-maps awk gps reverse-geocoding

asked Sep 30 '15 at 20:43

utt50

3 answers

Parsing a .csv-like file in bash

I have a file formatted as follows: string1,string2,string3,... ... I have to analyze the second column, counting the occurrences of each string, and producing a file formatted as follows: "number of occurrences of x",x "number of occurrences of…

regex bash csv awk gawk

asked Sep 08 '15 at 18:11

Luca

3 answers

Grep whole paragraphs of a text containing a specific keyword

My goal is to extract the paragraphs of a text that contain a specific keyword. Not just the lines that contain the keyword, but the whole paragraph. The rule imposed on my text files is that every paragraph starts with a certain pattern (e.g. Pa0)…

text awk grep paragraph

asked Sep 03 '15 at 15:28

Kyriakos P.

1 answer

Shell script to show frequency of each word in file and in a directory

I came across a question in my interview Shell script to show frequency of each word in file and in a directory A - A1 - File1.txt - File2.txt -A2 - FileA21.txt -A3 - FileA31.txt - FileA32.txt B …

bash shell awk

asked Aug 31 '15 at 16:49

user3624000

1 answer

How to run awk -F\' '{print $2}' inside subprocess.Popen in Python?

I need to run a shell command inside subprocess.Popen in Python. The command is: $ virsh dumpxml server1 | grep 'source file' | awk -F\' '{print $2}' The output is: /vms/onion.qcow2 I'm having two challenges with the above command: 1) The command is…

python shell python-2.7 awk subprocess

asked Aug 09 '15 at 02:39

GreenTeaTech

2 answers

AWK: Comparing two different columns in two files

I have these two files File1: 9 8 6 8 5 2 2 1 7 0 6 1 3 2 3 4 4 6 File2: (which has over 4 million lines) MN 1 0 JK 2 0 AL 3 90 CA 4 83 MK 5 54 HI 6 490 I want to compare field 6 of file1, and compare field 2 of file 2. If they match, then put…

awk

asked Jul 30 '15 at 23:11

adrotter

3 answers

sed substitution including newlines

I want to change a text file so that any line beginning with "Length:" is appended to the previous line. I'm aware that sed '/\nLength:/ Length:/' isn't going to work because sed is line based. Googling for "How to match newlines in sed" did turn up…

regex awk sed

asked Jul 13 '15 at 08:58

Dave Rove

2 answers

extracting the column using AWK

I am trying to extract column using AWK. Source file is a .CSV file and below is command I am using: awk -F ',' '{print $1}' abc.csv > test1 Data in file abc.csv is like below: xyz@yahoo.com,160,1,2,3 abc@ymail.com,1,2,3,160 But data obtained in…

linux awk

asked Jul 11 '15 at 21:02

Abhinav

1 answer

How to awk every nth line starting from different lines each iteration

I would like awk to print every nth line out of a file starting from line 0. Then, after awk has gone through the whole file, I would like it to print every nth line starting from line 1...then print every nth line starting from line 2...etc, up to…

awk

asked Jul 11 '15 at 03:50

Jack_Bandit

4 answers

Check if nth bit is set in bash

I'm wondering if there's a way to replace the if statement with something that checks whether $2 has the 7th bit set to 1? cat $file | awk '{if ($2 == 87) print $1; else {}}' > out.txt" For instance, 93 should print something whereas 128 should…

bash awk binary

asked Jul 02 '15 at 08:35

jyu429

2 answers

awk search column from one file, if match print columns from both files

I'm trying to compare column 1 from file1 and column 3 from file 2, if they match then print the first column from file1 and the two first columns from file2. here's a sample from each file: file1 Cre01.g000100 Cre01.g000500 Cre01.g000650 …

regex awk compare match multiple-columns

asked Jul 01 '15 at 18:02

Luke Anderson- Trocme

3 answers

grep and tail -f for a UTF-16 binary file - trying to use simple awk

How can I achieve the equivalent of: tail -f file.txt | grep 'regexp' to only output the buffered lines that match a regular expression such as 'Result' from the file type: $ file file.txt file.txt:Little-endian UTF-16 Unicode text, with CRLF line…

awk grep cygwin utf-16 tail

asked Jun 23 '15 at 22:20

Alexander McFarlane

2 answers

How to split a CSV file into multiple files based on column value

I have CSV file which could look like this: name1;1;11880 name2;1;260.483 name3;1;3355.82 name4;1;4179.48 name1;2;10740.4 name2;2;1868.69 name3;2;341.375 name4;2;4783.9 there could more or less rows and I need to split it into multiple .dat files…

bash csv awk

asked Jun 17 '15 at 19:09

user3616643

1 answer

Detecting corrupt characters in UTF-8 encoded text file

I have a text file that was edited with the wrong character encoding and thus has some mojibake and corrupt characters in some of the strings when I open it using UTF-8. What scripting language would be the most efficient at detecting these corrupt…

regex encoding awk utf-8 scripting

asked Jun 09 '15 at 17:30

user2056389

3 answers

Linux awk merge two files

I have below script to combine two files. awk -F"\t" ' {key = $1} !(key in result) {result[key] = $0; next;} { for (i=2; i <= NF; i++) result[key] = result[key] FS $i } END { PROCINFO["sorted_in"] = "@ind_str_asc" # if…

linux awk merge

asked Jun 05 '15 at 21:36

clear.choi
