Questions tagged [awk]

AWK is an interpreted programming language (the name stands for Aho, Weinberger, and Kernighan) designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems.

Source: Wikipedia.

An awk program is a series of pattern-action pairs, written as:

condition { action }
condition { action }
...

where condition is typically an expression and action is a series of one or more commands separated by semicolons (;). The input is split into records, and each record is split into fields; by default, records are separated by newlines and fields by runs of horizontal whitespace. For each record, every condition is checked in turn and, if it is true, the commands in its action block are executed. Within the action block, fields are accessed by a 1-based index, e.g. $2 for the second field; $0 is the whole record. If the condition is missing, the action block is executed for every record. If the condition is present but the action block is absent, the default action is print $0, which prints the current record, including any modifications made to it so far. Because a non-zero number is treated as true, awk '1' file performs the default action (print) for every line.
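As a minimal sketch of the three forms described above (condition with action, condition only, action only), the following runs awk on a small inline sample. The sample data and the shell variable names (data, high, ops, nfields, all) are made up for illustration.

```shell
# Hypothetical sample: name, department, salary (whitespace-separated).
data='alice eng 120
bob ops 90
carol eng 110'

# Condition and action: print field 1 of records whose 3rd field exceeds 100.
high=$(printf '%s\n' "$data" | awk '$3 > 100 { print $1 }')

# Condition only: the default action (print $0) prints matching records.
ops=$(printf '%s\n' "$data" | awk '$2 == "ops"')

# Action only: runs for every record; NF is the number of fields.
nfields=$(printf '%s\n' "$data" | awk '{ print NF }')

# A constant true condition with no action: every record is printed as-is.
all=$(printf '%s\n' "$data" | awk '1')

printf '%s\n' "$high" "$ops" "$nfields"
```

Here high holds alice and carol, ops holds the single bob record, nfields holds 3 on each of three lines, and all is identical to the input.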

An awk program can also have optional BEGIN and END blocks: the BEGIN action runs before any input is read, and the END action runs after all input has been read:

BEGIN     { action } 
condition { action }
condition { action }
...
END       { action }
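A small sketch of that layout, assuming a made-up two-column input: BEGIN prints a header, the unconditioned action accumulates a sum, and END prints the total. The variable names data, report and sum are illustrative only.

```shell
# Hypothetical sample: key and numeric value.
data='a 1
b 2
c 3'

report=$(printf '%s\n' "$data" | awk '
BEGIN { print "key value" }         # runs once, before any input is read
      { sum += $2; print $1, $2 }   # no condition: runs for every record
END   { print "total", sum }        # runs once, after the last record
')

printf '%s\n' "$report"
```

The output is the header line, the three input records, and a final line reading total 6. Uninitialized awk variables such as sum start out as zero (or the empty string), so no explicit initialization in BEGIN is needed.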

Awk was originally developed by Alfred Aho, Brian Kernighan and Peter Weinberger in 1977 and updated in 1985. Since then, various versions and dialects of awk have emerged. The most common are:

  • awk - the most common, found on most Unix-like systems. It is also specified by an IEEE (POSIX) standard.
  • mawk - a fast AWK implementation built around a byte-code interpreter.
  • nawk - during the development of AWK, the developers released a new version (new awk) under this name to avoid confusion, but it is itself now very old and lacks functionality present in all POSIX awks.
  • gawk - also known as GNU awk. The only version in which the developers attempted to add i18n support. It lets users write their own C shared libraries to extend it with their own "plug-ins". It is the standard awk on many Linux systems.

When asking questions about data processing using awk, please include complete input and desired output.

Related tags:

  • (GNU's version of awk)
  • (A very old, pre-POSIX version also from AT&T)
  • (A different interpreter written by Mike Brennan)
  • (A kindred tool often mentioned in the same breath)
32722 questions
5
votes
2 answers

Matching last occurrence before a previous match

How do I match the last occurrence of foo before the match of some number? foo: A 1 2 foo: B 1 foo: C 2 A search for pattern 2 should return: foo: A foo: C
Christopher Markieta
5
votes
2 answers

Using the first field in AWK as file name

The dataset is one big file with three columns: An ID of a section, something irrelevant and a line of text. An example could look like the following: A01 001 This is a simple test. A01 002 Just for exemplary purpose. A01 003 A02 001 This is another…
beyeran
5
votes
2 answers

How to awk or grep a variable, without using echo?

I'm working with large variables, and to speed up my script I'd like to awk or grep a variable without using echo/printf. What I've tried: awk "/test/ {for(i=1; i<=100; i++) {getline; print}}" "$var" awk: fatal: cannot open file `<<$var content>>' for…
Orlo
5
votes
2 answers

Is there a way to delete duplicate headers in a file in Unix?

How can I delete multiple headers from a file? I tried to use the below code after finding it from How can I delete duplicate lines in a file in Unix?. awk '!x[$0]++' file.txt It is deleting all the duplicate records in the file. But in my case, I…
Dhruuv
5
votes
3 answers

Move column to last in awk

I would like to move a specified column (the 2nd) to the last column position. I have multiple large tab-delimited files containing variable numbers of columns and rows. But, column 2 in all needs to be last. Another way to put it is that I want…
user3212388
5
votes
2 answers

Convert massive MySQL dump file to CSV

I tried something like this: awk -F " " '{if($1=="INSERT"){print $5}}' input.sql | \ sed -e "s/^(//g" -e "s/),(/\n/g" -e "s/['\"]//g" \ -e "s/);$//g" -e "s/,/;/g" > output.txt But I find it slow and unoptimized. A MySQL dump file looks…
Syffys
5
votes
4 answers

Bash: how to find and break up long lines by inserting continuation character and newline?

I know how to find long lines in a file, using awk or sed: $ awk 'length<=5' foo.txt will print only lines of length <= 5. sed -i '/^.\{5,\}$/d' FILE would delete all lines with 5 or more characters. But how to find long lines and then break…
mort
5
votes
6 answers

Replace line after match

Given this file $ cat foo.txt AAA 111 BBB 222 CCC 333 I would like to replace the first line after BBB with 999. I came up with this command awk '/BBB/ {f=1; print; next} f {$1=999; f=0} 1' foo.txt but I am curious to any shorter commands with…
Zombo
5
votes
4 answers

How to count the number of instances of entries in column 1 and print the value to a new column

I have a tab delimited file that looks like the following: cluster.1 Adult.1 cluster.2 Comp.1 cluster.3 Adult.2 cluster.3 Pre.3 cluster.4 Pre.1 cluster.4 Juv.2 cluster.4 Comp.4 cluster.4 Adult.3 cluster.5 Adult.2 cluster.6 …
acalcino
5
votes
3 answers

Search replace string in a file based on column in other file

If we have the first file like below: (a.txt) 1 asm 2 assert 3 bio 4 Bootasm 5 bootmain 6 buf 7 cat 8 console 9 defs 10 echo and the second like: (b.txt) bio cat BIO bootasm bio defs cat Bio console bio BiO bIo assert …
5
votes
5 answers

How to output counts for list of active/inactive inputs?

I have this input file (1=active, 0=inactive) a 1 a 0 b 1 b 1 b 0 c 0 c 0 c 0 c 0 . . . And want output like this: X repeats active count inactive count a 2 times …
mahmoud
5
votes
2 answers

Unix: Count occurrences of similar entries in first column, sum the second column

I have a file with two columns of data, I would like to count the occurrence of similarities in the first column. When two similar entries in the first column are matched, I would like to also sum the value of the second column of the two matched…
5
votes
3 answers

Getting average per line

I have a large data set in this format HF TLLL A T 0.999 NA 0.666 NA 0.566 NA NA 0.87 HF TLLM A T 0.500 0.500 0.666 0.566 NA NA 0.87 I want to calculate an average for each line, starting from column 5 until end of line, and ignoring the string NA.…
user1308144
5
votes
3 answers

how to print tail of path filename using awk

I've searched for it with no success. I have a file with paths. I want to print the tail of all the paths, for example (for every line in the file): /homes/work/abc.txt --> abc.txt Does anyone know how to do it? Thanks
Noam Mizrachi
5
votes
8 answers

use grep to extract multiple values from one line

file: timestamp1 KKIE ABC=123 [5454] GHI=547 JKL=877 MNO=878 timestamp2 GGHI ABC=544 [ 24548] GHI=883 JKL=587 MNO=874 timestamp3 GGGIO ABC=877 [3487] GHI=77422 JKL=877 MNO=877 timestamp4 GGDI ABC=269 [ 1896] GHI=887 JKL=877…
blue_xylo