Questions tagged [awk]

AWK is an interpreted programming language designed for text processing and typically used as a data extraction and reporting tool. AWK is used largely with Unix systems.

AWK is an interpreted programming language (AWK stands for Aho, Weinberger, and Kernighan) designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems.

Source: Wikipedia.

An awk program is a series of pattern-action pairs, written as:

condition { action }
condition { action }
...

where condition is typically an expression and action is a series of one or more statements separated by semicolons (;) or newlines. The input is split into records, and each record is split into fields; by default, records are separated by newlines and fields by runs of spaces and tabs. For each record, every condition is evaluated in order and, if it is true, the statements in its action block are executed. Fields are accessed by a 1-based index, e.g. $2 for the second field, and $0 is the whole record. If the condition is missing, the action block is executed for every record. If the condition is present but the action block is absent, the default action is print $0, which prints the current record, including any modifications made to it by earlier actions. Since a non-zero number counts as true, awk '1' file performs the default action (print) for every line.
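As a minimal sketch of the rules above, the following runs a few one-line programs against made-up sample data. The sample() helper and its three records are assumptions for illustration only; they do not come from any real question.

```shell
# Hypothetical sample data: name, department, salary (whitespace-separated).
sample() {
    printf '%s\n' 'alice eng 100' 'bob ops 80' 'carol eng 120'
}

# Condition and action: print field 1 of records whose field 2 is "eng".
sample | awk '$2 == "eng" { print $1 }'

# Condition only: the default action (print $0) prints matching records.
sample | awk '$3 > 90'

# Action only: runs for every record; here the fields are reordered.
sample | awk '{ print $3, $1 }'

# A constant true condition: every record is printed unchanged.
sample | awk '1'
```

The first command prints alice and carol, and the last one reproduces the input as-is.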

An awk program can also have optional BEGIN and END blocks, where the BEGIN action runs before any input is read and the END action runs after all input has been read:

BEGIN     { action } 
condition { action }
condition { action }
...
END       { action }
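A small sketch of this layout, assuming the same kind of made-up three-column data (the sample() and report() helpers are hypothetical names used only for this example): BEGIN prints a header, the middle rule accumulates a total, and END prints an average.

```shell
# Hypothetical sample data: name, department, salary (whitespace-separated).
sample() {
    printf '%s\n' 'alice eng 100' 'bob ops 80' 'carol eng 120'
}

# Hypothetical helper wrapping one awk program that reads standard input.
report() {
    awk '
        BEGIN       { print "name salary" }             # runs once, before any input
        $2 == "eng" { total += $3; n++; print $1, $3 }  # runs for each matching record
        END         { if (n > 0) printf "average %.1f\n", total / n }  # runs once, after all input
    '
}

sample | report
```

Variables such as total and n need no declaration; awk treats an unset variable as 0 in numeric context, which is why the accumulation works without a BEGIN initialisation.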

Awk was originally developed by Alfred Aho, Brian Kernighan and Peter Weinberger in 1977 and updated in 1985. Since then, various versions and dialects of awk have emerged. The most common are:

  • awk - the most common; found on most Unix-like systems. The language is also specified by an IEEE standard (POSIX).
  • mawk - a fast AWK implementation built around a byte-code interpreter.
  • nawk - during the development of AWK, the developers released a new version (new awk) under a separate name to avoid confusion, but it is itself now very old and lacks functionality present in all POSIX awks.
  • gawk - also known as GNU awk. The only version whose developers attempted to add i18n support. It also lets users write their own C shared libraries to extend it with their own "plug-ins". It is the default awk on many Linux distributions.

When asking questions about data processing using awk, please include complete input and desired output.

Related tags:

  • gawk (GNU's version of awk)
  • nawk (A very old, pre-POSIX version also from AT&T)
  • mawk (A different interpreter written by Mike Brennan)
  • sed (A kindred tool often mentioned in the same breath)
32722 questions
73
votes
4 answers

Print line numbers starting at zero using awk

Can anyone tell me how to print line numbers including zero using awk? Here is my input file, stackfile2.txt. When I run the awk command below I get actual_output.txt: awk '{print NR,$0}' stackfile2.txt | tr " ", "," > actual_output.txt whereas my…
user790049
  • 1,154
  • 1
  • 9
  • 21
73
votes
9 answers

Calling an executable program using awk

I have a program in C that I want to call by using awk in shell scripting. How can I do something like this?
user2030431
  • 761
  • 1
  • 6
  • 8
72
votes
10 answers

Remove all text before colon

I have a file containing a certain number of lines. Each line looks like this: TF_list_to_test10004/Nus_k0.345_t0.1_e0.1.adj:PKMYT1 I would like to remove all before ":" character in order to retain only PKMYT1 that is a gene name. Since I'm not…
NewUsr_stat
  • 2,351
  • 5
  • 28
  • 38
71
votes
6 answers

Assigning system command's output to variable

I want to run the system command in an awk script and get its output stored in a variable. I've been trying to do this, but the command's output always goes to the shell and I'm not able to capture it. Any ideas on how this can be done? Example: $…
Sahas
  • 10,637
  • 9
  • 41
  • 51
70
votes
5 answers

How to UNCOMMENT a line that contains a specific string using Sed?

The lines in the file : -A INPUT -m state --state NEW -m tcp -p tcp --dport 2000 -j ACCEPT -A INPUT -m state --state NEW -m tcp -p tcp --dport 2001 -j ACCEPT -A INPUT -m state --state NEW -m tcp -p tcp --dport 2002 -j ACCEPT to comment out let's…
user3864928
70
votes
3 answers

How to get first n characters of each line in unix data file

I am trying to get the first 22 characters from a unix data file. Here is what my data looks like. The first 12 characters are column 1 and the next 10 characters are column 2. 000000000001199998000180000 DUMMY RAG # MFR NOT ST 1999980 …
Teja
  • 13,214
  • 36
  • 93
  • 155
69
votes
11 answers

Remove non-ASCII characters from CSV

I want to remove all the non-ASCII characters from a file in place. I found one solution with tr, but I guess I need to write back that file after modification. I need to do it in place with relatively good performance. Any suggestions?
Sujit
  • 2,403
  • 4
  • 30
  • 36
69
votes
8 answers

Trim leading and trailing spaces from a string in awk

I'm trying to remove leading and trailing space in 2nd column of the below input.txt: Name, Order   Trim, working cat,cat1 I have used the below awk to remove leading and trailing space in 2nd column but it is not working. What am I missing? awk -F,…
Marjer
  • 1,313
  • 6
  • 20
  • 31
68
votes
10 answers

Grep output with multiple Colors?

Is there an elegant method in bash for running grep against a text file with two or more patterns, and each pattern that matches is output in a different color? So a line that matches on MALE and AUGUST would output MALE in blue and AUGUST in…
Evil Genius
  • 1,015
  • 2
  • 10
  • 16
67
votes
6 answers

How to use awk to print lines where a field matches a specific string?

I have: 1 LINUX param1 value1 2 LINUXparam2 value2 3 SOLARIS param3 value3 4 SOLARIS param4 value4 I need awk to print all lines in which $2 is LINUX.
yael
  • 2,765
  • 10
  • 40
  • 48
67
votes
16 answers

Easiest way to extract the urls from an html page using sed or awk only

I want to extract the URL from within the anchor tags of an html file. This needs to be done in BASH using SED/AWK. No perl please. What is the easiest way to do this?
codaddict
  • 445,704
  • 82
  • 492
  • 529
67
votes
4 answers

How to remove a character at the end of each line in UNIX

I would like to remove comma , at the end of each line in my file. How can I do it other than using substring function in awk? Sample Input: SUPPLIER_PROC_ID BIGINT NOT NULL, BTCH_NBR INTEGER NOT NULL, …
Teja
  • 13,214
  • 36
  • 93
  • 155
67
votes
2 answers

How do I view all ignored patterns set with svn:ignore recursively in an SVN repository?

I see it is possible to view a list of properties set on every directory within an SVN repository using proplist and the -R flag (recursive) and -v flag (verbose): svn proplist -Rv This shows me all properties, such as svn:mime-type or…
stereoscott
  • 13,309
  • 4
  • 33
  • 34
65
votes
10 answers

How to extract last part of string in bash?

I have this variable: A="Some variable has value abc.123" I need to extract this value, i.e. abc.123. Is this possible in bash?
user710818
  • 23,228
  • 58
  • 149
  • 207
64
votes
7 answers

Create a dedicated folder for every zip files in a directory and extract zip files

If I choose a zip file and right click "extract here" a folder with the zip filename is created and the entire content of the zip file is extracted into it. However, I would like to convert several zip files via shell. But when I do unzip…
creativeDev
  • 1,113
  • 2
  • 13
  • 20