Questions tagged [awk]

AWK is an interpreted programming language designed for text processing and typically used as a data extraction and reporting tool. AWK is used largely with Unix systems.

AWK is an interpreted programming language (AWK stands for Aho, Weinberger, and Kernighan) designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems.

Source: Wikipedia.

An awk program is a series of pattern-action pairs, written as:

condition { action }
condition { action }
...

where condition is typically an expression and action is a series of one or more commands separated by semicolons (;). The input is split into records, and each record is split into fields; by default, records are separated by newlines and fields by runs of horizontal whitespace. For each record, every condition is checked in turn and, if it is true, the commands in its action block are executed. Within the action block, fields are accessed by a 1-based index, e.g. $2 for the second field. If the condition is missing, the action block is executed for every record. If the condition is present but the action block is absent, the default action is print $0, which prints the current record after any transformations. Since a non-zero number is treated as true, awk '1' file instructs awk to perform the default action (print) for every line.
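As a minimal sketch of the rules described above, the following runs four small awk programs over the same input. The sample data and the shell variable names (input, high, numbered, matched, all) are made up for illustration; only POSIX awk features are used.

```shell
# Hypothetical sample input: a name and a score per line, whitespace-separated.
input='alice 90
bob 45
carol 72'

# Condition and action: print the first field when the second field exceeds 50.
high=$(printf '%s\n' "$input" | awk '$2 > 50 { print $1 }')

# Action only: runs for every record; NR holds the current record number.
numbered=$(printf '%s\n' "$input" | awk '{ print NR ": " $1 }')

# Condition only: the default action prints the whole record ($0).
matched=$(printf '%s\n' "$input" | awk '/^b/')

# A constant true condition prints every record unchanged.
all=$(printf '%s\n' "$input" | awk '1')

printf '%s\n' "$high" "$numbered" "$matched" "$all"
```

The first program prints alice and carol, the second numbers each name, the third prints only the line starting with b, and the last reproduces the input unchanged.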

An awk program can also have optional BEGIN and END blocks: the BEGIN action is invoked before any input is read, and the END action is invoked after all input has been read:

BEGIN     { action } 
condition { action }
condition { action }
...
END       { action }
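The layout above might be used as in the following sketch, which totals a column. The colon-separated sample data and the names input, report and total are assumptions chosen for illustration.

```shell
# Hypothetical colon-separated input: an item and a quantity per line.
input='apples:3
pears:5
plums:4'

# BEGIN runs once before any input is read (here it sets the field separator),
# the unconditioned rule runs for every record, and END runs once after the
# last record, when NR holds the total number of records read.
report=$(printf '%s\n' "$input" | awk '
BEGIN { FS = ":"; total = 0 }
      { total += $2 }
END   { print NR " records, total " total }
')

printf '%s\n' "$report"
```

Setting FS in BEGIN matters here: assigned inside a per-record rule, it would only take effect from the following record onward.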

Awk was originally developed by Alfred Aho, Brian Kernighan and Peter Weinberger in 1977 and updated in 1985. Since then, various versions and dialects of awk have emerged. The most common are:

  • awk - the most common implementation, found on most Unix-like systems. It also has a well-defined IEEE standard.
  • mawk - a fast AWK implementation whose code base is built around a byte-code interpreter.
  • nawk - during the development of AWK, the developers released a new version (new awk) to avoid confusion, but it is itself now very old and lacks functionality present in all POSIX awks.
  • gawk - also known as GNU awk. The only version in which the developers attempted to add i18n support. It allows users to write their own C shared libraries to extend it with their own "plug-ins". This version is the standard implementation on Linux.

When asking questions about data processing using awk, please include complete input and desired output.

Related tags:

  • (GNU's version of awk)
  • (A very old, pre-POSIX version also from AT&T)
  • (A different interpreter written by Mike Brennan)
  • (A kindred tool often mentioned in the same breath)
32722 questions
5
votes
4 answers

Extract using sed or grep

I am fairly new to grep and sed commands. How can +50.0 be extracted from Core 0: +50.0°C (high = +80.0°C, crit = +90.0°C) using grep or sed in a bash script? acpitz-virtual-0 Adapter: Virtual device temp1: +50.0°C (crit =…
curious_coder
  • 2,392
  • 4
  • 25
  • 44
5
votes
3 answers

Using the bash sort command within variable-length filenames

I am trying to numerically sort a series of files output by the ls command which match the pattern either ABCDE1234A1789.RST.txt or ABCDE12345A1789.RST.txt by the '789' field. In the example patterns above, ABCDE is the same for all files, 1234 or…
5
votes
2 answers

Join multiple tables by row names

I would like to merge multiple tables by row names. The tables differ in the amount of rows and they have unique and shared rows, which should all appear in output. If possible I would like to solve the problem with awk, but I am also fine with…
user2715173
  • 53
  • 1
  • 3
5
votes
5 answers

Awk between two patterns with pattern in the middle

Hi, I am looking for an awk command that can find two patterns and print the data between them to a file, but only if a third pattern appears in the middle. For example: Start 1 2 middle 3 End Start 1 2 End And the output will…
Ggdw
  • 2,509
  • 5
  • 24
  • 22
5
votes
1 answer

How to specify *one* tab as field separator in AWK?

The default for white-space field separators, such as tab when using FS = "\t", in AWK is either one or many. Therefore, if you want to read in a tab separated file with null values in some columns (other than the last), it skips over them. For…
user2662766
  • 51
  • 1
  • 1
  • 2
5
votes
3 answers

Combining columns within a single file using awk

I am trying to reformat a large file. The first 6 columns of each line are OK but the rest of the columns in the line need to be combined in increments of 2 with a "/" character in between. Example file (showing only a few columns but have many…
KBoehme
  • 361
  • 2
  • 5
  • 16
5
votes
3 answers

finding duplicates in a field and printing them in unix bash

I have a file that contains apple apple banana orange apple orange. I want a script that finds the duplicates apple and orange and tells the user the following: apple and orange are repeated. I tried nawk '!x[$1]++' FS="," filename to find…
t28292
  • 573
  • 2
  • 7
  • 12
5
votes
2 answers

Replace fields with AWK by using a different file as translation list

I am using awk in Windows. I have a script called test.awk. This script should read a file and replace a certain field (key) with a value. The key->value list is in a file called translate.txt. Its structure is like this: e;Emil …
Schamas
  • 53
  • 1
  • 4
5
votes
2 answers

Awk - print next record following matched record

I'm trying to get the next field after a matching field using awk. Is there an option to do that, or do I need to scan the record into an array, then check each field in the array and print the one after it? What I have so far: The file format…
stefanB
  • 77,323
  • 27
  • 116
  • 141
5
votes
4 answers

Filling in gaps with awk or anything

I have a list such as the one below, where the first column is the position and the other columns aren't important for this question. 1 1 2 3 4 5 2 1 2 3 4 5 5 1 2 3 4 5 8 1 2 3 4 5 9 1 2 3 4 5 10 1 2 3 4 5 11 1 2 3 4 5 I want to fill in the gaps such that…
jeffpkamp
  • 2,732
  • 2
  • 27
  • 51
5
votes
3 answers

Extracting only my function names from ELF binary

I'm writing a script for extracting all the functions (written by the user) in a binary. The following shell script extracts my function names as well as some library functions which start with __. readelf -s ./a.out | gawk ' { if ($4 == "FUNC" && $3…
Jeyaram
  • 9,158
  • 7
  • 41
  • 63
5
votes
5 answers

What is platform independent way of converting csv files to tsv files if the csv files can be quoted with comma inside the quoted strings?

Suppose I have a csv file like this a,b,c 1,"drivingme,mad",2 and I want to convert it to a TSV abc 1drivingme,mad2 Whilst I can write some Python code to do this, I found it to be slow. Is there a better awk, sed or perl way…
xiaodai
  • 14,889
  • 18
  • 76
  • 140
5
votes
7 answers

How to find files containing exactly 16 lines?

I have to find files that contain exactly 16 lines in Bash. My idea is: find -type f | grep '/^...$/' Does anyone know how to utilise find + grep or maybe find + awk? Then, move the matching files to another directory. Deleting all non-matching…
5
votes
2 answers

How to suppress default print in awk?

This is with gawk 4.0.0, running on Windows 7 with cygwin. The program is invoked like gawk -f procjournal.gawk testdata I have some data that looks like this: "Date";"Type";"Amount";"Balance" "6/11/2013 11:51:17 AM";"Transaction…
wades
  • 927
  • 9
  • 24
5
votes
7 answers

Get common values in 2 arrays in shell scripting

I have an array1 = (20,30,40,50) array2 = (10,20,30,80,100,110,40) I have to get the common values from these 2 arrays in my array 3 like: array3 = (20,30,40) in ascending sorted order.
iaav
  • 484
  • 2
  • 9
  • 26