Questions tagged [awk]

AWK is an interpreted programming language designed for text processing and typically used as a data extraction and reporting tool. AWK is used largely with Unix systems.

AWK is an interpreted programming language (the name comes from its authors' surnames: Aho, Weinberger, and Kernighan) designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems.

Source: Wikipedia.

An awk program is a series of pattern-action pairs, written as:

condition { action }
condition { action }
...

where condition is typically an expression and action is a series of one or more statements separated by semicolons (;) or newlines. The input is split into records, and each record is split into fields (by default, records are separated by newlines and fields by runs of blanks and tabs, with leading and trailing whitespace ignored). For each record, every condition is checked in order and, if it is true, the statements in its action block are executed. Fields are accessed by a 1-based index, e.g. $2 for the second field, while $0 holds the whole record.

If the condition is missing, the action block is executed for every record. If the condition is present but the action block is absent, the default action is print $0, which prints the current record including any modifications made to it. Since a non-zero number is true, awk '1' file performs the default action (print) for every line.
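
The rules above can be seen in a few one-liners. This is a minimal sketch assuming a POSIX-compatible awk and a POSIX shell; the sample records (names, numbers, roles) are invented for illustration.

```shell
# Illustrative sample data: three records, fields separated by spaces.
input='alice 42 admin
bob 17 user
carol 99 admin'

# condition { action }: print field 1 of each record whose field 3 is "admin".
printf '%s\n' "$input" | awk '$3 == "admin" { print $1 }'
# -> alice
#    carol

# Condition with no action block: the default action (print $0) runs.
printf '%s\n' "$input" | awk '$2 > 20'
# -> alice 42 admin
#    carol 99 admin

# Action with no condition runs for every record. Assigning to a field
# rebuilds $0 (joined with OFS, a single space by default) before the
# constant condition 1 triggers the default print.
printf '%s\n' "$input" | awk '{ $2 = $2 * 2 } 1'
# -> alice 84 admin
#    bob 34 user
#    carol 198 admin
```

The trailing 1 in the last command is the same idiom as awk '1' file: a condition that is always true with no action, so every (possibly modified) record is printed.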

An awk program can also have optional BEGIN and END blocks: the BEGIN action runs before any input is read, and the END action runs after all input has been read:

BEGIN     { action } 
condition { action }
condition { action }
...
END       { action }
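
A small sketch of the three kinds of blocks working together, assuming a POSIX-compatible awk; the input records are made up. Uninitialized variables such as sum and n start out as 0 (or the empty string), so they need no setup in BEGIN.

```shell
# BEGIN runs once before any input is read, the unconditioned middle block
# runs once per record, and END runs once after the last record.
printf '%s\n' 'alice 42' 'bob 18' 'carol 99' |
  awk 'BEGIN { print "name", "count" }
       { print; sum += $2; n++ }
       END { if (n > 0) print "average", sum / n }'
# -> name count
#    alice 42
#    bob 18
#    carol 99
#    average 53
```

The n > 0 guard is there because END still runs when the input is empty, and dividing by zero would otherwise be an error.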

Awk was originally developed by Alfred Aho, Brian Kernighan and Peter Weinberger in 1977 and updated in 1985. Since then, various versions and dialects of awk have emerged. The most common are:

  • awk - the most common, found on most Unix-like systems. It also has a well-defined IEEE standard (POSIX).
  • mawk - a fast AWK implementation whose code base is built around a byte-code interpreter.
  • nawk - "new awk", the updated version released by the original developers and given a new name to avoid confusion with the old awk. It is itself now very old and lacks functionality present in all POSIX awks.
  • gawk - also known as GNU awk. The only version whose developers attempted to add i18n support. It lets users write their own C shared libraries to extend it with "plug-ins". It is the default awk on many Linux distributions.

When asking questions about data processing using awk, please include complete input and desired output.

Related tags:

  • gawk (GNU's version of awk)
  • nawk (A very old, pre-POSIX version also from AT&T)
  • mawk (A different interpreter written by Mike Brennan)
  • (A kindred tool often mentioned in the same breath)
32722 questions
5 votes, 5 answers

Using backticks or $() with xargs and sed or awk

Assuming I want to change some filenames that end with jpg.jpg to end only with .jpg (in bash), and I want to do it by piping the output of find to xargs: By using sed: find . -iname '*jpg.jpg' | xargs -I % mv -iv % $(echo % | sed…
nrz
5 votes, 2 answers

awk and multiline matching (sub-regex)

I am trying to use awk to parse a multiline expression. A single one of them looks like this: _begin hello world ! _attrib0 123 _attrib1 super duper _attrib1 yet another value _attrib2 foo _end I need to extract the value associated to…
malat
5 votes, 3 answers

Modify Sequence with snp position and output in same file

I have two files: one with position information and another with sequence information. Now I need to read the positions, take the SNPs at those positions, replace the base at each position with the SNP information in the sequence, and write it in the snp…
user630605
5 votes, 3 answers

What effect does a trailing number have on the body of an awk script?

I have a simple awk one-liner that folds the next line onto the current line when a given pattern is matched. Here it is: awk '/two/ { printf $1; next; } 1' test.txt with the following input: one two three four five six one two three four you…
Damon Snyder
5 votes, 5 answers

MySQL import from stdin

I am generating a CSV on stdout using awk. Is there a way to import that content directly into MySQL without writing it to a file?
Vivek Goel
5 votes, 3 answers

Copy matching lines to a second file

I need to copy all lines in a file matching a pattern to a second file. In detail: I have a sql dump and want to create a second sql file which includes all commands for tables whose name matches dx_postings, dx_postings_archive, and so on. The…
ChrJantz
5 votes, 10 answers

change lowercase file names to uppercase with awk, sed or bash

I would like to change lowercase filenames to uppercase with awk/sed/bash; your help would be appreciated. Input: aaaa.txt vvjv.txt acfg.txt Desired output: AAAA.txt VVJV.txt ACFG.txt
rebca
5 votes, 1 answer

awk print vs printf functions

In awk there are two output functions: print and printf. Are their implementations in awk very different? What are the differences regarding performance/speed (if possible — theoretical, not only with "time" on command line)? Do they use the same…
static
5 votes, 3 answers

awk multiple matching patterns

awk seems to match all the patterns matching an expression and executes the corresponding actions. Is there a precedence that can be associated? For example, in the code below, lines starting with # (comments) are matched by both patterns, and both actions…
Kumar
5 votes, 2 answers

Big data read subsamples R

I'm most grateful for your time in reading this. I have an uber-size 30 GB file of 6 million records and 3000 columns (mostly categorical data) in CSV format. I want to bootstrap subsamples for multinomial regression, but it's proving difficult even with…
Yoda
5 votes, 1 answer

splitting a line according to the field separator as a string

I have a file as below: 10temp3 20/temp4 28 temp 5 I am using the command below to split the lines and get the last number in each line. awk -F"temp" '{print $NF}' temp3 The output I got is: > awk -F"temp" '{print $NF}'…
Vijay
5 votes, 2 answers

subtract columns from different files with awk

I have two folders, A1 and A2. The file names and the number of files are the same in both folders. Each file has 15 columns. Column 6 of each file in folder 'A1' needs to be subtracted from column 6 of each file in folder 'A2'. I would like to print…
user1588971
5 votes, 4 answers

Does awk support dynamic user-defined variables?

awk supports this: awk '{print $(NF-1);}' but not for user-defined variables: awk '{a=123; b="a"; print $($b);}' by the way, shell supports this: a=123; b="a"; eval echo \${$b}; How can I achieve my purpose in awk?
fanlix
5 votes, 4 answers

Add a column to any position in a file in unix [using awk or sed]

I'm looking for other alternatives or a more intelligent one-liner for the following command, which should add a value to a requested column number. I tried the following sed command, which works properly for adding value 4 to the 4th column. [Need: As I have…
Mandar Pande
5 votes, 1 answer

Running an awk by splitting the lines

This is such a basic question in awk, but I am facing issues with it and I don't know why. The problem is that when I run the awk command on a single line, such as awk 'BEGIN {} {print $0;}' FILE, the code runs perfectly. But if I split the code…
NandaKumar