Questions tagged [awk]

AWK is an interpreted programming language (the name stands for Aho, Weinberger, and Kernighan) designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems.

Source: Wikipedia.

An awk program is a series of pattern-action pairs, written as:

condition { action }
condition { action }
...

where condition is typically an expression and action is a series of one or more commands separated by semicolons (;) or newlines. The input is split into records, and each record is split into fields; by default, records are separated by newlines and fields by runs of whitespace. For each record, every condition is checked in turn and, if it is true, the commands in its action block are executed. Within the action block, fields are accessed by a 1-based index, e.g. $2 for the second field, and $0 holds the whole record.

If the condition is missing, the action block is executed for every record. If the condition is present but the action block is absent, the default action is print $0, which prints the current record, including any modifications already made to it. Since a non-zero number counts as true, awk '1' file instructs awk to perform the default action (print) for every line.
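The three forms described above (condition with action, condition only, action only) can be sketched as follows. This is a minimal illustration, not taken from the tag wiki: the sample data and the variable names data, eng, high, and names are made up for the example.

```shell
# Hypothetical sample data: name, department, salary (whitespace-separated).
data='alice eng 120
bob ops 95
carol eng 130'

# Condition plus action: for records whose second field is "eng",
# print the first and third fields.
eng=$(printf '%s\n' "$data" | awk '$2 == "eng" { print $1, $3 }')

# Condition only: the default action (print $0) prints each matching record whole.
high=$(printf '%s\n' "$data" | awk '$3 > 100')

# Action only: the block runs for every record.
names=$(printf '%s\n' "$data" | awk '{ print $1 }')

printf '%s\n' "$eng" "$high" "$names"
```

The comma in print $1, $3 joins the two fields with the output field separator, which is a single space unless changed.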

An awk program can also have optional BEGIN and END blocks: the BEGIN action runs before any input is read, and the END action runs after all input has been read:

BEGIN     { action } 
condition { action }
condition { action }
...
END       { action }
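As a rough sketch of how the three kinds of block cooperate, the example below sets the field separator and prints a header in BEGIN, accumulates a count and a sum per matching record, and reports both in END. The input lines, the threshold, and the names report, n, and total are assumptions chosen for illustration.

```shell
# Hypothetical input: one item and one price per line, comma-separated.
report=$(printf '%s\n' 'apple,3' 'pear,5' 'fig,2' | awk '
BEGIN  { FS = ","; print "matches total" }  # runs once, before any input is read
$2 > 2 { n++; total += $2 }                 # runs for each record with a price above 2
END    { print n, total }                   # runs once, after the last record
')

printf '%s\n' "$report"
```

Setting FS inside BEGIN has the same effect here as passing -F, on the command line, because it takes effect before the first record is split.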

Awk was originally developed by Alfred Aho, Brian Kernighan and Peter Weinberger in 1977 and updated in 1985. Since then, various versions and dialects of awk have emerged. The most common are:

  • awk - the most common implementation, found on most Unix-like systems. Its behaviour is specified by the POSIX (IEEE) standard.
  • mawk - a fast AWK implementation built around a byte-code interpreter.
  • nawk - during the development of AWK, the developers released a new version and called it "new awk" to avoid confusion with the original. It is itself now very old and lacks functionality present in all POSIX awks.
  • gawk - also known as GNU awk. The only version in which the developers attempted to add i18n support. It also allows users to write their own C shared libraries to extend it with their own "plug-ins". This version is the default awk on many Linux distributions.

When asking questions about data processing using awk, please include complete input and desired output.

Related tags:

  • gawk (GNU's version of awk)
  • nawk (A very old, pre-POSIX version also from AT&T)
  • mawk (A different interpreter written by Mike Brennan)
  • sed (A kindred tool often mentioned in the same breath)
32722 questions
114
votes
8 answers

Tab separated values in awk

How do I select the first column from the TAB separated string? # echo "LOAD_SETTLED LOAD_INIT 2011-01-13 03:50:01" | awk -F'\t' '{print $1}' The above will return the entire line and not just "LOAD_SETTLED" as expected. Update: I need to…
shantanuo
112
votes
4 answers

Printing column separated by comma using Awk command line

I have a problem here. I have to print a column in a text file using awk. However, the columns are not separated by spaces at all, only using a single comma. Looks something like this: column1,column2,column3,column4,column5,column6 How would I…
user3364728
108
votes
11 answers

How to print all the columns after a particular number using awk?

On shell, I pipe to awk when I need a particular column. This prints column 9, for example: ... | awk '{print $9}' How can I tell awk to print all the columns including and after column 9, not just column 9?
Lazer
107
votes
4 answers

Select row and element in awk

I learned that in awk, $2 is the 2nd column. How to specify the ith line and the element at the ith row and jth column?
Tim
107
votes
5 answers

Using awk to remove the Byte-order mark

What would an awk script (presumably a one-liner) for removing a BOM look like? Specification: print every line after the first (NR > 1); for the first line, if it starts with #FE #FF or #FF #FE, remove those and print the rest
Boldewyn
106
votes
5 answers

How to print the number of characters in each line of a text file

I would like to print the number of characters in each line of a text file using a unix command. I know it is simple with powershell gc abc.txt | % {$_.length} but I need unix command.
vikas368
105
votes
10 answers

cut or awk command to print first field of first row

I am trying to print the first field of the first row of an output. Here is the case. I just need to print only SUSE from this output. # cat /etc/*release SUSE Linux Enterprise Server 11 (x86_64) VERSION = 11 PATCHLEVEL = 2 Tried with cat…
user3331975
105
votes
12 answers

Split one file into multiple files based on delimiter

I have one file with -| as delimiter after each section...need to create separate files for each section using unix. example of input…
user1499178
104
votes
25 answers

How to decode URL-encoded string in shell?

I have a file with a list of user-agents which are encoded. E.g.: Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en I want a shell script which can read this file and write to a new file with decoded strings. Mozilla/5.0…
user785717
103
votes
8 answers

Turning multiple lines into one comma separated line

I have the following data in multiple lines: foo bar qux zuu sdf sdfasdf What I want to do is to convert them to one comma separated line: foo,bar,qux,zuu,sdf,sdfasdf What's the best unix one-liner to do that?
neversaint
102
votes
13 answers

how to use sed, awk, or gawk to print only what is matched?

I see lots of examples and man pages on how to do things like search-and-replace using sed, awk, or gawk. But in my case, I have a regular expression that I want to run against a text file to extract a specific value. I don't want to do…
Stéphane
99
votes
11 answers

find difference between two text files with one item per line

I have two files: file 1 dsf sdfsd dsfsdf file 2 ljljlj lkklk dsf sdfsd dsfsdf I want to display what is in file 2 but not in file 1, so file 3 should look like ljljlj lkklk
vehomzzz
98
votes
6 answers

How to add to the end of lines containing a pattern with sed or awk?

Here is example file: somestuff... all: thing otherthing some other stuff What I want to do is to add to the line that starts with all: like this: somestuff... all: thing otherthing anotherthing some other stuff
yasar
98
votes
17 answers

grep for multiple strings in file on different lines (ie. whole file, not line based search)?

I want to grep for files containing the words Dansk, Svenska or Norsk on any line, with a usable returncode (as I really only like to have the info that the strings are contained, my one-liner goes a little further than this). I have many files…
Christian
96
votes
12 answers

Insert multiple lines into a file after specified pattern using shell script

I want to insert multiple lines into a file using shell script. Let us consider my input file contents are: input.txt: abcd accd cdef line web Now I have to insert four lines after the line 'cdef' in the input.txt file. After inserting my file…
user27