Questions tagged [awk]

AWK is an interpreted programming language designed for text processing and typically used as a data extraction and reporting tool. The name comes from the initials of its authors: Aho, Weinberger, and Kernighan. It is a standard feature of most Unix-like operating systems.

Source: Wikipedia.

An awk program is a series of pattern-action pairs, written as:

condition { action }
condition { action }
...

where condition is typically an expression and action is a series of one or more commands separated by semicolons (;). The input is split into records, and each record is split into fields. By default, records are separated by newlines and fields by runs of spaces and tabs. For each record, every condition is checked in turn and, if it is true, the commands in its action block are executed. Within the action block, fields are accessed by a 1-based index, e.g. $2 for the second field.

If the condition is missing, the action block is executed for every record. If the condition is present but the action block is absent, the default action is print $0, which prints the current record, including any modifications made to it. Since any non-zero number counts as true, awk '1' file instructs awk to perform the default action (print) for every line.
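As a minimal sketch of the rules above, the following shell snippet runs four small awk programs over the same input. The sample data (a name and a number per record) is made up purely for illustration.

```shell
# Hypothetical sample input: three whitespace-separated records.
data='alice 30
bob 25
carol 41'

# Condition and action: print field 1 of records whose field 2 exceeds 28.
printf '%s\n' "$data" | awk '$2 > 28 { print $1 }'

# Action without a condition: runs for every record.
printf '%s\n' "$data" | awk '{ print $2 }'

# Condition without an action: the default action (print $0) applies.
printf '%s\n' "$data" | awk '$1 == "bob"'

# A constant true condition prints every record unchanged.
printf '%s\n' "$data" | awk '1'
```

The first command prints alice and carol, the second prints the three numbers, the third prints the whole bob record, and the last reproduces the input.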

An awk program can also contain optional BEGIN and END blocks: the BEGIN action runs before any input is read, and the END action runs after all input has been read:

BEGIN     { action } 
condition { action }
condition { action }
...
END       { action }
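A short sketch of how the three kinds of blocks fit together, assuming a made-up input of item names and quantities: BEGIN prints a header, the unconditioned block echoes each record while accumulating a total, and END prints the total. The variable names report and total are illustrative only.

```shell
# Hypothetical sample input: an item name and a quantity per record.
report=$(printf '%s\n' 'apples 3' 'pears 5' 'plums 4' |
awk '
BEGIN { print "item qty" }           # runs once, before any input is read
      { total += $2; print $1, $2 }  # no condition: runs for every record
END   { print "total", total }       # runs once, after all input is read
')

printf '%s\n' "$report"
```

This prints the header line, the three records, and finally "total 12". An uninitialized awk variable such as total starts out as zero in numeric context, so no explicit initialization in BEGIN is needed here.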

Awk was originally developed by Alfred Aho, Brian Kernighan and Peter Weinberger in 1977 and updated in 1985. Since then, various versions and dialects of awk have emerged. The most common are:

  • awk - the most common; it will be found on most Unix-like systems. Its behavior is also defined by an IEEE standard (POSIX).
  • mawk - a fast AWK implementation whose code base is built around a bytecode interpreter.
  • nawk - during the development of AWK, the developers released a new version ("new awk") under a separate name to avoid confusion, but it is itself now very old and lacks functionality present in all POSIX awks.
  • gawk - also known as GNU awk. The only version whose developers attempted to add i18n support. It also allows users to extend it with their own "plug-ins" written as C shared libraries. This version is the default awk on many Linux distributions.

When asking questions about data processing using awk, please include complete input and desired output.

Related tags:

  • (GNU's version of awk)
  • (A very old, pre-POSIX version also from AT&T)
  • (A different interpreter written by Mike Brennan)
  • (A kindred tool often mentioned in the same breath)
32722 questions
38
votes
2 answers

Remove quotes in awk command

I have a text file that needs to be processed using…
shantanuo
38
votes
7 answers

Finding gaps in sequential numbers

I don’t do this stuff for a living so forgive me if it’s a simple question (or more complicated than I think). I’ve been digging through the archives and found a lot of tips that are close but being a novice I’m not sure how to tweak for my needs…
Shaun
38
votes
3 answers

Integer division in awk

I want to divide two numbers in awk using integer division, i.e. truncating the result. For example, k = 3 / 2; print k should print 1. According to the manual: "Division; because all numbers in awk are floating-point numbers, the result is not…
user000001
38
votes
5 answers

Removing Windows newlines on Linux (sed vs. awk)

Have some delimited files with improperly placed newline characters in the middle of fields (not line ends), appearing as ^M in Vim. They originate from freebcp (on Centos 6) exports of a MSSQL database. Dumping the data in hex shows \r\n…
kermatt
37
votes
3 answers

Escaping separator within double quotes, in awk

I am using awk to parse my data with "," as separator as the input is a csv file. However, there are "," within the data which is escaped by double quotes ("..."). Example filed1,filed2,field3,"field4,FOO,BAR",field5 How can i ignore the comma ","…
joomanji
37
votes
7 answers

How to grep the last occurrence of a line pattern

I have a file with contents x a x b x c I want to grep the last occurrence, x c when I try sed -n "/x/,/b/p" file it lists all the lines, beginning x to c.
user3702858
37
votes
6 answers

Better way of getting a GIT commit message by short hash?

I am currently getting the commit message for a certain commit hash by using the following: hash='b55da97' git log --pretty=oneline ${hash} | grep "${hash}" | awk '{ print $2 }' This seems extremely inefficient though. Is there a smarter or cheaper…
ehime
37
votes
11 answers

remove ^M characters from file using sed

I have this line inside a file: ULNET-PA,client_sgcib,broker_keplersecurities ,KEPLER I try to get rid of that ^M (carriage return) character so I used: sed 's/^M//g' However this does remove everything after ^M: [root@localhost tmp]# vi…
SoSed
37
votes
3 answers

What is the easiest way to remove 1st and last line from file with awk?

I am learning awk/gawk. So recently I just try to solve any problem with it to gain more practice opportunities. My coworker asked a question yesterday, "how to remove first and last line from file" . I know that sed '1d;$d' file would work. …
Imagination
37
votes
1 answer

How to add a character at the end of each line with awk?

I would like to add character A at the end of each line in a text file. How can I do this with awk? 1AAB VBNM JHTF 2SDA Desired output 1AABA VBNMA JHTFA 2SDAA
user1676953
36
votes
5 answers

'grep +A': print everything after a match

I have a file that contains a list of URLs. It looks like below: file1: http://www.google.com http://www.bing.com http://www.yahoo.com http://www.baidu.com http://www.yandex.com .... I want to get all the records after: http://www.yahoo.com,…
B.Mr.W.
36
votes
7 answers

Parsing variables from config file in Bash

Having the following content in a file: VARIABLE1="Value1" VARIABLE2="Value2" VARIABLE3="Value3" I need a script that outputs the following: Content of VARIABLE1 is Value1 Content of VARIABLE2 is Value2 Content of VARIABLE3 is Value3 Any ideas?
KillDash9
36
votes
9 answers

mysqldump with db in a separate file

I'm writing a single-line command that backs up all databases into files named after them instead of dumping everything into one SQL file. E.g. db1 gets saved to db1.sql and db2 gets saved to db2.sql. So far, I'd gathered the following commands to retrieve…
resting
35
votes
4 answers

Sort logs by date field in bash

let's have 126 Mar 8 07:45:09 nod1 /sbin/ccccilio[12712]: INFO: sadasdasdas 2 Mar 9 08:16:22 nod1 /sbin/zzzzo[12712]: sadsdasdas 1 Mar 8 17:20:01 nod1 /usr/sbin/cron[1826]: asdasdas 4 Mar 9 06:24:01 nod1 /USR/SBIN/CRON[27199]: aaaasdsd …
Mejmo
35
votes
12 answers

Can awk deal with CSV file that contains comma inside a quoted field?

I am using awk to compute the sum of one column in a CSV file. The data format is something like: id, name, value 1, foo, 17 2, bar, 76 3, "I am the, question", 99 I was using this awk script to compute the sum: awk -F, '{sum+=$3} END…
maguschen