Questions tagged [awk]

AWK (named after its authors, Aho, Weinberger, and Kernighan) is an interpreted programming language designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems.

Source: Wikipedia.

An awk program is a series of pattern-action pairs, written as:

condition { action }
condition { action }
...

where condition is typically an expression and action is a series of one or more statements, separated by semicolons (;) or newlines. The input is split into records, and each record is split into fields (by default, records are separated by newlines and fields by horizontal whitespace). For each record, every condition is checked in turn and, if it is true, the statements in its action block are executed. Within an action block, fields are accessed by a 1-based index, e.g. $2 for the second field, while $0 refers to the whole record. If the condition is missing, the action block is executed for every record. If the condition is present but the action block is absent, the default action, print $0, is performed: it prints the current record, including any modifications made to it by earlier actions. Since a non-zero number is true, awk '1' file performs the default action (print) for every line, i.e. it prints the file unchanged.
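The rules above can be sketched with a few one-line programs over made-up "name score" records. Each is wrapped in a small shell function that reads standard input; the names high, low and swap and the sample data are purely illustrative:

```shell
# Hypothetical sample input: whitespace-separated "name score" records.
data='alice 90
bob 45
carol 72'

# Condition plus action: print field 1 of every record whose field 2 exceeds 60.
high() { awk '$2 > 60 { print $1 }'; }

# Condition without action: the default action (print $0) prints the whole record.
low() { awk '$2 < 50'; }

# Action without condition: runs for every record; here it swaps fields 1 and 2.
swap() { awk '{ print $2, $1 }'; }

printf '%s\n' "$data" | high      # alice, carol (one per line)
printf '%s\n' "$data" | low       # bob 45
printf '%s\n' "$data" | swap      # 90 alice, 45 bob, 72 carol
printf '%s\n' "$data" | awk '1'   # 1 is always true: all three lines, unchanged
```

The comma in print $2, $1 inserts the output field separator (OFS), a single space by default.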

An awk program can also have an optional BEGIN block and an optional END block. The BEGIN action is executed before any input is read, and the END action is executed after all input has been read:

BEGIN     { action } 
condition { action }
condition { action }
...
END       { action }
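For example, a program can accumulate values while reading the input and report them once in END. This sketch averages the second column of hypothetical "name score" records; the function name average is illustrative only:

```shell
average() {
  awk '
    BEGIN { sum = 0; n = 0 }   # runs once, before the first record (explicit for clarity)
    { sum += $2; n++ }         # no condition: runs for every record
    END {                      # runs once, after the last record
      if (n > 0) printf "%d records, average %.2f\n", n, sum / n
    }
  '
}

printf 'alice 90\nbob 45\ncarol 72\n' | average   # 3 records, average 69.00
```

The n > 0 guard keeps the END block from dividing by zero when the input is empty.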

Awk was originally developed by Alfred Aho, Brian Kernighan and Peter Weinberger in 1977 and updated in 1985. Since then, various versions and dialects of awk have emerged. The most common are:

  • awk - the most common, found on most Unix-like systems. It is also specified by a well-defined IEEE standard (POSIX).
  • mawk - a fast AWK implementation whose code base is built around a byte-code interpreter.
  • nawk - "new awk", a new version released by AWK's developers under a different name to avoid confusion with the original. It is itself now very old and lacks functionality present in all POSIX awks.
  • gawk - also known as GNU awk. The only version whose developers have attempted to add i18n support. It lets users extend it with their own "plug-ins" written as C shared libraries. It is the default awk on many Linux distributions.

When asking questions about data processing using awk, please include complete input and desired output.

Related tags:

  • gawk (GNU's version of awk)
  • nawk (A very old, pre-POSIX version also from AT&T)
  • mawk (A different interpreter written by Mike Brennan)
  • sed (A kindred tool often mentioned in the same breath)
32722 questions

5 votes, 3 answers

printing contents of variable to a specified line in output file with sed/awk

I have been working on a script to concatenate multiple csv files into a single, large csv. The csv files contain names of folders and their respective sizes, in a 2-column setup with the format "Size, Projectname". Example of a single csv…
JasonD

5 votes, 2 answers

Why is it not printing the maximum score of each individual player?

I want to use the awk utility to list the maximum score of each individual player. This is my cricketer.txt file: Virat Kohli:30 Suresh Raina:90 Shikhar Dhawan:122 Virat Kohli:33 Shikhar Dhawan:39 Suresh Raina:10 Suresh Raina:44 MS Dhoni:101 MS…
Parth Mangukiya
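The asker's script is cut off in the excerpt, so this is not a diagnosis of it. One common approach, sketched here with a few of the sample lines, is to keep the highest score per name in an associative array keyed on the part before the colon; max_scores is a hypothetical name:

```shell
max_scores() {
  awk -F: '
    # keep the highest score seen so far for each player name ($1)
    !($1 in max) || $2 + 0 > max[$1] + 0 { max[$1] = $2 + 0 }
    # "for (k in array)" visits keys in an unspecified order
    END { for (name in max) print name ":" max[name] }
  '
}

printf '%s\n' 'Virat Kohli:30' 'Suresh Raina:90' 'Shikhar Dhawan:122' \
  'Virat Kohli:33' 'Shikhar Dhawan:39' 'Suresh Raina:10' | max_scores | sort
```

Adding 0 makes sure the scores are compared as numbers, and the final sort only makes the output order predictable.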

5 votes, 1 answer

awk stumper: regex substitution within a field

I'm new to awk, and I can't seem to figure this one out. How can I substitute in a single field using a regular expression? In perl, I could assign the field of interest to a variable, then $myvar =~ s/foo/bar/g. Of course also in perl I have to do…
rockriver
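In awk, the three-argument form of gsub (or sub, for the first match only) can target a single field directly, which is roughly what the Perl idiom in the question does. A minimal sketch, assuming the field of interest is $2; the function name and data are hypothetical:

```shell
fix_field2() {
  # gsub(regex, replacement, target) edits the target in place and returns the match count.
  # Changing a field makes awk rebuild $0, joining the fields with OFS (one space by default).
  awk '{ gsub(/foo/, "bar", $2); print }'
}

printf 'foo foofoo foo\n' | fix_field2   # foo barbar foo
```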

5 votes, 4 answers

Sorting groups of lines

Say I have this list: sharpest tool in the shed im not the How can I order alphabetically by the non-indented lines and preserve groups of lines? The above should become: im not the sharpest tool in the shed Similar questions…
Nick Bull

5 votes, 3 answers

Sort rows in csv file without header & first column

I have a CSV file containing records like the ones below. id,h1,h2,h3,h4,h5,h6,h7 101,zebra,1,papa,4,dog,3,apple 102,2,yahoo,5,kangaroo,7,ape I want to sort the rows in this file without the header and first column. My output should look like this. …
Priyanka

5 votes, 2 answers

extracting specific lines from a text file

I have data in my .txt file as below. I want to extract the lines that have the value 12 and copy them into a new .txt file. I tried with sed but couldn't get the result; any help would be appreciated. Thanks "944760 1939" 10 "944760 1940" 12 "946120…
saikiran
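Assuming the value to match is always the last whitespace-separated field on the line, as in the sample, a condition with no action is enough, since the default action prints the matching line. Redirecting the output then writes it to a new file; value_is_12, input.txt and output.txt below are placeholder names:

```shell
# $NF is the last field; == 12 compares it numerically, so "12" matches but "120" does not.
value_is_12() { awk '$NF == 12'; }

printf '%s\n' '"944760 1939" 10' '"944760 1940" 12' | value_is_12   # "944760 1940" 12

# With real files (placeholder names):
# awk '$NF == 12' input.txt > output.txt
```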

5 votes, 5 answers

How to print a separator if the values of two consecutive rows do not match for a column

I have input like the following, and I need to put a separator between two rows if the value of their third column differs. one two three four five six three seven eight nine ten elevel alpha beta ten gama tango charlie oscar…
monk
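One way to do this is to remember the previous record's third field and print a separator line before any record whose third field differs. The excerpt does not show the desired separator, so the ---- line below is an assumption, as is the function name add_separators:

```shell
add_separators() {
  # prev holds $3 of the previous record; NR > 1 skips the check for the very first line.
  awk 'NR > 1 && $3 != prev { print "----" } { print; prev = $3 }'
}

printf '%s\n' 'one two three four' 'five six three seven' \
  'eight nine ten elevel' 'alpha beta ten gama' | add_separators
```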

5 votes, 4 answers

Regex or split in python for shell awk equivalent

I have an agent version file that I need to parse to get the application version details. The (example) contents of the version file /opt/app_folder/agent_version.txt are as below: Version: 10.2.4.110 Pkg name: XXXX-10.2.4-Agent-Linux-x86_64 Revision:…
Marcos

5 votes, 3 answers

how to add new line at the end with awk

I was searching and trying a lot of different approaches, but none of them really did what I need. Hopefully it has not been asked a million times before. I have this alias in my bashrc: alias temp='awk '\''{ printf ("%0.1f",$1/1000); }'\'' <…
fluffypuffy
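Unlike print, awk's printf does not append a newline, so the format string needs an explicit \n. A sketch of the same calculation as the alias, reading standard input because the alias's input file is cut off in the excerpt; temp_c is an illustrative name:

```shell
# printf outputs only what the format says; the trailing "\n" ends the line.
temp_c() { awk '{ printf "%0.1f\n", $1 / 1000 }'; }

printf '48312\n' | temp_c   # 48.3
```

Alternatively, print sprintf("%0.1f", $1 / 1000) gets its newline from print, which appends the output record separator (ORS).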

5 votes, 3 answers

SED or AWK script to replace multiple text

I am trying to do the following with a sed script, but it's taking too much time. It looks like I'm doing something wrong. Scenario: I have student records (> 1 million) in students.txt. In this file, the first 10 characters of each line are the student ID and the next…
Dhanabalan

5 votes, 3 answers

Concatenate two columns of a text file

I have a tsv file like 1 2 3 4 5 ... a b c d e ... x y z j k ... How can I merge two contiguous columns, say the 2nd and the 3rd, to get 1 2-3 4 5 ... a b-c d e ... x y-z j k ... I need the…
Arch Stanton
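Assuming tab-separated input and that the output should stay tab-separated, one sketch rebuilds each line with fields 2 and 3 joined by "-"; merge23 is a hypothetical name:

```shell
merge23() {
  awk 'BEGIN { FS = OFS = "\t" }        # read and write tab-separated fields
  {
    out = $1 OFS $2 "-" $3              # field 1, then fields 2 and 3 joined by "-"
    for (i = 4; i <= NF; i++)           # append the remaining fields unchanged
      out = out OFS $i
    print out
  }'
}

printf '1\t2\t3\t4\t5\na\tb\tc\td\te\n' | merge23
```

Building a new string avoids shrinking NF in place, whose effect on $0 is not guaranteed by every awk.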

5 votes, 6 answers

Replace multiple occurrences between two strings

I need to replace every character a between xx and zz with hello: #input a xxab abzz ca xxbczz aaa axxazza xxczzaxxczz #output a xxhellob hellobzz ca xxbczz aaa axxhellozza xxczzaxxczz This works for one pair, but it doesn't work for more xx/zz pairs…
PesaThe

5 votes, 3 answers

How can I check if a field is greater than a certain number if that field has a $ sign in front?

Given a file called employee.txt in the format (Firstname, Lastname, Salary) with space as the field separator: Foo Bar $1,000 First Last $5,550 Abc Def $3,000 Stack Overflow $6000 Help Please $4700 I want to print lines whose third field…
5areductase
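Since the threshold itself is cut off in the excerpt, the sketch below takes it as a parameter (3000 in the example is arbitrary). It strips the $ sign and thousands separators from a copy of field 3 before comparing numerically; over is an illustrative function name:

```shell
over() {
  awk -v limit="$1" '{
    s = $3                    # copy field 3 so the printed line keeps its original form
    gsub(/[$,]/, "", s)       # remove "$" and "," ("$" is literal inside brackets)
    if (s + 0 > limit + 0) print
  }'
}

printf '%s\n' 'Foo Bar $1,000' 'First Last $5,550' 'Abc Def $3,000' \
  'Stack Overflow $6000' 'Help Please $4700' | over 3000
```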

5 votes, 3 answers

How to split a string depending on a pattern in another column (UNIX environment)

I have a TAB file something like: V I 280 6 - VRSSAI N V 2739 7 - SAVNATA A R 203 5 - AEERR Q A 2517 7 - AQSTPSP S S 1012 5 - GGGSS L A 281 11 - …

5 votes, 5 answers

awk: concatenate strings until a line contains a substring

I have an awk script from this example: awk '/START/{if (x) print x; x="";}{x=(!x)?$0:x","$0;}END{print x;}' file Here's a sample file with lines: $ cat file START 1 2 3 4 5 end 6 7 START 1 2 3 end 5 6 7 So I need to stop concatenating when…
d.ansimov