Questions tagged [awk]

AWK is an interpreted programming language designed for text processing and typically used as a data extraction and reporting tool. AWK is used largely with Unix systems.

AWK is an interpreted programming language (AWK stands for Aho, Weinberger, and Kernighan) designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems.

Source: Wikipedia.

An awk program is a series of pattern-action pairs, written as:

condition { action }
condition { action }
...

where condition is typically an expression and action is a series of one or more commands separated by semicolons (;). The input is split into records, and each record is split into fields; by default, records are separated by newlines and fields by runs of horizontal whitespace. For each record, every condition is evaluated in turn and, if it is true, the commands in its action block are executed. Within the action block, fields are accessed by a 1-based index, e.g. $2 for the second field; $0 refers to the whole record. If the condition is missing, the action block is executed for every record. If the condition is present but the action block is absent, the default action is print $0, which prints the current record, including any modifications made to it by earlier actions. Because any non-zero number is treated as true, awk '1' file performs the default action (print) for every line.
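
As a rough illustration of the rules above, the following sketch runs a few one-liners against a small, made-up input (the names, departments and salaries are hypothetical and not taken from any real question). It shows a condition with an action, a condition alone, an action alone, and the constant condition 1:

```shell
# Hypothetical sample input: name, department, salary (whitespace-separated).
sample='alice sales 3000
bob eng 4500
carol eng 5200'

# Condition with an action: print name and salary where the 2nd field is "eng".
eng=$(printf '%s\n' "$sample" | awk '$2 == "eng" { print $1, $3 }')

# Condition without an action: the default action (print $0) is used.
high=$(printf '%s\n' "$sample" | awk '$3 > 4000')

# Action without a condition: executed for every record.
names=$(printf '%s\n' "$sample" | awk '{ print $1 }')

# Constant true condition: every record is printed unchanged.
all=$(printf '%s\n' "$sample" | awk '1')

printf '%s\n' "$eng" "$high" "$names" "$all"
```

Fields here are split on the default whitespace separator; other separators would need to be configured, which this sketch does not cover.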

An awk program can also have optional BEGIN and END blocks: the BEGIN action runs before any input is read, and the END action runs after all input has been read:

BEGIN     { action } 
condition { action }
condition { action }
...
END       { action }
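
A minimal sketch of this layout, again using made-up sample data (the header text and the variable name total are assumptions chosen for illustration): BEGIN prints a header, the middle rule accumulates a sum for matching records, and END prints the total.

```shell
# Hypothetical sample input: name, department, salary.
sample='alice sales 3000
bob eng 4500
carol eng 5200'

report=$(printf '%s\n' "$sample" | awk '
BEGIN       { print "name salary" }        # runs once, before any input is read
$2 == "eng" { total += $3; print $1, $3 }  # runs for each matching record
END         { print "total", total }       # runs once, after all input is read
')

printf '%s\n' "$report"
```

Variables such as total start out empty (treated as 0 in arithmetic), so no explicit initialisation is needed in BEGIN for a simple sum like this.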

Awk was originally developed by Alfred Aho, Brian Kernighan and Peter Weinberger in 1977 and updated in 1985. Since then, various versions and dialects of awk have emerged. The most common are:

  • awk - the most common version, found on most Unix-like systems. It is specified by an IEEE (POSIX) standard.
  • mawk - a fast AWK implementation whose code base is built around a byte-code interpreter.
  • nawk - during the development of AWK, the developers released a new version (new awk) under this name to avoid confusion, but it is itself now very old and lacks functionality present in all POSIX awks.
  • gawk - also known as GNU awk. It is the only version whose developers attempted to add i18n support, and it allows users to extend it with their own "plug-ins" written as C shared libraries. This version is the standard implementation on Linux.

When asking questions about data processing using awk, please include complete input and desired output.

Related tags:

  • gawk (GNU's version of awk)
  • nawk (A very old, pre-POSIX version also from AT&T)
  • mawk (A different interpreter written by Mike Brennan)
  • sed (A kindred tool often mentioned in the same breath)
32722 questions
5 votes, 2 answers

awk regex start of line anchor matches whitespace

Parsing an input file through awk I ran into an issue with anchors in awk. Given the following file: 2015 2015 test test Output with awk $ awk '$1 ~ /^[0-9]/' file 2015 2015 Output with sed $ sed -n '/^[0-9]/p' file 2015 Can somebody explain…
sastorsl

5 votes, 3 answers

unix awk column equal to a variable

I want to print the lines that with two columns equal to a variable, for example, input: 2607s1 NC_000067.6 95.92 49 1 1 3 50 1e-14 84.2 2607s1 NC_000067.6 97.73 44 1 0 7 50 4e-14 84.2 2607s1 NC_000067.6 97.67 43 1 0 …
once

5 votes, 2 answers

Sorting directory contents (including hidden files) by name in the shell

Is there a nice way to sort directory contents (including hidden files) in the shell? Basically i'd like to be able to ls directories just its done in my GUI file manager. In a typical directory, the output is as…
Six

5 votes, 2 answers

Select matrix first row based on 1st, 8th and 9th column value with awk or sed

I have some rows where the 1st, 8th and 9th columns are mostly the same. The total number of rows is well over 60K. Now I want to simplify keeping only the first rows for which the 1st,8th and 9th column are same. Input file: chr exon_start …
ivivek_ngs

5 votes, 4 answers

Uniq in awk; removing duplicate values in a column using awk

I have a large datafile in the following format below: ENST00000371026 WDR78,WDR78,WDR78, WD repeat domain 78 isoform 1,WD repeat domain 78 isoform 1,WD repeat domain 78 isoform 2, ENST00000371023 WDR32 WD repeat domain 32 isoform…
D W

5 votes, 2 answers

how to define a space in a regular expression (in awk)?

I want to print the texts inside of " ". for example I have the following strings: gfdg "jkfgh" "jkfd fdgj fd-" ghjhgj gfggf "kfdjfdgfhbg" "fhfghg" jhgj jhfjhg "dfgdf" fgf fgfdg "dfj jfdg jhfgjd" "hfgdh jfdhgd jkfghfd" hgjghj And I want to print…
h ketab

5 votes, 6 answers

How to number the ls output in unix?

I am trying to write a file with format - "id file_absolute_path" which basically lists down all the files recursively in a folder and give an identifier to each file listed like 1,2,3,4. I can get the absolute path of the files recursively using…
Snehal

5 votes, 1 answer

AWK print the column number where a match is found

I looked a lot of places for this answer, but could not find it. Still learning AWK here, and was just wondering how to print the column number where a match is found. I want the script to give me the field/column number where regexp match "/1" is…
adrotter

5 votes, 1 answer

Print lines from match until end of file with awk

In Print lines in file from the match line until end of file I learnt that you can use 0 in range expressions in awk to match until the end of the file: awk '/matched/,0' file However, today I started playing around this 0 and I found that this…
fedorqui

5 votes, 1 answer

AWK use value of field in regex

I'm trying to find a string pattern composed of the word CONCLUSION followed by the value of field $2 and field $3 from the same record in field $5. For example, my_file.txt is separated by "|": 1|substance1|substance2|red|CONCLUSIONS: the effect of…
Hallucigeniak

5 votes, 1 answer

Awk print is not working inside bash shell script

When I use AWK print command outside shell it is working perfectly. Below is content of the file (sample.txt) which is comma separated. IROG,1245,OUTO,OTUG,USUK After, executing below command outside shell I get IROG as output. cat sample.txt | awk…
stephenjacob

5 votes, 2 answers

How do I replace a substring by the output of a shell command with sed, awk or such?

I'd like to use sed or any command line tool to replace parts of lines by the output of shell commands. For example: Replace linux epochs by human-readable timestamps, by calling date Replace hexa dumps of a specific protocol packets by their…
Gabriel

5 votes, 1 answer

Deleting a multiline block of text between regex pattern using sed

I have a duplicated block of text I need to delete in a large xml file. I want to keep the first block and delete the second all within the same xml tag. For example: -- several lines of text -- several lines of the same…
user2167052

5 votes, 1 answer

grep -v pattern and also remove 1 line before and 4 lines after

I would like to grep a pattern and remove the line of the matching pattern and also 1 line before and 4 lines after the context. I tried: grep -v -A 4 -B 1 Thanks in advance! Example: Rule: r1 Owner: Process explorer.exe Pid 1544 0x01ec350f 8b 45…
user3022917

5 votes, 4 answers

how to remove first two words of a strings output

I want to remove the first two words that come up in my output string. this string is also within another string. What I have: for servers in `ls /data/field` do string=`cat /data/field/$servers/time` This sends this text: 00:00 down…
hisere