grep: Keeping lines that have a specific string in a certain column

Question

I am trying to pick out the lines that have a certain value in a certain column and save them to an output file. I am trying to do this with grep. Is it possible?

My data looks like this:

apple   5   abcdefd  ewdsf
peach   5   ewtdsfe  wtesdf
melon   1   ewtedf   wersdf
orange  3   qqqwetr  hredfg

I want to pick out the lines that have the value 5 in the 2nd column and save them to a new output file.

apple   5   abcdefd  ewdsf
peach   5   ewtdsfe  wtesdf

I would appreciate any help!

Michaël Le Barbier · Answer 1 · 2018-08-02T17:17:24.707

It is probably possible with grep, but awk is the better tool for this operation. You can keep every line that has 5 in the second column with

awk '$2 == 5'

Explanation

awk splits its input into records (usually lines) and fields (usually columns) and performs actions on the records that match certain conditions. Here

awk '$2 == 5'

is a short form for

awk '$2 == 5 {print($0)}'

which translates to

For each record, if the second field ($2) is 5, print the full record ($0).
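To try this on the sample data from the question and save the result to a new file, a minimal sketch could look like the following. The file names fruit.txt and fives.txt are placeholders chosen for this example; they do not come from the question.

```shell
# Recreate the sample data from the question (fruit.txt is a placeholder name).
cat > fruit.txt <<'EOF'
apple   5   abcdefd  ewdsf
peach   5   ewtdsfe  wtesdf
melon   1   ewtedf   wersdf
orange  3   qqqwetr  hredfg
EOF

# Keep only the records whose second field equals 5 and
# redirect them to a new file (fives.txt is also a placeholder).
awk '$2 == 5' fruit.txt > fives.txt

# Show what was kept: the apple and peach lines.
cat fives.txt
```

The redirection is done by the shell, not by awk, so the same `> file` form works with any of the filters shown in this answer.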

Variations

If you need to choose the key value dynamically, use the -v option of awk:

awk -v "key=5" '$2 == key {print($0)}'
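A typical use of -v is handing a shell variable to the awk program. The sketch below assumes a hypothetical shell variable named wanted and placeholder file names (fruit_key.txt, key_matches.txt); none of these appear in the question.

```shell
# Sample data from the question (placeholder file name).
cat > fruit_key.txt <<'EOF'
apple   5   abcdefd  ewdsf
peach   5   ewtdsfe  wtesdf
melon   1   ewtedf   wersdf
orange  3   qqqwetr  hredfg
EOF

# "wanted" is a hypothetical shell variable holding the value to look for.
wanted=3

# -v copies the shell value into the awk variable "key" before the program runs,
# so the awk program itself can stay in single quotes.
awk -v key="$wanted" '$2 == key' fruit_key.txt > key_matches.txt

# Only the orange line has 3 in the second column.
cat key_matches.txt
```

Passing the value with -v avoids splicing shell text into the awk program, which tends to be fragile when the value contains quotes or spaces.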

If you need to keep the first line of the file because it contains a header to the table, use the NR variable that keeps track of the ordinal number of the current record:

awk 'NR == 1 || $2 == 5'
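As an illustration of the header case, here is a small sketch. The header names (name, count, code1, code2) and the file names are made up for this example.

```shell
# Sample data with a hypothetical header line added on top.
cat > fruit_header.txt <<'EOF'
name    count  code1    code2
apple   5   abcdefd  ewdsf
peach   5   ewtdsfe  wtesdf
melon   1   ewtedf   wersdf
orange  3   qqqwetr  hredfg
EOF

# NR == 1 is true only for the first record, so the header is always printed;
# every other record is printed only when its second field equals 5.
awk 'NR == 1 || $2 == 5' fruit_header.txt > with_header.txt

# Header line followed by the apple and peach lines.
cat with_header.txt
```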

The field separator is a regular expression defining which text separates columns; it can be changed with the -F option. For instance, if your data were in a basic CSV file, the filter would be

awk -F", *" '$2 == 5'
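A quick way to check the separator is to run it on a few comma-separated lines. This is only a sketch for simple CSV data: it assumes no quoted fields and no commas inside values, and the file names are placeholders.

```shell
# Simple CSV sample, with and without a space after the commas.
cat > fruit.csv <<'EOF'
apple, 5, abcdefd, ewdsf
peach,5,ewtdsfe,wtesdf
melon, 1, ewtedf, wersdf
grape, 55, kkkkkkk, aaaaaa
EOF

# The separator is the regular expression "a comma followed by zero or more spaces",
# so the second field is just the value, without surrounding blanks.
awk -F', *' '$2 == 5' fruit.csv > csv_matches.txt

# The apple and peach lines are kept; 55 is not equal to 5.
cat csv_matches.txt
```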

Visit the awk tag wiki to find some useful information for getting started with awk.

If input table would have a header line, how do we preserve it? — bapors, Aug 02 '18 at 09:19

score 4 · Answer 2 · answered Oct 01 '14 at 19:12

To print the lines whose second field is 5, use: awk '$2==5' file
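One detail worth knowing: when a field looks like a number, awk compares it to the constant 5 numerically, so a field such as 5.0 also matches. If the exact text matters, comparing against the string "5" is one option. A small sketch, with made-up rows and placeholder file names:

```shell
# Sample rows; the 5.0 and 55 rows are added here to show the difference.
cat > fruit_num.txt <<'EOF'
apple   5    abcdefd  ewdsf
peach   5.0  ewtdsfe  wtesdf
grape   55   kkkkkkk  aaaaaa
EOF

# Numeric comparison: both 5 and 5.0 are equal to the number 5.
awk '$2 == 5' fruit_num.txt > numeric.txt

# String comparison: only a field whose text is exactly "5" matches.
awk '$2 == "5"' fruit_num.txt > exact.txt

cat numeric.txt
cat exact.txt
```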

Etan Reisner

score 0 · Answer 3 · answered Oct 01 '14 at 19:07

Give this a try:

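grep -E '^[^[:space:]]+[[:space:]]+5[[:space:]].*$' file.txt

The pattern looks for the start of the line, followed by one or more non-whitespace characters, followed by one or more whitespace characters, followed by 5 and a whitespace character, followed by any number of characters, followed by the end of the line.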

Fordio

The `.*$` part is useless. – Michaël Le Barbier Oct 01 '14 at 19:35
You never expect the Spanish Inquisition! ☺ – Michaël Le Barbier Oct 01 '14 at 19:47

score 0 · Answer 4 · answered Oct 03 '14 at 04:56

You can use the following command.

$ cat data.txt
apple   5   abcdefd  ewdsf
peach   5   ewtdsfe  wtesdf
melon   1   ewtedf   wersdf
orange  3   qqqwetr  hredfg
grape   55  kkkkkkk  aaaaaa

$ grep -E '^[^ ]+ +5 ' data.txt > output.txt

$ cat output.txt
apple   5   abcdefd  ewdsf
peach   5   ewtdsfe  wtesdf

You can get the answer with grep alone, but I strongly recommend using awk.
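If grep has to be used, a somewhat more defensive pattern anchors the match at the start of the line, accepts tabs as well as spaces, and also allows 5 to be the last thing on the line. The following is a sketch under those assumptions, using POSIX character classes with grep -E; the extra rows and the file names are invented for this example.

```shell
# Sample data mixing spaces and tabs, plus rows that must not match.
printf 'apple   5   abcdefd  ewdsf\n'   > fruit_grep.txt
printf 'peach\t5\tewtdsfe\twtesdf\n'   >> fruit_grep.txt
printf 'melon   1   5        wersdf\n' >> fruit_grep.txt
printf 'grape   55  kkkkkkk  aaaaaa\n' >> fruit_grep.txt
printf 'lemon   5\n'                   >> fruit_grep.txt

# ^                 anchor at the start of the line
# [^[:space:]]+     first column: one or more non-whitespace characters
# [[:space:]]+      separator: one or more spaces or tabs
# 5([[:space:]]|$)  second column is exactly 5, followed by whitespace or end of line
grep -E '^[^[:space:]]+[[:space:]]+5([[:space:]]|$)' fruit_grep.txt > grep_matches.txt

# The apple, peach and lemon lines are kept.
cat grep_matches.txt
```

Unlike awk, this pattern does not skip leading whitespace on a line, which is one more reason the awk version is usually easier to get right.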

David C. Rankin · Answer 5 · 2014-10-01T19:12:02.083

The simple way to do it is:

grep '5' MyDataFile

The result:

apple   5   abcdefd  ewdsf
peach   5   ewtdsfe  wtesdf

To capture that in a new file:

grep '5' MyDataFile > newfile

Note: that will find a 5 anywhere in MyDataFile. To restrict the match to the second column only, a quick script like the following will do. Usage: script number datafile:

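#!/bin/bash

# Read each line into three variables; "stuff" receives the rest of the line.
# The extra test on "$fruit" keeps a final line that lacks a trailing newline.
while read -r fruit num stuff || [ -n "$fruit" ]; do
    # Print the line only when the second field equals the requested value.
    [ "$num" = "$1" ] && printf "%s  %s  %s\n" "$fruit" "$num" "$stuff"
done <"$2"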

Output:

$ ./fruit.sh 5 dat/mydata.dat

apple  5  abcdefd  ewdsf
peach  5  ewtdsfe  wtesdf

I'm looking to restrict it to the second column. Sorry, the example I posted was bad. There are numeric values in other columns. — user3557715, Oct 01 '14 at 19:12
