14

Is bash capable of extracting rows and columns from CSV files? I'm hoping I don't have to resort to Python...

My 5-column csv file looks like:

Rank,Name,School,Major,Year
1,John,Harvard,Computer Science,3
2,Bill,Yale,Political Science,4
3,Mark,Stanford,Biology,1
4,Jane,Princeton,Electrical Engineering,3
5,Alex,MIT,Management Economics,2

I only want to extract the 3rd, 4th, and 5th column contents, ignoring the first row, so output looks like:

Harvard,Computer Science,3
Yale,Political Science,4
Stanford,Biology,1
Princeton,Electrical Engineering,3
MIT,Management Economics,2

So far I can only get awk to print out either each row, or each column of my CSV file, but not specific cols/rows like this case! Can bash do this?

user1899415
  • It's odd that you're struggling to get awk to do this, since printing fields (columns) and records (rows) is the most basic thing awk is designed to do. Makes me think there must be more to this than you've described so far... – Ed Morton Jan 24 '13 at 06:28

11 Answers

19
awk -F, 'NR > 1 { print $3 "," $4 "," $5 }' file.csv

NR is the number of the current record (line), and $3, $4, and $5 are its third, fourth, and fifth fields; -F sets the field separator, here a comma.
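As a quick self-contained check, running the one-liner against the question's sample data (fed via a heredoc here) produces exactly the requested output:

```shell
# NR > 1 skips the header; -F, splits each record on commas.
awk -F, 'NR > 1 { print $3 "," $4 "," $5 }' <<'EOF'
Rank,Name,School,Major,Year
1,John,Harvard,Computer Science,3
2,Bill,Yale,Political Science,4
EOF
# Harvard,Computer Science,3
# Yale,Political Science,4
```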

that other guy
9

Try this:

tail -n+2 file.csv | cut --delimiter=, -f3-5
hennr
6

Use cut and tail:

tail -n +2 file.txt | cut -d ',' -f 3-
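Here `-f 3-` means "field 3 through the last field". Note that cut splits on every comma, so a quoted field containing a comma would be broken apart; that's fine for this data. A self-contained run on one sample row:

```shell
# tail -n +2 drops the header; cut keeps fields 3 to the end.
tail -n +2 <<'EOF' | cut -d ',' -f 3-
Rank,Name,School,Major,Year
5,Alex,MIT,Management Economics,2
EOF
# MIT,Management Economics,2
```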
Rubens
6

Bash solutions:

Using IFS

#!/bin/bash
while IFS=',' read -r rank name school major year; do
    echo -e "Rank\t: $rank\nName\t: $name\nSchool\t: $school\nMajor\t: $major\nYear\t: $year\n"
done < file.csv
IFS=$' \t\n'

Using String Manipulation and Arrays

#!/bin/bash
declare -a arr
while read -r line; do
    arr=(${line//,/ })
    printf "Rank\t: %s\nName\t: %s\nSchool\t: %s\nMajor\t: %s\nYear\t: %s\n" ${arr[@]}
done < file.csv
koola
  • Fairly unwieldy, but I like the use of arrays which I will probably refer to again at some point. Not to mention it's a bash-only solution. – icedwater Nov 18 '13 at 16:25
  • this fails to ignore commas in quotes. example csv line: `"some, text",1,2` will be parsed as: `some`, `text`, `1`, `2` instead of `some text`, `1`, `2` – gondo Feb 16 '15 at 15:06
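For the question's actual goal (only columns 3-5), the read-based approach above can be trimmed down. A minimal sketch, assuming no quoted fields containing commas; the heredoc stands in for file.csv so the snippet is self-contained:

```shell
#!/bin/bash
# Skip the header with a throwaway read, then split each record on commas.
# Reading fields positionally preserves spaces inside "Major".
# Replace the heredoc with "< file.csv" for real use.
{
    read -r _header
    while IFS=',' read -r _rank _name school major year; do
        printf '%s,%s,%s\n' "$school" "$major" "$year"
    done
} <<'EOF'
Rank,Name,School,Major,Year
1,John,Harvard,Computer Science,3
2,Bill,Yale,Political Science,4
EOF
# Harvard,Computer Science,3
# Yale,Political Science,4
```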
3
sed 1d file.csv | while IFS=, read first second rest; do echo "$rest"; done
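The trick here is that the last variable given to read soaks up the rest of the line, separators included, so `rest` already holds School,Major,Year. A self-contained check on one sample row (with -r added to read, which is good practice):

```shell
# sed 1d drops the header; "rest" captures everything after the second comma.
sed 1d <<'EOF' | while IFS=, read -r first second rest; do echo "$rest"; done
Rank,Name,School,Major,Year
4,Jane,Princeton,Electrical Engineering,3
EOF
# Princeton,Electrical Engineering,3
```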
glenn jackman
2

Here you go, a simple AWK program.

#!/usr/bin/awk -f

BEGIN {
    # set field separator to comma to split CSV fields
    FS = ","
}

# NR > 1 skips the first line
NR > 1 {
    # print only the desired fields
    printf("%s,%s,%s\n", $3, $4, $5)
}
steveha
2
perl -F, -lane 'if($.!=1){print join ",",@F[2,3,4];}' your_file


Vijay
2

This might work for you (GNU sed):

sed -r '1d;s/([^,]*,){2}//' file
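Here `1d` deletes the header line, and `([^,]*,){2}` matches the first two fields together with their trailing commas, which the substitution then deletes. A self-contained check, using -E (the POSIX spelling of GNU sed's -r) for portability:

```shell
# Delete the header, then strip the first two comma-terminated fields.
sed -E '1d;s/([^,]*,){2}//' <<'EOF'
Rank,Name,School,Major,Year
3,Mark,Stanford,Biology,1
EOF
# Stanford,Biology,1
```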
potong
2

Try this:

awk -F, 'NR > 1 { OFS=",";print $3, $4, $5 }' temp.txt

Or this:

sed -re '1d;s/^[0-9]+,\w+,//' temp.txt
Mirage
  • Can you provide some explanation of what your fixes are doing. – Jon Egerton Jan 24 '13 at 12:26
  • @JonEgerton In the awk I added the OFS, and in the sed I made the pattern clearer so that new users can see what I am matching. The regexes in previous answers are short but hard to comprehend for a new user of regex. Mine may not be perfect, but at least it is visible what they are doing, and they work. – Mirage Jan 24 '13 at 13:02
1

I have created a package for this kind of task, gumba. If you feel comfortable with CoffeeScript you can give it a try:

cat file.csv | tail -n +2 | \
gumba "words(',').take((words)-> words.last(3)).join(',')"
welldan97
0
grep '^,' outlook.contacts.csv | sed 's/^,\([^,]*\),[^,]*,\([^,]*\),.*/\1 \2/'

Get all lines that start with a comma, then use sed to pull out the first and second name fields.

Be careful: for some reason, pasting changes the line to the following (the backslashes get stripped), so you may be better off typing it in manually.

grep '^,' outlook.contacts.csv | sed 's/^,([^,]),[^,],([^,]),./\1 \2/'
whoan