14

Is bash capable of extracting rows and columns from CSV files? I'm hoping I don't have to resort to Python...

My 5-column csv file looks like:

Rank,Name,School,Major,Year
1,John,Harvard,Computer Science,3
2,Bill,Yale,Political Science,4
3,Mark,Stanford,Biology,1
4,Jane,Princeton,Electrical Engineering,3
5,Alex,MIT,Management Economics,2

I only want to extract the 3rd, 4th, and 5th column contents, ignoring the first row, so output looks like:

Harvard,Computer Science,3
Yale,Political Science,4
Stanford,Biology,1
Princeton,Electrical Engineering,3
MIT,Management Economics,2

So far I can only get awk to print out either each row, or each column of my CSV file, but not specific cols/rows like this case! Can bash do this?

user1899415
  • It's odd that you're struggling to get awk to do this, since printing fields (columns) and records (rows) is the most basic thing awk is designed to do. Makes me think there must be more to this than you've described so far... – Ed Morton Jan 24 '13 at 06:28

11 Answers

19
awk -F, 'NR > 1 { print $3 "," $4 "," $5 }' file.csv

NR is the number of the current record (line), and $3, $4, and $5 are its third, fourth, and fifth fields; -F sets the field separator, here a comma.
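As a quick self-contained check, running the one-liner against the question's sample data (fed via a heredoc here) produces exactly the requested output:

```shell
# NR > 1 skips the header; -F, splits each record on commas.
awk -F, 'NR > 1 { print $3 "," $4 "," $5 }' <<'EOF'
Rank,Name,School,Major,Year
1,John,Harvard,Computer Science,3
2,Bill,Yale,Political Science,4
EOF
# Harvard,Computer Science,3
# Yale,Political Science,4
```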

that other guy
9

Try this:

tail -n+2 file.csv | cut --delimiter=, -f3-5
hennr
6

Use cut and tail:

tail -n +2 file.txt | cut -d ',' -f 3-
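Here `-f 3-` means "field 3 through the last field". Note that cut splits on every comma, so a quoted field containing a comma would be broken apart; that's fine for this data. A self-contained run on one sample row:

```shell
# tail -n +2 drops the header; cut keeps fields 3 to the end.
tail -n +2 <<'EOF' | cut -d ',' -f 3-
Rank,Name,School,Major,Year
5,Alex,MIT,Management Economics,2
EOF
# MIT,Management Economics,2
```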
Rubens
6

Bash solutions:

Using IFS

#!/bin/bash
while IFS=',' read -r rank name school major year; do
    echo -e "Rank\t: $rank\nName\t: $name\nSchool\t: $school\nMajor\t: $major\nYear\t: $year\n"
done < file.csv
IFS=$' \t\n'

Using String Manipulation and Arrays

#!/bin/bash
declare -a arr
while read -r line; do
    arr=(${line//,/ })
    printf "Rank\t: %s\nName\t: %s\nSchool\t: %s\nMajor\t: %s\nYear\t: %s\n" ${arr[@]}
done < file.csv
koola
  • Fairly unwieldy, but I like the use of arrays which I will probably refer to again at some point. Not to mention it's a bash-only solution. – icedwater Nov 18 '13 at 16:25
  • this fails to ignore commas in quotes. example csv line: `"some, text",1,2` will be parsed as: `some`, `text`, `1`, `2` instead of `some text`, `1`, `2` – gondo Feb 16 '15 at 15:06
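For the question's actual goal (only columns 3-5), the read-based approach above can be trimmed down. A minimal sketch, assuming no quoted fields containing commas; the heredoc stands in for file.csv so the snippet is self-contained:

```shell
#!/bin/bash
# Skip the header with a throwaway read, then split each record on commas.
# Reading fields positionally preserves spaces inside "Major".
# Replace the heredoc with "< file.csv" for real use.
{
    read -r _header
    while IFS=',' read -r _rank _name school major year; do
        printf '%s,%s,%s\n' "$school" "$major" "$year"
    done
} <<'EOF'
Rank,Name,School,Major,Year
1,John,Harvard,Computer Science,3
2,Bill,Yale,Political Science,4
EOF
# Harvard,Computer Science,3
# Yale,Political Science,4
```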
3
sed 1d file.csv | while IFS=, read first second rest; do echo "$rest"; done
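The trick here is that the last variable given to read soaks up the rest of the line, separators included, so `rest` already holds School,Major,Year. A self-contained check on one sample row (with -r added to read, which is good practice):

```shell
# sed 1d drops the header; "rest" captures everything after the second comma.
sed 1d <<'EOF' | while IFS=, read -r first second rest; do echo "$rest"; done
Rank,Name,School,Major,Year
4,Jane,Princeton,Electrical Engineering,3
EOF
# Princeton,Electrical Engineering,3
```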
glenn jackman
2

Here you go, a simple AWK program.

#!/usr/bin/awk -f

BEGIN {
    # set field separator to comma to split CSV fields
    FS = ","
}

# NR > 1 skips the first line
NR > 1 {
    # print only the desired fields
    printf("%s,%s,%s\n", $3, $4, $5)
}
steveha
2
perl -F, -lane 'if($.!=1){print join ",",@F[2,3,4];}' your_file


Vijay
2

This might work for you (GNU sed):

sed -r '1d;s/([^,]*,){2}//' file
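Here `1d` deletes the header line, and `([^,]*,){2}` matches the first two fields together with their trailing commas, which the substitution then deletes. A self-contained check, using -E (the POSIX spelling of GNU sed's -r) for portability:

```shell
# Delete the header, then strip the first two comma-terminated fields.
sed -E '1d;s/([^,]*,){2}//' <<'EOF'
Rank,Name,School,Major,Year
3,Mark,Stanford,Biology,1
EOF
# Stanford,Biology,1
```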
potong
2

Try this:

awk -F, 'NR > 1 { OFS=",";print $3, $4, $5 }' temp.txt

Or this:

sed -re '1d;s/^[0-9]+,\w+,//' temp.txt
Mirage
  • Can you provide some explanation of what your fixes are doing. – Jon Egerton Jan 24 '13 at 12:26
  • @JonEgerton In the awk I added the OFS, and in the sed I made the pattern clearer so that new users can see what I am matching. The regexes in previous answers are short but hard to comprehend for a new user of regex. Mine may not be perfect, but at least it is visible what they are doing, and they work. – Mirage Jan 24 '13 at 13:02
1

I have created a package for this kind of task, gumba. If you feel comfortable with CoffeeScript you can give it a try:

cat file.csv | tail -n +2 | \
gumba "words(',').take((words)-> words.last(3)).join(',')"
welldan97
0
grep '^,' outlook.contacts.csv | sed 's/^,\([^,]*\),[^,]*,\([^,]*\),.*/\1 \2/'

Get all lines that start with a comma, then use sed to pull out the first and second name fields.

Be careful: for some reason, pasting changes the line to the following (the backslashes get stripped), so you may be better off typing it in manually.

grep '^,' outlook.contacts.csv | sed 's/^,([^,]),[^,],([^,]),./\1 \2/'
whoan