
I want to take the names of two files as command-line arguments in a bash shell script. For each word in the first file (words are comma-separated, and the file has more than one line), I need to count its occurrences in the second file. I wrote a shell script like this:

 if [ $# -ne 2 ]
 then
 echo "invalid number of arguments"
 else
 i=1
 a=$1
 b=$2
 fp=*$b
 while[ fgetc ( fp ) -ne EOF   ]
 do
 d=$( cut -d',' -f$i $a )
 echo "$d"
 grep -c -o $d $b 
 i=$(( $i + 1 ))
 done
 fi

For example, file1 contains abc,def,ghi,jkl on its first line and mno,pqr on its second line,

and file2 contains abc,abc,def.

The output should then be something like: abc 2 def 1 ghi 0

Abhinav Arya
  • 3
    The question & the title have little in common. This sounds like homework, which is fine, but the question needs to be a little more focused. – michael Oct 16 '14 at 06:49
  • 1
    It would also help if you showed some sample input and the expected output. – Tom Fenech Oct 16 '14 at 10:42

2 Answers

2

To read a file word by word, with words separated by commas, use this snippet:

while read -r p; do
    IFS=, && for w in $p; do
       printf "%s: " "$w"
       tr , '\n' < file2 | grep -Fc "$w"
    done
done < file1
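
Two things are worth knowing about the loop above: `grep -F` without `-x` also counts fields that merely contain the word (`abc` would match `abcd`), and `IFS=,` stays in effect for the rest of the script. A minimal sketch of a stricter variant follows, assuming exact whole-field matches are wanted. The function name `count_words` is hypothetical, and the loop needs no explicit EOF test because `read` returns a non-zero status once the input is exhausted.

```shell
# count_words WORDS_FILE DATA_FILE   (hypothetical helper name)
# For every comma-separated word in WORDS_FILE, print "word: N", where N is
# the number of comma-separated fields in DATA_FILE exactly equal to word.
count_words() {
    # Put one word per line, then read them one at a time.
    # read fails at end of input, which ends the loop; the || test keeps a
    # final word that is not followed by a newline.
    tr ',' '\n' < "$1" | while IFS= read -r w || [ -n "$w" ]; do
        [ -n "$w" ] || continue          # skip empty fields and blank lines
        printf '%s: ' "$w"
        # -F fixed string, -x whole line must match, -c print the count.
        # grep exits with status 1 when the count is 0, hence "|| true".
        tr ',' '\n' < "$2" | grep -Fxc -- "$w" || true
    done
}
```

Called as `count_words file1 file2` with the sample files from the question, this prints `abc: 2`, `def: 1`, and `0` for each of the remaining words, one per line.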
anubhava
  • Still not working. Could you tell me how I should check for the end of file in a condition? I read somewhere that Unix files don't have EOF. – Abhinav Arya Oct 16 '14 at 08:04
  • Define **still not working** and provide details of how you tested it and what's not working. This is a fully tested and 100% working solution. There is no need to check for EOF or `fgetc` like in your C code. – anubhava Oct 16 '14 at 08:19
  • Let me give you a specific example. Say filea has the words abc,def,ghi on its first line and jkl,mno on its second line. How should I extract these fields and compare them with the words in the second file? I want to compare words, not characters. – Abhinav Arya Oct 16 '14 at 10:29
  • 1
    You wrote **for each character in the first file I need to count its occurrence in the second file** and now you're saying **I want to compare words not characters**. Which one is correct? It would be better to edit the question with your actual requirement. – anubhava Oct 16 '14 at 10:34
  • There is a catch here. You want to count `occurrences of "abc"`, not `number of lines having "abc"`. If the word appears twice in a single line, this code would fail... – anishsane Oct 16 '14 at 11:00
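
The distinction raised in the last comment can be sketched as follows: `grep -c` reports the number of matching lines, so a word that appears twice on one line is counted once. One way around it, assuming a grep that supports `-o` (GNU and BSD grep both do), is to print each match on its own line and count those lines. The function name `count_occurrences` is hypothetical, and `-w` assumes the words consist only of letters, digits, and underscores.

```shell
# count_occurrences WORD FILE   (hypothetical helper name)
# Print how many times WORD occurs in FILE as a whole word.
count_occurrences() {
    # -o prints each match on its own line, -w keeps whole words only,
    # -F treats WORD as a fixed string; wc -l then counts the matches.
    # tr strips the padding some wc implementations add.
    grep -owF -- "$1" "$2" | wc -l | tr -d ' '
}
```

With file2 from the question (a single line `abc,abc,def`), `grep -c abc file2` prints 1, while `count_occurrences abc file2` prints 2.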
1

Another approach:

words=( `tr ',' ' ' < file1`) #split the file1 into words...

for word in "${words[@]}"; do  #iterate in the words
    printf "%s : " "$word"
    awk 'END{print FNR-1}' RS="$word" file2
    # split file2 with 'word' as record separator.
    # print number of lines == number of occurrences of the word..
done
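
The loop above rereads file2 once per word and relies on a multi-character `RS`, which POSIX awk does not guarantee (gawk and mawk accept it and treat it as a regular expression). A different technique, sketched here as an alternative rather than a correction, tallies every field of file2 in an awk array and then looks up each word of file1 in a single awk run. The function name `count_all` and the variable `mode` are hypothetical, and exact whole-field matching is assumed.

```shell
# count_all WORDS_FILE DATA_FILE   (hypothetical helper name)
# Print "word : N" for every comma-separated word in WORDS_FILE, where N is
# the number of fields in DATA_FILE exactly equal to word.
count_all() {
    awk -F, '
        # While reading the data file: tally every comma-separated field.
        mode == "data" { for (i = 1; i <= NF; i++) seen[$i]++; next }
        # While reading the word list: print each word with its tally
        # (an unseen word has an empty tally, which %d prints as 0).
        { for (i = 1; i <= NF; i++) printf "%s : %d\n", $i, seen[$i] }
    ' mode=data "$2" mode=words "$1"
}
```

The `mode=...` operands are ordinary awk command-line assignments, applied just before the file that follows them is read, so the data file is processed first even though it is the second argument of the function.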
anishsane