
I have an input file like this:

John completed his graduation 
John is working for an IT industry
Thomas completed his graduation
John completed his graduation
Thomas is working for an IT industry
Thomas is working for an IT industry

I want output like this:

John word has 2 Graduations
Thomas word has 2 IT industry

Can anybody help me out?

Cyrus
  • Specify in the question the language you want to do this in and what you have attempted so far, and then ask for help. Most importantly though, what have you done to answer your own question? – zealoushacker Sep 01 '15 at 05:12
  • And what is the problem in your question? It is a bit poor in 1) explanation and 2) showing what you tried, where you have the issue, and where we can help. – NeronLeVelu Sep 01 '15 at 05:13
  • You should at least have tried to post your code here. From now on, please post your piece of code. – Amareesh Sep 01 '15 at 05:54

2 Answers


Solution in Perl

#!/usr/bin/perl
use strict;
use warnings;

# Count, per leading name, the lines that mention each phrase.
my %name_degree;
my %name_industry;
while (<DATA>) {
    chomp;
    # $1 is the first word on the line, taken as the person's name.
    if (/^([A-Za-z]+).*?graduation/) {
        $name_degree{$1}++;
    }
    if (/^([A-Za-z]+).*?IT industry/) {
        $name_industry{$1}++;
    }
}
# Sort the keys so the output order is stable across runs.
foreach (sort keys %name_degree) {
    print "$_ word has $name_degree{$_} Graduations\n";
}
foreach (sort keys %name_industry) {
    print "$_ word has $name_industry{$_} IT industry\n";
}
__DATA__
John completed his graduation 
John is working for an IT industry
Thomas completed his graduation
John completed his graduation
Thomas is working for an IT industry
Thomas is working for an IT industry

Demo

Note: The regex could be improved based on knowledge of the data in the file.

Chankey Pathak

Perhaps you can do something like this, and then replace the words with ones of your choice:

sort file | uniq -c | sort -k2,2 -k1,1r | awk '!a[$2]++{print $2, "word has", $1, $NF}'

John word has 2 graduation
Thomas word has 2 industry

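This sorts the file, counts each distinct line, orders the counts by name with the highest count first, and prints the first (most frequent) line for each name. Only the last word of each line ($NF) is printed, hence "industry" rather than "IT industry".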
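To see each stage of that pipeline separately, here is a minimal sketch that wraps it in a shell function and runs it on the sample data from the question. The function name top_phrase_per_name is hypothetical. Two details are assumptions added here and are not part of the one-liner above: a sed step strips trailing whitespace so that otherwise identical lines are counted together, and the count is compared numerically (-k1,1nr).

```shell
#!/bin/sh
# Hypothetical helper: for each name (first word), print the count and the
# last word of that name's most frequent line.
top_phrase_per_name() {
    # 1. sed: strip trailing whitespace (assumption: it is not significant).
    # 2. sort | uniq -c: count each distinct line.
    # 3. sort: order by name (field 2), then by count, highest first.
    # 4. awk: keep only the first line seen for each name.
    sed 's/[[:space:]]*$//' "$@" |
        sort |
        uniq -c |
        sort -k2,2 -k1,1nr |
        awk '!seen[$2]++ { print $2, "word has", $1, $NF }'
}

# Usage: feed the sample input on stdin and capture the report.
result=$(top_phrase_per_name <<'EOF'
John completed his graduation
John is working for an IT industry
Thomas completed his graduation
John completed his graduation
Thomas is working for an IT industry
Thomas is working for an IT industry
EOF
)
printf '%s\n' "$result"
```

With a file instead of stdin, the call would be top_phrase_per_name file. If two lines for the same name have equal counts, which one is reported depends on the sort order of the lines themselves.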

karakfa