-1

I have two files: one is a baseline and the other is a generated report. I have to validate that a specific block of text matches in both files. It is not just a single word; see the example below:

.
.
name os ksd 
56633223223
some text..................
some text..................

My search criterion is to find a unique number such as "56633223223" and retrieve the 1 line above it and the 3 lines below it. I can do that on both the base file and the report, and then compare whether they match. Overall, I need a shell script for this.

Since the strings above and below are unique but the line count varies, I have put the numbers and line counts in a file called "actlist":

56633223223 1 5
56633223224 1 6
56633223225 1 3
.
.

Now, from "Rcount" below I get how many iterations to perform. In each iteration I have to get the ith row and check whether its word count is 3. If it is, I take those values into variables and use them later.

I'm stuck at the part below: which command should I use? I'm thinking of using AWK, but if there is anything better please advise. Here's some pseudo-code showing what I'm trying to do:

xxxxx=/root/xxx/xxxxxxx
Rcount=`wc -l $xxxxx | awk -F " " '{print $1}'`

i=1
while ((i <= Rcount))
do
    record=_________________'(Awk command to retrieve ith(1st) record (of $xxxx),
    wcount=_________________'(Awk command to count the number of words in $record) 


    (( i=i+1 ))
done

Note: the record and wcount values are later printed to a log file.

ShravanM
  • 1
    It would be useful if you also explained what these variables will be used for, as any advice on the best approach will probably depend on that. – Tom Fenech Sep 22 '14 at 10:53
  • 1
    How can the field have three words if you split by spaces? – newtover Sep 22 '14 at 10:54
  • I will use it to find the variable in another file; I already have a command for that. I'm just trying to get the values into a variable here. – ShravanM Sep 22 '14 at 10:55
  • E.g. when I fetch the first record "26565565 56565 56565" I see three words here. Correct me if I'm wrong. – ShravanM Sep 22 '14 at 10:56
  • @TomFenech Yes, they are the same. My mistake. – ShravanM Sep 22 '14 at 10:57
  • Thanks for editing, your question is becoming clearer now. Can you show some more of the actual lines of your files? Where do the numbers `1 5`, `1 6` etc. come from? – Tom Fenech Sep 23 '14 at 07:15

4 Answers

3

Sounds like you're looking for something like this:

#!/bin/bash

while read -r word1 word2 word3 junk; do
    if [[ -n "$word1" && -n "$word2" && -n "$word3" && -z "$junk" ]]; then
        echo "all good"
    else
        echo "error"
    fi
done < /root/shravan/actlist

This will go through each line of your input file, assigning the three columns to word1, word2 and word3. Each -n test checks that read assigned a non-empty value to that variable. The -z test checks that $junk is empty, i.e. that the line has no more than three columns.
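If the word count itself is also wanted (the question asks for a wcount value), one option is to let the shell split each line instead of calling awk. This is a minimal sketch, assuming whitespace-separated fields; the function name log_records and the log line format are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print one log line per record of the file given as $1.
log_records() {
    # The redirection at "done" is set up once, before the loop body
    # overwrites the positional parameters.
    while IFS= read -r record || [ -n "$record" ]; do
        set -f              # disable globbing so characters such as * stay literal
        set -- $record      # unquoted on purpose: split the record into words
        set +f
        wcount=$#
        if [ "$wcount" -eq 3 ]; then
            word1=$1 word2=$2 word3=$3
            printf 'record=%s wcount=%d word1=%s word2=%s word3=%s\n' \
                "$record" "$wcount" "$word1" "$word2" "$word3"
        else
            printf 'record=%s wcount=%d (skipped)\n' "$record" "$wcount"
        fi
    done < "$1"
}
```

It could be called as log_records /root/shravan/actlist, with the output appended to whatever log file you use.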

Tom Fenech
  • Well, this is confusing, as I'm not comparing anything right now. I only need the values "26565565 56565 56565" in variables. Moreover, I want the while loop as I've described. To be more specific: while ((i <= Rcount)) do record=_____________ (awk command to retrieve the ith (1st) record of $acList), wcount=_____________ (awk command to count the number of words in $record), ___________________ (if the above condition matches, I need "26565565 56565 56565" assigned to $word1, $word2, $word3). From there on I can take it further. – ShravanM Sep 22 '14 at 11:14
  • I've modified the question body, please look there. – ShravanM Sep 22 '14 at 11:18
  • 1
    These seem like quite an arbitrary set of requirements. Does the code above not do what you want? Why do you want to call awk so many times? It is quite possible that you could do the whole thing in one invocation of awk (but that's impossible to say because you haven't shown all of your code). I'm happy to change my answer if you provide more details, explaining exactly what you're trying to do. – Tom Fenech Sep 22 '14 at 11:21
  • My requirement: read the "actlist" file, get the number of rows, and perform some action on each row. In "Rcount" I get the number of rows, and I use a while loop for the iterations. In the "record" variable I need the complete record, i.e. 26565565 56565 56565, and "wcount" should hold its word count. Then I check: if wcount=3 then word1=26565565 word2=56565 word3=56565. From there on I will use word1, word2, word3 to perform a string search in another file, for which I have code in place. As for why so many awk calls: I'm trying to assign each value so I can print it in a log file. – ShravanM Sep 22 '14 at 11:58
1

I PROMISE you that you are going about this all wrong. Finding words in file1 and searching for those words in file2 and file3 is just:

awk '
NR==FNR{ for (i=1;i<=NF;i++) words[$i]; next }
{ for (word in words) if ($0 ~ word) print FILENAME, word }
' file1 file2 file3

or similar (assuming a simple grep -f file1 file2 file3 isn't adequate). It DOES NOT involve shell loops to call awk to pull out strings to save in shell variables to pass to other shell commands, etc, etc.
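One caveat with the snippet above: `$0 ~ word` treats each word as a regular expression, so a word such as a.c would also match abc. If the words should be matched literally, awk's index() function is one option. A sketch, with find_words as a made-up wrapper name:

```shell
#!/bin/sh
# Hypothetical wrapper: the first argument is the word list, the rest are the
# files to search. Prints "FILENAME word" for every literal substring hit.
find_words() {
    awk '
    NR==FNR { for (i=1; i<=NF; i++) words[$i]; next }   # collect words from the first file
    { for (word in words) if (index($0, word)) print FILENAME, word }
    ' "$@"
}
```

As in the original, the NR==FNR test assumes the word list file is not empty, and the order in which several words are reported for the same line is not defined by awk.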

So far you're asking us to help you implement part of what you think is the solution to your problem. We're struggling to do that because what you're asking for doesn't make sense as part of any reasonable solution to what your problem sounds like, so it's hard to suggest anything sensible.

If you tell us what you are trying to do AS A WHOLE, with sample input and expected output for your whole process, then we can help you.

We don't seem to be getting anywhere so let's try a stab at the kind of solution I think you might want and then take it from there.

Look at these 2 files "old" and "new" side by side (line numbers added by cat -n):

$ paste old new | cat -n
     1  a               b
     2  b               56633223223
     3  56633223223     c
     4  c               d
     5  d               h
     6  e               56633223225
     7  f               i
     8  g               Z
     9  h               k
    10  56633223225     l
    11  i
    12  j
    13  k
    14  l

Now let's take this "actlist":

$ cat actlist
56633223223 1 2
56633223225 1 3

and run this awk command on all 3 of the above files (yes, I know it could be briefer, more efficient, etc. but favoring simplicity and clarity for now):

$ cat tst.awk                    
ARGIND==1 {
    numPre[$1] = $2
    numSuc[$1] = $3
}

ARGIND==2 {
    oldLine[FNR] = $0
    if ($0 in numPre) {
        oldHitFnr[$0] = FNR
    }
}

ARGIND==3 {
    newLine[FNR] = $0
    if ($0 in numPre) {
        newHitFnr[$0] = FNR
    }
}

END {
    for (str in numPre) {
        if ( str in oldHitFnr ) {
           if ( str in newHitFnr ) {
               for (i=-numPre[str]; i<=numSuc[str]; i++) {
                   oldFnr = oldHitFnr[str] + i
                   newFnr = newHitFnr[str] + i
                   if (oldLine[oldFnr] != newLine[newFnr]) {
                       print str, "mismatch at old line", oldFnr, "new line", newFnr
                       print "\t" oldLine[oldFnr], "vs", newLine[newFnr]
                   }
               }
           }
           else {
               print str, "is present in old file but not new file"
           }
        }
        else if (str in newHitFnr) {
           print str, "is present in new file but not old file"
        }
    }
}


$ awk -f tst.awk actlist old new
56633223225 mismatch at old line 12 new line 8
        j vs Z

It's outputting that result because the 2nd line after 56633223225 is j in file "old" but Z in file "new", and the file "actlist" said the 2 files had to match from 1 line before until 3 lines after that pattern.
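For comparison, the approach described in the question (pull the context lines out of each file, then compare them) can be sketched with grep's -B and -A options. These are not required by POSIX but are available in GNU and BSD grep. The function name context_matches is made up, and the sketch assumes the key occupies a whole line, as it does in the sample files:

```shell
#!/bin/sh
# Hypothetical helper. Usage: context_matches KEY BEFORE AFTER OLDFILE NEWFILE
# Succeeds when both files contain KEY with identical surrounding lines.
context_matches() {
    # -F: fixed string, -x: match whole lines only, -B/-A: context line counts
    old_ctx=$(grep -F -x -B "$2" -A "$3" -- "$1" "$4")
    new_ctx=$(grep -F -x -B "$2" -A "$3" -- "$1" "$5")
    [ -n "$old_ctx" ] && [ "$old_ctx" = "$new_ctx" ]
}
```

With the sample files, context_matches 56633223223 1 2 old new succeeds and context_matches 56633223225 1 3 old new fails. Unlike the awk script, this only says whether the blocks differ, not where, and it runs grep twice per key.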

Is that what you're trying to do? The above uses GNU awk for ARGIND but the workaround is trivial for other awks.
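One common way to do without ARGIND in other awks is to count files by incrementing a variable whenever FNR is 1. A small sketch (the function name file_ordinal and the variable name argind are arbitrary):

```shell
#!/bin/sh
# Hypothetical demo: prefix each input line with the ordinal of the file it
# came from, using only POSIX awk features.
file_ordinal() {
    awk '
    FNR==1 { argind++ }          # first line of each file: move to the next ordinal
    { print argind ": " $0 }
    ' "$@"
}
```

In tst.awk this would mean adding the FNR==1 { argind++ } rule at the top and testing argind instead of ARGIND. The caveat is that an empty input file never triggers FNR==1, so the counter would not advance for it.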

Ed Morton
  • I'm sorry for the lack of information and the struggle I've put you through. I have modified my question; please let me know if you have questions. – ShravanM Sep 22 '14 at 14:10
  • I'm sorry it's still unclear what you're trying to do but it definitely IS a job for one small, simple awk command. If you can just tidy up your question to include some actual, testable sample input and expected output and just focus on describing WHAT you want to do instead of HOW you think you should do it, I'm sure you'll get a great solution. – Ed Morton Sep 22 '14 at 16:05
  • I just edited my question to take a stab at what I think you're trying to do as a whole. – Ed Morton Sep 22 '14 at 20:50
0

Use the code below:

awk '{if (NF == 3) { word1=$1; word2=$2; word3=$3; print "Words are:" word1, word2, word3} else {print "Line", NR, "is having", NF, "Words" }}' filename.txt
Infinite Recursion
Nikhil Gupta
0

Here is a solution that follows the requirement as stated.

awk '{                                          # awk reads the file line by line
if (NF == 3)                                    # NF is the number of fields in the current line; check that it is exactly 3
{ word1=$1;                                     # assign the 1st field to word1
word2=$2;                                       # assign the 2nd field to word2
word3=$3;                                       # assign the 3rd field to word3
print word1, word2, word3}                      # print all 3 fields
}' filename.txt >> output.txt                   # append the output to a file for further processing

There are many other ways of doing this, but the question asked for awk.

Nikhil Gupta