I've been fighting with awk to make this work but I've been unable to do it. I have two lines as follows:

= filename: /path/to/file
years="1990,2001"

I need to check each year between the quotes against a given value. If any year matches, I want to print the filename from the previous line (only the first matching year needs to be found). The value and the operator, i.e. <, >, ==, <=, >= or ~, will be passed to awk via a variable, like:

value="2000"
string="\$2 < $value"      # just an example

awk ... '"$string"' ...

There are conditional statements which can generate this string based on the input received.

I've tried separating each field using a space, quote and comma as a delim:

awk -F'[," ]' '{i=(1+(i%N));if(buf[i]&& $2<2000) print buf[i]; buf[i]=$0;}'

This works, but it only checks $2 and I need to loop through every column from $2 onward. I tried to do something like this:

awk -F'[," ]' '{for(f=2;f<=NF;f++);i=(1+(i%N));if(buf[i]&& $f>1950) print buf[i]; buf[i]=$0;}'

But that didn't work (I'm probably just doing it wrong).

I also considered getting rid of if(buf[i]&& $2>1950) print buf[i]; buf[i]=$0; and instead joining the two lines, separating the fields, looping over every column from $5 onward, and then printing just "$3", since that will always be the filename. However, I can't figure out how to merge the two lines into one.

Example:

year < 2000

Input text:

= filename: /mnt/project1/record1.txt
years="2005,2019,2011,2012,2013"

= filename: /mnt/project1/record2.txt
years="1996,2000"

= filename: /mnt/project1/record3.txt
years="2005,2001,2012"

= filename: /mnt/project1/record4.txt
years="2010,2009,1997,2000"

Output (match):

/mnt/project1/record2.txt
/mnt/project1/record4.txt
bcHelix
  • Please add sample input (no descriptions, no images, no links) and your desired output for that sample input to your question (no comment). – Cyrus Dec 31 '20 at 17:40
  • Your example shows `year < 2000`; will you always be looking for a date that is **less than** (`<`) a desired target date (eg, `2000` in your example)? Could you at some point also need to look at other comparisons ... `>`, `<=`, `>=`, `date1 <= year <= date2`, etc? – markp-fuso Dec 31 '20 at 18:50
  • No, it would need to be any operator or ~ (for strings if need be). The '$2 [operator] [value]' will be passed as a string to AWK, i.e. `var="\$2 < $search_value"` and then `awk ... '"$var"' ...` – bcHelix Dec 31 '20 at 19:45
  • @bcHelix that's a very different requirement from what's currently shown in your question (as you can see from the answers you've received so far). Please [edit] your question to provide a much clearer statement of your requirements and a more truly realistic example. – Ed Morton Dec 31 '20 at 19:52
  • Your sample code shows a test for a year `>1950`, but the sample input/output is based on a test for a year `< 2000`; you should probably update the question to provide a more concise/clear explanation of what you're doing. It wouldn't hurt to provide a couple of different examples (eg, `>1950`, `<2000`, and whatever demonstrates `~ (for strings if need be)`) of input/output – markp-fuso Dec 31 '20 at 19:53
  • This may be what you're looking for: https://stackoverflow.com/a/54161251/1745001. – Ed Morton Dec 31 '20 at 19:55
  • What do you mean by `$2` in your example `string"=\$2 < $value"`? If you're hoping to use that string as-is in an awk script (as it sounds like in your question), there is no `$2` that it could directly apply to in your data. – Ed Morton Dec 31 '20 at 21:36
  • That's a literal '$2' in the variable, so awk would read it as the second column. – bcHelix Dec 31 '20 at 23:02
  • But there is no `$2` in your data (e.g. `years="2010,2009,1997,2000"`) that could be compared to the value. – Ed Morton Jan 01 '21 at 14:18

4 Answers

I'm still learning AWK but IIUC you want something similar to the following script.awk:

#!/usr/bin/awk -f

BEGIN {
    FS = "= filename:"
}

{
    if ( NF == 2 ) {
        filename = $2
        getline
    }
}

/^years="1990,2001"$/ \
{
    print "filename: " filename
}

Run with the following input:

= filename: /path/to/file5
years="1990,2001"

= filename: /path/to/file4
years="1990,2002"

= filename: /path/to/file3
years="1990,2003"

= filename: /path/to/file2
years="1990,2004"

= filename: /path/to/file1
years="1990,2005"

= filename: /path/to/file0
years="1990,2006"

like this:

$ ./script.awk input
filename:  /path/to/file5
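
A variation that avoids getline, offered only as a rough sketch: if the records are always separated by blank lines, as in the sample input, awk's paragraph mode (RS = "") reads each filename/years pair as one record, which is one way to "merge the two lines into one". The function name first_below and its arguments are made up for illustration, and it only handles the "less than" case from the example.

```shell
# Hypothetical helper: first_below TARGET FILE
# Prints the filename of every block that has at least one year below TARGET.
# Assumes blocks are separated by blank lines and look like the sample input.
first_below() {
    awk -v tgt="$1" '
    BEGIN { RS = ""; FS = "\n" }           # paragraph mode: one block per record, one line per field
    {
        sub(/^= filename: */, "", $1)      # $1 is the filename line; strip its prefix
        n = split($2, y, /[",]/)           # $2 is the years line; y[2]..y[n-1] hold the years
        for (i = 2; i < n; i++)
            if (y[i] + 0 < tgt + 0) {      # adding 0 forces a numeric comparison
                print $1
                break                      # the first matching year is enough
            }
    }' "$2"
}
```

With the example input from the question, `first_below 2000 input` should print the record2 and record4 paths.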
Arkadiusz Drabczyk

Trying to make few assumptions beyond what you showed....
Here's one example of how you might parse it.

$: cat tst
= filename: /path/to/file/none
years="1890,1001"

= filename: /path/to/file/first
years="1990,0001"

= filename: /path/to/file/none
years="1890,1001"

= filename: /path/to/file/both
years="1990,2001"

= filename: /path/to/file/none
years="1890,1001"

= filename: /path/to/file/second
years="1790,2001"

= filename: /path/to/file/none
years="1890,1001"


$: awk '/filename:/ { fn=$3 } /years=/{ n=split($0,a,"[^0-9]+");
          # walk the array by index: "for (y in a)" visits elements in no guaranteed order
          for (i=1;i<=n;i++) { if (a[i]>1950) { printf "%s(%s)\n",fn,a[i]; break; } } }' tst
/path/to/file/first(1990)
/path/to/file/both(1990)
/path/to/file/second(2001)
Paul Hodges

In case you need variables for the years, you can do something like this:

awk -v y0=1990 -v y1=2001 '
$0 ~ "^years=\"" y0 "," y1 "\"$" {
    printf("filename: %s\n", filename);
}

/^= filename/ {
    filename = $3
}
' file.txt
Diego Torres Milano

Assumptions:

  • OP will always be searching for dates less than (<) a given target date (eg, 2000 in OP's example)
  • a line starting with ^= filename: is always followed by a line starting with ^years= (otherwise we would need more sample input data before adding more logic)

Sample data:

$ cat file_year.dat
= filename: /mnt/project1/record1.txt
years="2005,2019,2011,2012,2013"             # no match

= filename: /mnt/project1/record2.txt
years="1996,2000"                            # match on 1996 < 2000

= filename: /mnt/project1/record3.txt
years="2005,2001,2012"                       # no match

= filename: /mnt/project1/record4.txt
years="2010,2009,1997,2000"                  # match on 1997 < 2000

= filename: /mnt/project1/record5.txt
years="2010,2009,2007,1947"                  # match on 1947 < 2000

NOTE: comments added to show what should (not) match; comments do not exist in the actual data file

One awk idea:

awk -v tgt=2000 '                            # pass target date in as awk variable "tgt"
/^= filename: / { fn=$(NF)                   # save the filename in awk variable "fn"
                  next                       # skip to the next line of input
                }
fn != ""        { n=split($0,arr,"[,\"]")    # if "fn" is set then split the current line on double quotes
                                             # and commas; store results (ie, individual years)
                                             # in array "arr[]"
                  for (i=2; i<n; i++)        # process arr[2-(n-1)] elements (individual years)
                      if ( arr[i] < tgt )    # if less than "tgt" ...
                         { print fn          # print associated filename ("fn") and ...
                           break             # break out of loop
                         }
                  fn=""                      # clear "fn" variable
                }
' file_year.dat

This generates:

/mnt/project1/record2.txt
/mnt/project1/record4.txt
/mnt/project1/record5.txt
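
If the first assumption does not hold and the operator has to vary (as the comments under the question suggest), one possible extension is to pass the operator in as a second awk variable and dispatch on it inside a small function, rather than splicing a condition string into the script. This is only a sketch: the wrapper name match_years, the variable "op" and the function cmp() are invented for illustration, and it still relies on the second assumption about the line layout.

```shell
# Hypothetical wrapper: match_years OPERATOR TARGET FILE
# OPERATOR is one of: <  <=  >  >=  ==  =  ~
match_years() {
    awk -v op="$1" -v tgt="$2" '
    # compare year "a" with target "b" using the operator passed in "op";
    # adding 0 forces numeric comparison, while "~" treats "b" as a regex
    function cmp(a, b) {
        if (op == "<")               return (a + 0) <  (b + 0)
        if (op == "<=")              return (a + 0) <= (b + 0)
        if (op == ">")               return (a + 0) >  (b + 0)
        if (op == ">=")              return (a + 0) >= (b + 0)
        if (op == "==" || op == "=") return (a + 0) == (b + 0)
        if (op == "~")               return a ~ b
        return 0                     # unknown operator: never matches
    }
    /^= filename: / { fn = $NF; next }          # remember the filename
    fn != "" {
        n = split($0, arr, "[,\"]")             # arr[2]..arr[n-1] hold the years
        for (i = 2; i < n; i++)
            if (cmp(arr[i], tgt)) { print fn; break }
        fn = ""
    }' "$3"
}
```

For example, `match_years '<' 2000 file_year.dat` should generate the same three filenames as above, and `match_years '~' '^199' file_year.dat` would list the records that contain a year starting with 199.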
markp-fuso