
I have a file.bib file with multiple entries such as:

@Book{Anley:2007:shellcoders-handbook-2nd-ed,
  author =   {Chris Anley and John Heasman and Felix Lindner and Gerardo
    Richarte},
  title =    "{The Shellcoder's Handbook}",
  publisher =    {Wiley},
  year =     2007,
  edition =      2,
  month =    aug,
}

There you can find the "year = 2007" line. My task is to filter out the years that are greater than 2020 ($currentyear) or lower than 1900 ($minyear). The result should also include the month "may", which appears behind a "year" key in this file (a mistake by the admin). By the way, the file is over 4000 lines long.

  • Show sample lines from the file. Do it in `awk`. – KamilCuk Jan 21 '20 at 16:01
  • It's not clear what you are asking. Can you share sample input lines (from file.bib) so that the input can be understood? Also, can you share the expected output in such a way that it is clear which lines should be included? – dash-o Jan 21 '20 at 16:01
  • Can you tell us what you really want to achieve? Do you want to extract information from `file.bib` based on the year? If so, what do you want to extract, and can you show us an example. – kvantour Jan 21 '20 at 16:05
  • Following your pipe, adding `awk "\$0 >= $min_year && \$0 <= $current_year"` should be fine. – dibery Jan 21 '20 at 16:11
  • I believe you are asking an [XY-problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem/66378#66378) as became clear from my answer. Please see [here](https://meta.stackoverflow.com/questions/308444/) how to proceed. – kvantour Jan 22 '20 at 10:55
  • Question: do you need to filter out the full bibtex entries based on that year-condition, or just the lines with the years? – kvantour Jan 22 '20 at 16:42
  • @kvantour Actually I just have to filter out every line with the "year" key, check whether the value behind it is greater than 2020 or lower than 1900, or whether it accidentally says e.g. a month, and print all those lines. (The result should be two lines, one with year = may and one with year = 2024.) The "year" lines only exist in entries that start with "@". – Eric Jan 22 '20 at 16:47

1 Answer

It is better to use awk for this. Similar to your line, it would read:

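awk -v t1="1900" -v t2="$(date "+%Y")" \
    '# skip every line that has no "year = ..." assignment
     !match($0,/year.*=.*/){next}
     {t=substr($0,RSTART,RLENGTH)           # text from "year" onwards
      match(t,/[0-9][0-9][0-9][0-9]/)
      y=substr(t,RSTART,RLENGTH)            # first four-digit number, or empty
     }
     (y > t1) && (y <= t2) { print y }' file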
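
Going by the clarification in the comments on the question (print every "year" line whose value is above 2020, below 1900, or not a number at all, such as may), a minimal sketch could look like the following. The sample entries, the temporary file, and the variable names lo, hi and v are made up for illustration. It also assumes that each year assignment sits on its own line, as in the excerpt in the question.

```shell
#!/bin/sh
# Hypothetical sample data; the real file.bib is not shown in full.
bib=$(mktemp)
cat > "$bib" <<'EOF'
@Book{Anley:2007:shellcoders-handbook-2nd-ed,
  title = "{The Shellcoder's Handbook}",
  year = 2007,
  month = aug,
}
@Book{Example:future,
  year = 2024,
}
@Book{Example:month,
  year = may,
}
@Book{Example:old,
  year = {1850},
}
EOF

min_year=1900
current_year=2020   # fixed so the run is reproducible; $(date +%Y) would also work

out=$(awk -v lo="$min_year" -v hi="$current_year" '
    # Only lines that start with the "year" key are of interest.
    tolower($0) ~ /^[ \t]*year[ \t]*=/ {
        v = $0
        sub(/^[^=]*=[ \t]*/, "", v)    # drop everything up to and including "="
        gsub(/[{}", \t]/, "", v)       # drop braces, quotes, commas and blanks
        # Report the line if the value is not a plain number or is out of range.
        if (v !~ /^[0-9]+$/ || v + 0 < lo || v + 0 > hi)
            print FNR ": " $0
    }' "$bib")

printf '%s\n' "$out"
rm -f "$bib"
```

Each reported line is prefixed with its line number in the file, which helps when locating the bad entries in a file of several thousand lines. Unlike the command above, the comparison is forced to be numeric with v + 0, so a value like may is reported instead of being compared as a string.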
  • Hey, thanks for your help. But with your code I get some errors. ` awk: Command line:3: match(t,/[0-9][0-9][0-9][0-9]/} awk: Command line:3: ^ syntax error awk: Command line:5: y=substr(RSTART,RLENGTH) awk: Command line:5: ^ Unexpected line break or end of the character string awk: Command line:6: (y > t1) && (y <= t2) { print y } awk: Command line:6: ^ syntax error ` – Eric Jan 21 '20 at 16:21
  • Yeah, but the error is still there, also with the double quote (which I commented). – Eric Jan 21 '20 at 16:24
  • Oh yeah, thanks. But my output is empty. Here is an example from my file: `@Book{Beutelspacher:2009:Kryptologie-9ed, author = {Albrecht Beutelspacher}, title = "{Kryptologie}", publisher = {Vieweg+Teubner}, year = 2009, edition = 9 }`. There you can find the year = 2009, and I should check for every entry whether it is greater than 2020 or lower than 1900 (to express the question more clearly). – Eric Jan 21 '20 at 16:43
  • @Eric I believe you have an XY-problem. You say you want to do X, but you are really trying to do Y. I suggest you update your question and clearly state that you have a bibtex file from which you want to extract all entries with a year between a start year and an end year. This will be much more useful. I'm also convinced there will be very easy solutions to that question. – kvantour Jan 22 '20 at 10:50
  • I updated the question. Hopefully it is now more understandable :) – Eric Jan 22 '20 at 16:31