
I have this function in R, which I use to produce a list of dates:

#!/usr/bin/env Rscript

date_seq = function(){
    args = commandArgs(trailingOnly = TRUE)
    library(lubridate)
    days = seq(ymd(args[1]), ymd(args[2]), 1)
    days = format(days, "%Y%m%d")
    return(days)
}
date_seq()

I call this function in a bash script to create a vector of dates:

Rscript date_seq.R 20160730 20160801 > dates

I define a couple of other string variables in the bash script:

home_url="https://pando-rgw01.chpc.utah.edu/hrrr/sfc/"
hrrr_file="/hrrr.t{00-23}z.wrfsfcf00.grib2"

The final goal is to create a vector of download links that combines the three variables home_url, date and hrrr_file, like so:

"https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20160730/hrrr.t{00-23}z.wrfsfcf00.grib2"
"https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20160731/hrrr.t{00-23}z.wrfsfcf00.grib2"
"https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20160801/hrrr.t{00-23}z.wrfsfcf00.grib2"

I tried a few lines in the bash script:

  • for date in $dates; do download_url=$home_url$date$hrrr_file; cat $download_url; done
  • for date in $dates; do download_url="${home_url}${date}${hrrr_file}"; cat $download_url; done
  • for date in $dates; do download_url="$home_url"; download_url+="$date"; download_url+="$hrrr_file"; cat $download_url; done

None of these produces the output I expect. I am not sure whether the download_url variable is not being created at all, or is being created but I am failing to display it. Can anyone please help me understand?

Edit
Results of trying the suggestions below:

  • @triplee suggested using
    sed "s#.*#$home_url&$hrrr_file#" "dates"
    
    and
    while read -r date; do printf '%s%s%s\n' "$home_url" "$date" "$hrrr_file"; done <dates
    
    Both of these produce this output:
    https://pando-rgw01.chpc.utah.edu/hrrr/sfc/[1] "20160730" "20160731" "20160801"/hrrr.t{00-23}z.wrfsfcf00.grib2
    
  • @xdhmoore suggested using
    for date in $(cat dates); do echo "${home_url}${date}${hrrr_file}"; done
    
    which produces this output:
    https://pando-rgw01.chpc.utah.edu/hrrr/sfc/[1]/hrrr.t{00-23}z.wrfsfcf00.grib2
    https://pando-rgw01.chpc.utah.edu/hrrr/sfc/"20160730"/hrrr.t{00-23}z.wrfsfcf00.grib2
    https://pando-rgw01.chpc.utah.edu/hrrr/sfc/"20160731"/hrrr.t{00-23}z.wrfsfcf00.grib2
    https://pando-rgw01.chpc.utah.edu/hrrr/sfc/"20160801"/hrrr.t{00-23}z.wrfsfcf00.grib2
    

Neither is the output I expect, though the solution by @xdhmoore is closer. It does expose another problem: the quotation marks around each date in the output. The output of cat dates looks like this: [1] "20160730" "20160731" "20160801", so I think I have to rework the function, or the way I call it in the bash script, as well.

I'll keep updating the question to reflect the output of all suggestions, since it is simpler to do so than trying to answer each comment. As always, thanks a lot!

tripleee
KVemuri
  • I think you want to `echo $download_url` instead of `cat` it. Or possibly `curl` it. – xdhmoore Jan 02 '20 at 18:17
  • @xdhmoore Eventually, yes, the idea is to use `curl` to download the files represented by those links, but before I get there, I wanted to make sure that the script was producing the correct download links, which is why I wanted to view the links. Also, both `echo $download_url` and `echo "$download_url"` do nothing in terms of displaying the `download_url` variable on the screen. – KVemuri Jan 02 '20 at 18:20
  • I think you also will need to cat the dates file: `for date in $(cat dates)` – xdhmoore Jan 02 '20 at 18:22
  • @xdhmoore https://mywiki.wooledge.org/DontReadLinesWithFor – tripleee Jan 02 '20 at 18:23
  • @tripleee good to know. – xdhmoore Jan 02 '20 at 18:36
  • I don't know anything about Rscript, but it looks like the dates file contains the default serialized version of your `date_seq()`'s returned object. Instead of returning that and having it serialized and printed out, maybe there's a way to explicitly print the values out within `date_seq()`? That should get rid of the quotes and the `[1]` hopefully. – xdhmoore Jan 02 '20 at 19:55
  • Actually, I think the `[1]` might come from a bug within `date_seq()` but like I said, I don't know any Rscript so I'm not sure. – xdhmoore Jan 02 '20 at 20:01

1 Answer


The for statement loops over the tokens you give it as arguments, not the contents of files.
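To illustrate why the loops in the question print nothing: `> dates` writes to a file named `dates`, whereas `$dates` is a shell variable that was never assigned, so the loop body runs zero times. A minimal sketch, assuming a file with one date per line; the temporary file and the counter names are made up for the demonstration:

```shell
#!/bin/bash
# Hypothetical sample data: a temporary file standing in for the "dates" file.
dates_file=$(mktemp)
printf '%s\n' 20160730 20160731 20160801 > "$dates_file"

# $dates is a variable, not the file. It was never assigned, so it expands
# to nothing and the loop body never runs.
unset dates
var_count=0
for date in $dates; do
    var_count=$((var_count + 1))
done

# Redirecting the file into a while/read loop visits every line.
file_count=0
while read -r date; do
    file_count=$((file_count + 1))
done < "$dates_file"

echo "iterations over \$dates: $var_count"
echo "iterations over the file: $file_count"
rm -f "$dates_file"
```

The first loop is silent for the same reason the attempts in the question were: there is nothing to iterate over.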

You seem to be looking for

sed "s#.*#$home_url&$hrrr_file#" "dates"

The token & recalls the text which was matched by the regex in a sed substitution.

The same thing could be done vastly more slowly with a shell loop:

while read -r date; do
    printf '%s%s%s\n' "$home_url" "$date" "$hrrr_file"
done <dates

which illustrates how to (slowly) iterate over the lines in a file without the use of external utilities.
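Both approaches can be checked against sample data. The sketch below assumes a file with one bare date per line; the function names `urls_with_sed` and `urls_with_loop` and the temporary file are invented for the example:

```shell
#!/bin/bash
home_url="https://pando-rgw01.chpc.utah.edu/hrrr/sfc/"
hrrr_file="/hrrr.t{00-23}z.wrfsfcf00.grib2"

# Hypothetical helper: wrap every line of the file named in $1 with sed.
urls_with_sed() {
    # & stands for the whole matched line, which here is the date.
    sed "s#.*#$home_url&$hrrr_file#" "$1"
}

# Hypothetical helper: the same result with a pure shell loop.
urls_with_loop() {
    while read -r date; do
        printf '%s%s%s\n' "$home_url" "$date" "$hrrr_file"
    done < "$1"
}

# Sample input: one date per line.
sample=$(mktemp)
printf '%s\n' 20160730 20160731 20160801 > "$sample"

sed_out=$(urls_with_sed "$sample")
loop_out=$(urls_with_loop "$sample")
printf '%s\n' "$sed_out"
rm -f "$sample"
```

Using `#` as the sed delimiter avoids having to escape the slashes in the URL.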

Either of these can be piped to xargs curl (or perhaps xargs -n 1 curl); or you could refactor the while loop:

while read -r date; do
    curl "$home_url$date$hrrr_file"
done <dates

As noted in comments, cat is a command for copying files, not echoing text; for the latter, use echo or (for any nontrivial formatting) printf.
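A small sketch of the difference, using one of the URLs from the question as the sample string; the variable names `cat_status` and `shown` are only there for the demonstration:

```shell
#!/bin/bash
download_url="https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20160730/hrrr.t{00-23}z.wrfsfcf00.grib2"

# cat treats its argument as a file name. No file with that name exists,
# so cat fails (its error message is discarded here).
if cat "$download_url" 2>/dev/null; then
    cat_status=0
else
    cat_status=$?
fi
echo "cat exit status: $cat_status"

# printf prints the string itself.
shown=$(printf '%s\n' "$download_url")
echo "$shown"
```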

Update: The above assumes your R output contains one date per line. To split the file into lines and remove the quotes around the values, you can preprocess with sed 's/"\([^"]*\)" */\1\n/g' "dates" (provided your sed dialect supports \n as an escape for newline); or perhaps do

sed "s#\"\([^\"]*\)\" *#$home_url\\1$hrrr_file\\
#g" "dates"

again with some reservation for differences between sed dialects. In the worst case, maybe switch to Perl, which actually brings some relief to the backslashitis, but requires new backslashes in other places:

perl -pe "s#\"(\d+)\" *#$home_url\$1$hrrr_file\n#g" "dates"
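A different technique from the substitutions above, offered only as a sketch: rather than rewriting the quoted values in place, extract just the eight-digit runs with `grep -o`, which also discards the `[1]` index. This assumes a grep that supports `-o` (GNU and BSD grep do, but it is not in POSIX) and that every date is exactly eight digits; `extract_dates` and the temporary file are hypothetical names:

```shell
#!/bin/bash
home_url="https://pando-rgw01.chpc.utah.edu/hrrr/sfc/"
hrrr_file="/hrrr.t{00-23}z.wrfsfcf00.grib2"

# Hypothetical helper: print every run of exactly the shape NNNNNNNN found in
# the file named in $1, one per line. Quotes and the [1] index are dropped.
extract_dates() {
    grep -oE '[0-9]{8}' "$1"
}

# Sample input mimicking the R output shown in the question.
raw=$(mktemp)
printf '%s\n' '[1] "20160730" "20160731" "20160801"' > "$raw"

urls=$(extract_dates "$raw" | while read -r date; do
    printf '%s%s%s\n' "$home_url" "$date" "$hrrr_file"
done)
printf '%s\n' "$urls"
rm -f "$raw"
```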

But probably a better solution is to change your R script so it doesn't produce wacky output. Or just don't use R in the first place. See e.g. https://stackoverflow.com/a/3494814/874188 for how to get dates from Perl. Or if you have GNU date, try

#!/bin/bash
start=$(date -d "$1" +%s)
end=$(date -d "$2" +%s)
for ((i=start; i<=end; i+=60*60*24)); do
    date -d "@$i" +%Y%m%d
done

(If you are on a Mac or similar, the date program won't accept a date as an argument to -d and you will have to use slightly different syntax. It's not hard to do but this answer has too many speculations already.)
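One caveat with stepping by 60*60*24 seconds in local time: in a zone with daylight saving time, a range that crosses a clock change can repeat one date and skip another. A hedged variant, still assuming GNU date, does the arithmetic in UTC with `-u`, where every day is exactly 86400 seconds; the function name `date_range` is made up for this sketch:

```shell
#!/bin/bash
# Assumes GNU date (-d with a date string and with @epoch).
# Hypothetical helper: print every date from $1 to $2 inclusive as YYYYMMDD.
date_range() {
    local start end i
    # -u interprets the dates as UTC midnight, so no DST shift can occur.
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    for ((i = start; i <= end; i += 86400)); do
        date -u -d "@$i" +%Y%m%d
    done
}

date_range 20160730 20160801
```

The output has one date per line, so it can be redirected to the dates file and fed to the loops above unchanged.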

tripleee