
I've been using the following Unix bash script:

#!/bin/bash
mkdir -p ~/Desktop/URLs
n=1
while read mp3; do
  curl "$mp3" > ~/Desktop/URLs/$n.mp3
  ((n++))
done < ~/Desktop/URLs.txt

to download and rename a bunch of mp3 files from URLs listed in "URLs.txt". It works well (thanks to StackOverflow users), but due to a suspected server quantity/time download limit, it only lets me download around 40-50 files from my URL list.

Is there a way to work around this by adding a "timer" inside the while loop so it downloads 1 file per "X" seconds?

I found another related question, here:

How to include a timer in Bash Scripting?

but I'm not sure where to add the "sleep [number of seconds]"... or even if "sleep" is really what I need for my script...?
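
My best guess (completely untested, and the 10 seconds below is just an arbitrary placeholder) would be to put it straight after the curl line, something like:

while read mp3; do
  curl "$mp3" > ~/Desktop/URLs/$n.mp3
  sleep 10   # pause 10 seconds before fetching the next URL?
  ((n++))
done < ~/Desktop/URLs.txt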

Any help enormously appreciated — as always.

Dave

4 Answers


curl has some pretty awesome command-line options (documentation); for example, --limit-rate will limit the amount of bandwidth that curl uses, which might completely solve your problem.

For example, replacing the curl line with:

curl --limit-rate 200K "$mp3" > ~/Desktop/URLs/$n.mp3

would limit the transfer to an average of 200K per second, which would download a typical 5MB mp3 file in about 25 seconds. You could experiment with different values until you find the maximum speed that works.

You could also try a combination of --retry and --retry-delay so that when a download fails, curl waits and then tries again after a certain amount of time.

For example, replace the curl line with:

curl --retry 30 "$mp3" > ~/Desktop/URLs/$n.mp3

This will transfer the file. If the transfer fails, it will wait a second and try again. If it fails again, it will wait two seconds. If it fails again, it will wait four seconds, and so on, doubling the waiting time until it succeeds. The "30" means it will retry up to 30 times, and it will never wait more than 10 minutes. You can learn this all at the documentation link I gave you.
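
If you wanted a fixed pause between retries instead of that doubling back-off, you could combine the two options mentioned above (the 15-second delay here is just an illustration):

curl --retry 30 --retry-delay 15 "$mp3" > ~/Desktop/URLs/$n.mp3

With --retry-delay set, curl waits that fixed number of seconds between attempts rather than doubling the wait each time.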

Joel Spolsky
  • Hey, Joel. Thanks for your reply. Could you give me an example of what you mentioned above, as this is all very new to me. –  Dec 10 '10 at 04:42
  • Cheers! That's nice and simple. I'll give it a shot. –  Dec 10 '10 at 05:08
  • No luck with those curl ideas, unfortunately. –  Dec 10 '10 at 05:34
#!/bin/bash
mkdir -p ~/Desktop/URLs
n=1
while read -r mp3; do
  # start each download in the background
  curl "$mp3" > ~/Desktop/URLs/$n.mp3 &
  ((n++))
  # periodically wait for the outstanding background downloads, then pause
  if ! ((n % 4)); then
     wait
     sleep 5
  fi
done < ~/Desktop/URLs.txt

This will spawn up to 4 instances of curl at a time, wait for them all to complete, and then pause for 5 seconds before spawning the next batch.

SiegeX
  • Thanks for your reply. The script runs, but only goes as far as 0.mp3 1.mp3 2.mp3 3.mp3 4.mp3 –  Dec 10 '10 at 04:39
  • Ahh, right. You use $n in the file name. Fixed the script to accommodate this. – SiegeX Dec 10 '10 at 04:52
  • That fixed it. This script has got me up to 80.mp3; do you know any way to push it further so I can get the rest of the URLs in the txt file? Thanks a lot for your help! –  Dec 10 '10 at 05:07
  • you could put in a `sleep 5` right below the `wait` to delay the next batch by 5 seconds. – SiegeX Dec 10 '10 at 05:08
  • No luck I'm afraid. That only took me to 47.mp3. Weird. –  Dec 10 '10 at 05:30
  • Even slower? sleep 20 instead of 5? – Joel Spolsky Dec 10 '10 at 05:36
  • that time was 43.mp3. It seems to just jump between the two. Need to get past that barrier... somehow. Thanks for your help. –  Dec 10 '10 at 05:54
  • Unfortunately, we are shooting blind unless we know exactly what it is that the server is keying on. – SiegeX Dec 10 '10 at 06:18

Not so much an answer as an optimization. If you can consistently get the first few URLs but it times out on the later ones, perhaps you could trim your URL file as each mp3 is successfully received?

That is, as 1.mp3 is successfully downloaded, strip it from the list:

tail -n +2 url.txt > url2.txt; mv -f url2.txt url.txt

Then the next time the script runs, it'll begin from 2.mp3.

If that works, you might just set up a cron job to execute the script periodically, taking a bite at a time.
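
For instance (hypothetical path and schedule, adjust to whatever suits you), a crontab entry that runs the script once an hour would look something like:

# crontab -e
0 * * * * /bin/bash /Users/dave/scripts/grab-mp3s.sh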

It just occurred to me that you're programmatically numbering the mp3s, and the script might clobber some of them on a restart, since every time it runs it starts counting at 1.mp3 again. Something to be aware of, if you go this route.
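
One possible guard (just a sketch, assuming the numbering scheme from the original script and that the files live in ~/Desktop/URLs) is to start the counter after whatever is already on disk instead of hard-coding n=1:

# resume numbering after the mp3s that have already been downloaded
n=$(ls ~/Desktop/URLs/*.mp3 2>/dev/null | wc -l)
n=$((n + 1))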

Steve V.
  • Be careful! The shell redirection will kill (truncate) the file before tail can read it, and you'll end up with a zero-length file, trashing the whole list! – Michael Trausch Dec 10 '10 at 06:04
  • Ahh, good. Was hoping you'd correct it! It's something you learn the hard way... I did, years ago... – Michael Trausch Dec 10 '10 at 06:14
  • Thanks for the replies, guys. I didn't get a chance to try that, so probably a good thing, right? =) –  Dec 10 '10 at 06:35

A timer?

Like your crontab?

man cron

You know roughly how much they let you download; just count the disk usage of the files you did get.

That gives you the transfer you are allowed. You need that figure, and you will need the PID of your script.

ps aux | grep "$progname" | awk '{print $2}'

or something like that. The secret sauce here is that you can suspend with

kill -SIGSTOP  PID

and resume with

kill -SIGCONT  PID

So the general method would be:

  • URLs in an array, or a queue, or whatever bash lets you have.

  • Process a URL.

  • Increment a transfer counter.

  • When the transfer counter gets close to the limit:

  • kill -SIGSTOP MYPID

  • You are suspended.

  • In your crontab, send your script SIGCONT after a minute/hour/day, whatever.

  • Continue processing.

  • Repeat until done (rough sketch below).

Just don't log out, or you'll need to do the whole thing over again (although if you used Perl it would be trivial).
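
A rough, untested sketch of that loop (the 100 MB allowance is a made-up figure, and counting bytes with wc -c is just one way to track the transfer):

#!/bin/bash
mkdir -p ~/Desktop/URLs
LIMIT=$((100 * 1024 * 1024))   # made-up 100 MB allowance per session
transferred=0
n=1
while read -r mp3; do
  curl "$mp3" > ~/Desktop/URLs/$n.mp3
  # add this file's size in bytes to the running total
  transferred=$(( transferred + $(wc -c < ~/Desktop/URLs/$n.mp3) ))
  ((n++))
  if (( transferred >= LIMIT )); then
    kill -SIGSTOP $$      # suspend this script; cron sends SIGCONT later
    transferred=0         # reset the counter once we have been resumed
  fi
done < ~/Desktop/URLs.txt

The cron side would just send SIGCONT to the stopped process, so you would want the script to write $$ to a file somewhere that the cron job can read.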

Disclaimer: I am not sure if this is an exercise in bash or whatnot; I confess freely that I see the answer in Perl, which is always my choice outside of a REPL. Code in Bash long enough, or heaven forbid, Zsh (my shell), and you will see why Perl was so popular. Ah, memories...

Disclaimer 2: Untested, drive-by, garbage methodology here, only made possible because you have an idea of what that transfer limit might be. Obviously, if you have ssh, use ssh -D PORT you@host and pull the mp3s out of the proxy half the time.

In my own defense: if you slow-pull the URLs with sleep, you'll be connected for a while, and perhaps "they" might notice that. Suspend and resume, and you should only be connected while grabbing tracks, and gone otherwise.

chiggsy
  • Cheers for your time, chiggsy! I'll look into cron and see how I go. –  Dec 10 '10 at 08:07