
I have a txt file which contains all the URLs of the images, one URL per line.

I want to download all of them. I searched the web and found that the wget command with the -i option looks useful, but I cannot tell whether it opens a new connection for every link before downloading, or opens just one connection and downloads all the files over it.

The gist of my question: I need a tool/program/anything that can download all these images as fast as possible.

The txt file has millions of image links, so when I tried the uget tool it was comparatively slow and also couldn't load all the URLs. Can you suggest some method for downloading at lightning-fast speed?

iec2011007

2 Answers


What you need is parallelism. If a single thread can't download the files fast enough, you need multiple threads. It may of course be that the limiting factor is your Internet connection bandwidth, in which case nothing will help.

Have you thought of manually splitting the file into, say, ten or a hundred pieces, and then running ten or a hundred uget processes, one per piece? This would be an easy hack to add parallelism to the download process; see the sketch below.
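For example, a rough sketch using split and wget (wget's -i option stands in here for uget's batch mode, and the names urls.txt and part_ are just placeholders):

    # Split the big list into 10 pieces without breaking lines (GNU split).
    split -n l/10 urls.txt part_

    # Start one downloader per piece in the background, then wait for all of them.
    for f in part_*; do
        wget -q -i "$f" &
    done
    wait

Each background process opens its own connections, so you get the parallelism without writing any real code.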

Of course, you could use e.g. Python or Java to develop a program that starts the threads for you and distributes the URLs among them, but then you need to be familiar with thread programming. In either case, it's probably simpler to just split the file and start multiple uget processes: developing the software takes time, and you may not save that time later by using it.

Is the server controlled by you? Is there one server or several? If all the images are on one server that you don't control, I would be careful not to place too much load on it.

I have had the same kind of problem before. In that case I used Java code to download the images, with only one thread, and I placed intentional sleep calls between downloads in order not to load the server too much. So I wasn't after performance; I wanted to avoid putting too much load on the server, because there was only one server and it wasn't controlled by me.
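In shell terms (the original was Java), that polite single-threaded approach looks roughly like this; the one-second pause and the urls.txt name are arbitrary:

    # Download one URL at a time, pausing between requests to go easy on the server.
    while read -r url; do
        wget -q "$url"
        sleep 1
    done < urls.txt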

juhist
  • Excellent answer, but I don't have just one file; I have many files with millions of links, so it is not feasible to split them by hand. Secondly, it doesn't answer the question of whether every request is processed separately or over a single connection (say, using one thread). – iec2011007 Mar 23 '15 at 08:53
  • In what format are the files? One URL per line? If you have e.g. 5 files and want to download in 3 threads, you could concatenate them: cat file1.txt file2.txt file3.txt file4.txt file5.txt > fileall.txt, then split fileall.txt back into filenew1.txt, filenew2.txt and filenew3.txt, and then use uget with each of those. – juhist Mar 23 '15 at 09:01
  • I don't know whether uget uses one connection per process or one connection per request, but if you want performance you need multiple connections anyway, because you will be using multiple threads of execution. In any case, if you want to download the files quickly, it's better to pick a solution that is quick to put together than to endlessly analyze its properties before choosing it. – juhist Mar 23 '15 at 09:02

You could also use a for loop. If the file where the URLs are stored is called urlfile.txt, you could run:

for i in $(cat urlfile.txt); do wget "$i"; done
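Note that this starts a new wget process, and thus a new connection, for every single URL. An alternative worth trying, if GNU xargs is available, is to hand wget batches of URLs and run several processes in parallel, so each process can reuse a connection for consecutive URLs on the same host; the -P 8 and -n 100 values below are just starting points to tune:

    # Up to 8 wget processes at a time, each given 100 URLs per invocation.
    xargs -P 8 -n 100 wget -q < urlfile.txt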
  • I have done something similar to this, but the question is: every time wget goes out of scope a new connection is created and the previous one terminated, so wouldn't that be slow? – iec2011007 Mar 23 '15 at 08:55