Multi-Threaded Downloading with Wget

Jan 24th, 2011 | Category: Random Musings

While downloading a few thousand log files from one server to another, I suddenly needed to do some serious multithreaded downloading on BSD, preferably with Wget, since that was the simplest way I could think of to handle it. A little looking around led me to this little nugget:

wget -r -np -N [url] & wget -r -np -N [url] & wget -r -np -N [url] & wget -r -np -N [url]

Just repeat wget -r -np -N [url] once for each thread you need. (Strictly speaking, each copy is a separate background process rather than a thread.) This isn’t pretty, and there are surely better ways to do it, but if you want something quick and dirty it should do the trick. Enjoy!
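The one-liner gives no signal when every download has finished, or whether any of them failed. Here is a minimal sketch of one way to get both, assuming a POSIX shell. The function name run_copies is made up for illustration, and the wget call in the usage comment simply reuses the flags from the command above:

```shell
#!/bin/sh
# run_copies N CMD [ARGS...]
# Hypothetical helper: start N background copies of CMD, wait for all of
# them, and return the number of copies that exited non-zero.
run_copies() {
    n=$1
    shift
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" &                 # & puts this copy in the background
        pids="$pids $!"        # $! is the PID of the job just started
        i=$((i + 1))
    done
    failed=0
    for pid in $pids; do
        # wait PID returns that process's exit status
        wait "$pid" || failed=$((failed + 1))
    done
    return "$failed"
}

# Usage (not run here; [url] is the placeholder from the post):
#   run_copies 4 wget -r -np -N [url] || echo "some copies failed"
```

Recording each PID matters because a bare wait with no arguments does not report the exit status of the individual jobs.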

An even better solution, submitted by a reader much smarter than I, went something like this:

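THREADS=10
WGETFLAGS='-r -np -N'
URL="http://example.org/"
# Brace ranges like {1..N} do not expand variables, so use seq instead
for i in $(seq 1 "$THREADS"); do
    # $WGETFLAGS stays unquoted so it splits into separate options;
    # the trailing & runs each wget in the background
    wget $WGETFLAGS "$URL" &
done
# Block until every background wget has finished
wait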
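
Note that identical wget commands all walk the same tree, so several of them may end up fetching the same file at once. If the URLs are already listed one per line in a text file, a sketch like the following hands each URL to exactly one worker. It swaps the repeated command for xargs -P, which both GNU and BSD xargs support. The function name fetch_list, the FETCH variable, and the file name urls.txt are assumptions for illustration:

```shell
#!/bin/sh
# fetch_list FILE [WORKERS]
# Hypothetical helper: read one URL per line from FILE and run up to
# WORKERS downloads at a time, each URL going to exactly one process.
# FETCH is an assumed override hook; by default it is wget with -N
# (timestamping, as in the post) and -q (quiet).
fetch_list() {
    list=$1
    workers=${2:-4}
    # -n 1: pass one URL per invocation; -P: cap on parallel processes.
    # The expansion is left unquoted on purpose so the flags split into words.
    xargs -n 1 -P "$workers" ${FETCH:-wget -N -q} < "$list"
}

# Usage (not run here):
#   fetch_list urls.txt 10
```

Because xargs hands out the lines itself, no locking or line deletion is needed to keep two workers off the same URL.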


7 Comments to “Multi-Threaded Downloading with Wget”

  1. Marc says:

    I am downloading some large files, and watching the logs, it seems like multiple threads attempt the same file. Is that expected here? How do you get some sort of locking?

  2. admin says:

    You’re going to need some more sophisticated scripting to do that. If you don’t already have a .txt file containing the URLs you are going to pull, you could have wget build you a list of all of the recursive files and add it to a .txt file. Then you have multiple wget instances go through this file and download it one line at a time, with each instance deleting a line as soon as it has claimed it, so that no other instance grabs the same URL. That’s what I can think of off the top of my head, but I’m sure there’s a better way.

  3. vchakoshy says:

    you saved my time, thank you

  4. Pavel says:

    Hi,

    to Marc – maybe “-m” can help you if you are downloading over FTP. It creates .listing files for better downloading.
    Pavel

  5. Perry says:

    Awesome nugget, here’s one better ;)

    THREADS=10
    WGETFLAGS='-r -np -N'
    URL="http://example.org/"
    for i in {1..$THREADS}; do
    wget $WGETFLAGS "$URL"
    done

  6. admin says:

    Beautiful Perry, thanks for the suggestion, going to add it to the post

  7. AshkanV says:

    But I think the for-loop version cannot work correctly, because each wget command starts only after the previous one has finished.
    So to make it correct, the for-loop should build up the original command line, joining the commands with “&”, and run it after the loop ends.
    Also, a brace range in a for-loop cannot take a variable for its start or end.

    So the example should look something like this:

    ——-
    THREADS=10
    WGETFLAGS='-r -np -N'
    URL="http://example.org/"
    for i in `seq 1 $((THREADS-1))`
    do
    Command="$Command""wget $WGETFLAGS $URL"' & '
    done
    Command="$Command""wget $WGETFLAGS $URL"
    eval $Command
    ——-
