Multi-Threaded Downloading with Wget

Jan 24th, 2011 | Category: Random Musings

While downloading a few thousand log files from one server to another, I suddenly needed to do some serious multi-threaded downloading in BSD, preferably with Wget, since that was the simplest tool I could think of for the job. A little looking around led me to this little nugget:

wget -r -np -N [url] & wget -r -np -N [url] & wget -r -np -N [url] & wget -r -np -N [url]

Just repeat the wget -r -np -N [url] for as many threads as you need. Granted, this isn't pretty, and there are surely better ways to do it, but if you want something quick and dirty it should do the trick. Enjoy!

An even better solution, submitted by a reader much smarter than I am, goes something like this:

THREADS=10
WGETFLAGS='-r -np -N'
URL="http://example.org/"
for i in {1..$THREADS}; do
    wget $WGETFLAGS "$URL"
done


9 Comments to “Multi-Threaded Downloading with Wget”

  1. Marc says:

    I am downloading some large files, and watching the logs, it seems like multiple threads attempt the same file. Is that expected here? How do you get some sort of locking?

  2. admin says:

    You’re going to need a more sophisticated script to do that. If you have a .txt file containing the URLs you are going to be pulling, you could have wget build you a list of all of the recursive files and add it to a .txt file. Then multiple wget instances go through this file and download it one line at a time, each deleting a line immediately after copying its contents to the thread that grabbed it. That’s what I can think of off the top of my head, but I’m sure there’s a better way.
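    Thinking about it a bit more, xargs can do that hand-off for you: it dishes lines out to the workers itself, so no two wget processes ever grab the same URL and there is no deleting or locking to script. A rough sketch, assuming a hypothetical urls.txt with one URL per line:

```shell
# Give each wget exactly one URL (-n 1) and run at most
# four wget processes at a time (-P 4). xargs hands the
# lines out itself, so every URL goes to exactly one worker.
xargs -n 1 -P 4 wget -np -N < urls.txt
```

    The -n 1 flag is what guarantees one URL per process; -P caps how many run concurrently.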

  3. vchakoshy says:

    you saved my time, thank you

  4. Pavel says:

    Hi,

    to Marc: maybe “-m” can help you if you use FTP downloading. It creates .listing files for better downloading.
    Pavel

  5. Perry says:

    Awesome nugget, here’s one better 😉

    THREADS=10
    WGETFLAGS='-r -np -N'
    URL="http://example.org/"
    for i in {1..$THREADS}; do
        wget $WGETFLAGS "$URL"
    done

  6. admin says:

    Beautiful, Perry, thanks for the suggestion; going to add it to the post.

  7. AshkanV says:

    But I think the for-loop version can’t work correctly, because each wget command starts only after the previous one has finished.
    So to fix it, the for loop should build up one command line, concatenating each wget with "&", and run it after the loop ends.
    Also, a brace-expansion range in a for loop can’t take a variable for its start/end parts.

    So the example should looks something like this:

    -------
    THREADS=10
    WGETFLAGS='-r -np -N'
    URL="http://example.org/"
    for i in `seq 1 $((THREADS-1))`
    do
        Command="$Command""wget $WGETFLAGS $URL"' & '
    done
    Command="$Command""wget $WGETFLAGS $URL"
    eval $Command
    -------

  8. Pratyush Singh says:

    Indeed, wget is awesome, but it’s not made for what you are trying to achieve. A better option in this situation would be aria2 (aria2.github.io).

  9. Sandokhan says:

    To build a complete script around the wget command-line tool, one that in effect turns plain wget into a full multi-threaded, resuming, masquerading download manager, you need to concentrate not only on:

    1. MULTITHREADED downloading, as on http://blog.netflowdevelopments.com/2011/01/24/multi-threaded-downloading-with-wget or:
    http://stackoverflow.com/questions/3430810/wget-download-with-multiple-simultaneous-connections
    http://stackoverflow.com/questions/22114610/downloading-a-file-with-wget-using-multiple-connections

    where http://example.org/ is a sample path to the file being downloaded from the server
    ---------------------------
    THREADS=10
    WGETFLAGS='-r -np -N'
    URL="http://example.org/"
    for i in {1..$THREADS}; do
        wget $WGETFLAGS "$URL"
    done
    ---------------------------
    or
    ---------------------------
    THREADS=10
    WGETFLAGS='-r -np -N'
    URL="http://example.org/"
    for i in `seq 1 $((THREADS-1))`
    do
        Command="$Command""wget $WGETFLAGS $URL"' & '
    done
    Command="$Command""wget $WGETFLAGS $URL"
    eval $Command
    ---------------------------

    but you also need to complete the wget script with:

    2. LOCATION of the downloaded file, e.g. on the HDD, as on: http://stackoverflow.com/questions/1078524/how-to-specify-the-location-with-wget

    -O is the option to specify the path of the file you want to download to.

    wget -O /path/to/folder/file.ext

    -P is the prefix: wget will download the file into that directory

    wget -P /path/to/folder

    These options protect e.g. live-CD users, since wget by default downloads to RAM; the user can switch to an HDD or USB drive and download larger files.

    3. RESUMING, as on: http://askubuntu.com/questions/610761/how-do-i-restart-a-wget-download e.g.:
    wget -c http://to-downloading-file-from-internet
    but this will only work if the server supports Range headers.

    4. MASKING wget against servers that block direct downloads, as described on: http://askubuntu.com/questions/24935/wget-downloads-corrupt-jpeg-file

    using wget with the -U flag, adding your browser’s user-agent string, e.g.:

    wget -U "Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.17 (KHTML, like Gecko) Ubuntu/11.04 Chromium/11.0.654.0 Chrome/11.0.654.0 Safari/534.17" http://to-downloading-file-from-internet

    But finally, even if you do complete such a script, if the owner of the server doesn’t want to allow free, open (non-tracking) Linux software, the only way to download from that server will be with whatever software the owner dictates.
