Showing File Download Progress Using Wget

by Louis Marascio on February 11, 2011

This post is part of a series: Bash Tips and Tricks.

In this inaugural post for the Bash Tips and Tricks series I’ve decided to lead off with a tip that I hacked up as a result of a personal pet peeve of mine: superfluous output to stdout. When I write scripts that must perform some long-running task, like downloading a large file, I want the user to know something useful is happening and about how long it will take. The easiest option is to just spam stdout with as much output as possible, but I really don’t like doing this. Too much output might tell the user something is happening, but it can also hide more useful information like errors and can be deceptive as to how much progress the long-running task has made. I prefer to show nice, clean output whenever possible, and I let the various programs spew to log files in case an error occurs.

This tip demonstrates how you can download a file using wget and show a nice, simple progress meter. It gives the user of your script exactly what they want to know: how much has been downloaded and approximately how much longer they have to wait. Directly below you’ll find the solution. For those curious about how it all comes together, keep reading and I’ll dissect the various bits of the solution to help you understand how it works.

download()
{
    local url=$1
    echo -n "    "
    wget --progress=dot $url 2>&1 | grep --line-buffered "%" | \
        sed -u -e "s,\.,,g" | awk '{printf("\b\b\b\b%4s", $2)}'
    echo -ne "\b\b\b\b"
    echo " DONE"
}

The meat of the tip is in the call to wget. It chains together wget, grep, sed, and awk to get the percentage complete from wget. Using it is even easier:

file="patch-2.6.37.gz"
echo -n "Downloading $file:"
download "http://www.kernel.org/pub/linux/kernel/v2.6/$file"

The above usage would generate output that is nice and simple, like this:

Downloading patch-2.6.37.gz:  6%

That is a lot better than the default wget output, which is 9 lines of crap and 1 line of useful information:

--2011-02-10 08:08:31-- http://www.kernel.org/pub/linux/kernel/v2.6/patch-2.6.37.gz
Resolving www.kernel.org... 204.152.191.37, 149.20.20.133
Connecting to www.kernel.org|204.152.191.37|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 15430256 (15M) [application/x-gzip]
Saving to: "patch-2.6.37.gz"

100%[==========================================>] 15,430,256 1.79M/s in 10s

2011-02-10 08:08:42 (1.48 MB/s) - "patch-2.6.37.gz" saved [15430256/15430256]

You can see this tip and the code behind it in the GitHub repository for the Bash Tips and Tricks series.

Here’s how it all works.

First, let’s understand the call to wget:

wget --progress=dot $url 2>&1

The default wget output is about 10 lines, of which only 1 is really useful: the progress bar line. Does the user really care to know that the HTTP request was sent or that wget is waiting on a response? No, not really. Thankfully, wget has a command-line option that can help us out.

To get a simplified status display from wget I use the --progress=dot option. This will give us status output as follows:

--2011-02-10 08:08:31-- http://www.kernel.org/pub/linux/kernel/v2.6/patch-2.6.37.gz
Resolving www.kernel.org... 204.152.191.37, 149.20.20.133
Connecting to www.kernel.org|204.152.191.37|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 15430256 (15M) [application/x-gzip]
Saving to: "patch-2.6.37.gz"

   0K .......... .......... .......... .......... ..........  0%  189K 80s
  50K .......... .......... .......... .......... ..........  0%  655K 51s
 150K .......... .......... .......... .......... ..........  1%  686K 40s
 200K .......... .......... .......... .......... ..........  1%  743K 36s
 300K .......... .......... .......... .......... ..........  2%  767K 29s
 350K .......... .......... .......... .......... ..........  2%  770K 28s
 [... etc ...]

Now that wget is giving us something more reasonable to work with, we can use grep to pull out the information that is relevant to us. For this tip the only lines I’m interested in are those with percent signs (%) in them. Here is our call to grep:

grep --line-buffered "%"

The ‘--line-buffered’ flag is very important. Without it we won’t see much, if any, progress because grep will buffer its output until EOF when writing to a pipe. Telling grep to buffer by line means it will read one line, match the pattern, and write any matching output on a line-by-line basis. We now have something like this:

   0K .......... .......... .......... .......... ..........  0%  189K 80s
  50K .......... .......... .......... .......... ..........  0%  655K 51s
 150K .......... .......... .......... .......... ..........  1%  686K 40s
 200K .......... .......... .......... .......... ..........  1%  743K 36s
 300K .......... .......... .......... .......... ..........  2%  767K 29s
 350K .......... .......... .......... .......... ..........  2%  770K 28s
 [... etc ...]
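To see the effect of line buffering in isolation, here is a minimal sketch (with a made-up producer loop standing in for wget, not the real thing) that emits one progress-style line per second; with --line-buffered each match shows up immediately instead of in one burst at EOF:

```shell
# Fake producer: one progress-style line per second. grep --line-buffered
# flushes each matching line as soon as it reads it, so the consumer side
# of the pipe sees updates in real time rather than all at once at EOF.
for i in 10 20 30; do
    echo "chunk ${i}K done ${i}%"
    sleep 1
done | grep --line-buffered "%"
```

Drop the --line-buffered flag and the same loop appears to hang for three seconds, then dumps all three lines at once.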

The next step is to get rid of anything superfluous that might break the word-oriented parsing we’re going to do with awk. Namely, we need to get rid of all those dots. The big reason is that the number of dots that appears is not consistent: near the end of the file we might get a line with only 3 dots. That isn’t good, since we’re going to use awk in the last step to extract the information we want. So, let’s nuke those dots.

sed -u -e "s,\.,,g"

This is pretty straightforward. We have a regular expression that matches a literal dot (\.) and we replace it with nothing, nuking all dots that might appear on the input stream. The only real trick here is the ‘-u’ flag, which is short for ‘--unbuffered’. As you can imagine, the reasoning is the same as in the grep explanation above: we want output to show up on stdout as soon as possible, not at EOF. After applying the sed transformation, the input to the last program in the pipeline will look like this:

   0K 0%  189K 80s
  50K 0%  655K 51s
 150K 1%  686K 40s
 200K 1%  743K 36s
 300K 2%  767K 29s
 350K 2%  770K 28s
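You can check this step in isolation by feeding a single canned wget-style progress line (hard-coded below, not from a live download) through the same sed command:

```shell
# A canned wget-style progress line; sed -u strips every dot, unbuffered,
# leaving only the whitespace-separated fields that awk will parse next.
echo "   0K .......... .......... .......... .......... ..........  0%  189K 80s" \
    | sed -u -e "s,\.,,g"
```

The leftover runs of spaces don’t matter, because awk treats any run of whitespace as a single field separator.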

We are going to use awk to extract the information we want. Our invocation of awk is straightforward:

awk '{printf("\b\b\b\b%4s", $2)}'

The fields are separated by whitespace and we take the 2nd field, which is the percentage of the download we have completed. The backspace characters (\b) are written first to back us up on the terminal, then we write out a four-character percentage-completion string. awk is the last tool in our pipeline, so it writes to stdout, and this is what we’ll see:

  2%

A right aligned, space padded, four character string with the percentage of the download we’ve completed.
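If you want to sanity-check the whole grep | sed | awk chain without touching the network, you can feed it canned wget-style lines (the sample data below is made up):

```shell
# Two fake progress lines stand in for wget's stderr. The pipeline emits
# "\b\b\b\b  0%" then "\b\b\b\b  1%": each update backs the cursor up four
# columns and overwrites the previous percentage in place.
printf '%s\n' \
    "   0K .......... ..........  0%  189K 80s" \
    " 150K .......... ..........  1%  686K 40s" \
    | grep --line-buffered "%" \
    | sed -u -e "s,\.,,g" \
    | awk '{printf("\b\b\b\b%4s", $2)}'
```

On a terminal this renders as a single four-character field ticking from 0% to 1%, which is exactly the behavior of the download() function above.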

  • Ham63

    Very useful. Thank you.

  • Fabrice

    Thanks! I just added an option to define the location and name of the file downloaded. Here is the new function:

    function download() {
        local url=$1
        local destin=$2
        echo -n "    "
        if [ $destin ]; then
            wget --progress=dot $url -O $destin 2>&1 | grep --line-buffered "%" | \
                sed -u -e "s,\.,,g" | awk '{printf("\b\b\b\b%4s", $2)}'
        else
            wget --progress=dot $url 2>&1 | grep --line-buffered "%" | \
                sed -u -e "s,\.,,g" | awk '{printf("\b\b\b\b%4s", $2)}'
        fi
        echo -ne "\b\b\b\b"
        echo " DONE"
    }

    You run it as: download $url_here $destination_here

    This will download and place the file into the script directory with a new name: download "http://mysite.com/myfile.jpg" "${0%/*}/mydownload.jpg"

  • anqxyr

    For some reason, this didn’t work for me. So I came up with a simplified one-liner version:

    wget --progress=dot $url 2>&1 | grep --line-buffered -o "[0-9]*%" | xargs -L1 echo -en "\b\b\b\b"; echo

  • Iskren

    Works like a charm! However, sometimes downloads get interrupted, so one could use the ‘-c’ flag to wget to continue if supported by the server. When resuming download, wget prints commas instead of dots, so we can modify it a bit, like this:

    wget -c --progress=dot $url 2>&1 | grep --line-buffered "%" | \
        sed -u -e "s|\.||g" -e "s|,||g" | awk '{printf("\b\b\b\b%4s", $2)}'

  • reza

    Hi, is there any way we could do it on logfiles? For some reason I have to run wget in the background and then check the logfile to see how much it’s got

  • nep

    Hi !

    I’ve got 2 issues with this set of commands: the update rate is really low. How can I smooth the % progress? I would like to show the time remaining to users. What’s the right setting with printf?
