Rclone download via S3

Overview

Rclone is a command line application that supports a number of protocols.  It is available for a number of operating systems, including Windows, MacOS and various Unix-like OS distributions.  It supports parallel transfers (i.e. more than one file at a time) as well as concurrent multi-part transfers (i.e. splitting a large file into multiple parts and transferring them concurrently).  Depending on the number and size of the files you are transferring, this can improve transfer speeds. We recommend using version 1.60.1 or above.

Installing rclone

Refer to the rclone documentation for installing the software (external site).  Older versions may not work properly for downloading data from the Data Access Portal; we recommend using version 1.60.1 or above.

Many Unix-like distributions ship an older version of rclone that may not work well when downloading from the DAP.  Refer to your operating system documentation for further information on supported packages, or see the rclone documentation for upgrading to a newer version.
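
If you are not sure whether your installed version is recent enough, a quick sketch like the following compares it against 1.60.1.  The version_ge helper is our own illustration, not part of rclone, and assumes a sort that supports version ordering (sort -V):

```shell
# Returns success if $1 >= $2 when compared as version numbers.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# On your system you could extract the version from `rclone version` output,
# which begins with a line like "rclone v1.63.1", e.g.:
#   installed="$(rclone version | head -n 1 | sed 's/rclone v//')"
installed="1.63.1"   # placeholder value for this sketch

if version_ge "$installed" "1.60.1"; then
  echo "rclone $installed is recent enough"
else
  echo "rclone $installed is older than 1.60.1; consider upgrading"
fi
```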

Note for Windows users

The rclone download for Windows is a portable file, meaning you can put the program anywhere on your system and run it.  You may wish to add the location to your “path” environment variable to make it easier to run.

For example, let’s say you downloaded rclone and extracted the download to C:\apps\rclone-v1.63.1-windows-amd64:


If you tried to run rclone from a different folder, you would get an error message: “‘rclone’ is not recognized as an internal or external command, operable program or batch file”.


Fix this by completing the following steps:

  1. Open the “Start” menu.
  2. Type in “Environment Variables” and select the option “Edit environment variables for your account”.
  3. In the “Environment Variables” window, select the “Path” row and click “Edit…”
  4. In the “Edit environment variable” window, click the “New” button and enter the path where you saved “rclone.exe”, e.g. “C:\apps\rclone-v1.63.1-windows-amd64”.
  5. In the “Edit environment variable” window, click “OK”.
  6. In the “Environment Variables” window, click “OK”.
  7. Any command prompts you already have open will not be affected by the above steps.  You will need to open a new command prompt for rclone to be recognised.

Download using the pre-built command


This is the simplest option if you want to use rclone with a single command to download all files in a DAP collection.

After selecting the option to “Download files via S3 Client” and agreeing to the licence, you will be presented with three tabs:

  • S3 Client
  • rClone
  • AWS CLI


Click the “rClone” tab (or press the <TAB> key until the “rClone” tab is selected and press <ENTER>).

You will be presented with a pre-built command for downloading all files from the collection to your present working directory.


Note that the above screen shot contains example access keys that do not work.

Older versions of rclone do not support some of the options that have been added to this pre-built command, e.g.

  • --multi-thread-cutoff 250M
  • --multi-thread-streams 4

We recommend using rclone version 1.60.1 or above.  See the rclone documentation for details on updating to a newer version.

If you’re unable to update your version of rclone, try deleting these options from the command and then running it.
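
If you remove those options, the trimmed command looks like the sketch below.  The values in curly braces are placeholders for the details shown on the DAP collection page (the same placeholders used in the templates later on this page):

```shell
# Pre-built command with the multi-thread options removed, for older
# rclone versions.  {Server}, {Access Key}, {Secret Access Key} and
# {Remote Directory} are placeholders; substitute your own values.
rclone -P \
  --s3-provider Other \
  --s3-endpoint {Server} \
  --s3-access-key-id {Access Key} \
  --s3-secret-access-key {Secret Access Key} \
  --transfers 4 \
  copy :s3:/{Remote Directory} {local download folder}
```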

In your terminal, navigate to the directory (or folder) that you want to download the files to.  You may want to make a folder for the download.  The examples below assume the existence of a folder called “C:\mydownloadsfolder” on Windows and “~/my/downloads/folder” on Mac OS and Linux.  The examples navigate to this folder, create a sub-folder called “topographic_wetness_index” and then navigate to the newly created folder.

Windows example
cd c:\mydownloadsfolder
md topographic_wetness_index
cd topographic_wetness_index
Mac OS and Linux example
cd ~/my/downloads/folder
mkdir topographic_wetness_index
cd topographic_wetness_index

On the DAP collection web page, click the “clipboard” icon to copy the text of the command to your clipboard (or press the <TAB> key until the icon is selected and press <ENTER> to copy it).

Paste the command into your terminal.  If you don’t know how to paste into your terminal, you may have to look this up for your operating system.  Many operating systems let you right-click your mouse to do this; others use <CTRL> + <V> or <CTRL> + <SHIFT> + <V>.  Since there are many possible configurations, the details are outside the scope of this guide.

Before running the command, review the options that have been used.  The command includes some rclone options using their default values, but you may wish to change them depending on the size and number of the files you want to download.  e.g. you might want to increase the --transfers and --multi-thread-streams values to something like 10 each.  See the section on Improving Download Speeds for more information.

Once you have set the options you like, press <ENTER> to run the command.

Downloading using rclone without saving a configuration

This section covers creating a single rclone command to download a DAP collection.  The above section, Download using the pre-built command, gives you the same command that you can simply copy and paste.  This section is useful if you need to type in a command manually and are not able to paste the pre-built command into your terminal.

Every time you request access to a DAP collection via S3 you will receive a different set of credentials that only apply to that collection.  These credentials only remain valid for around 48 hours. Depending on your needs you may not want to go through the process of running “rclone config” (see the Configuring rclone section if you would prefer to configure a connection).

You can run a single command like the below example to download all files from a DAP collection.

First you need to request the S3 access details for the DAP collection you want (follow the steps listed up to but not including where you click “Open S3 Client”).

Note: The Download Information is unique to a download request for a collection.  It is provided here to illustrate the download process; copying this information exactly will cause an error.


You can click the “copy” icons to copy the values you need to your clipboard.

Use these values to create a command like the template examples below, replacing the values in curly braces (e.g. “{Server}”) with the values for the collection you want to download.

The following arguments used in the examples are optional.  The default values are listed in the template and modified values are used in the example that follows the template.

Argument Notes
-P Or --progress.  Displays progress on the downloads.
--transfers The number of files to download simultaneously.  Default is 4.  Increasing this to something like 8 or 10 can improve speeds.
--multi-thread-cutoff Default 250M (i.e. 250MiB).  Files over this size will be downloaded in multiple parts, which can improve speeds.  Consider lowering this from the default, although reducing it below 10M can be detrimental to speeds.
--multi-thread-streams Default 4.  The number of simultaneous parts for a single file download when a file’s size exceeds --multi-thread-cutoff.  Consider increasing this number when downloading larger files.
Unix/MacOS – download all files using rclone
# Template
rclone -P \
  --s3-provider Other \
  --s3-endpoint {Server} \
  --s3-access-key-id {Access Key} \
  --s3-secret-access-key {Secret Access Key} \
  --transfers 4 \
  --multi-thread-cutoff 250M \
  --multi-thread-streams 4 \
  copy :s3:/{Remote Directory} {local download folder}

# e.g. with example access keys that will not work
rclone -P \
  --s3-provider Other \
  --s3-endpoint s3-cbr.csiro.au \
  --s3-access-key-id ABCDEFGHIJK123456789 \
  --s3-secret-access-key Aa1234+567/BbCcDcEeFfGgHhIiJjKkLlMmNnOoP \
  --transfers 10 \
  --multi-thread-cutoff 10M \
  --multi-thread-streams 10 \
  copy :s3:/dapprd/000005588v002/ ~/Downloads/my_download_folder

Windows Powershell – download all files using rclone
# Template
rclone -P `
  --s3-provider Other `
  --s3-endpoint {Server} `
  --s3-access-key-id {Access Key} `
  --s3-secret-access-key {Secret Access Key} `
  --transfers 4 `
  --multi-thread-cutoff 250M `
  --multi-thread-streams 4 `
  copy :s3:/{Remote Directory} {local download folder}

# e.g. with example access keys that will not work
rclone -P `
  --s3-provider Other `
  --s3-endpoint s3-cbr.csiro.au `
  --s3-access-key-id ABCDEFGHIJK123456789 `
  --s3-secret-access-key Aa1234+567/BbCcDcEeFfGgHhIiJjKkLlMmNnOoP `
  --transfers 10 `
  --multi-thread-cutoff 10M `
  --multi-thread-streams 10 `
  copy :s3:/dapprd/000005588v002/ E:\downloads\my_download_folder
Windows Command Line – download all files using rclone
REM Template
rclone -P --s3-provider Other --s3-endpoint {Server} --s3-access-key-id {Access Key} --s3-secret-access-key {Secret Access Key} --transfers 4 --multi-thread-cutoff 250M --multi-thread-streams 4 copy :s3:/{Remote Directory} {local download folder}

REM e.g. with example access keys that will not work
rclone -P --s3-provider Other --s3-endpoint s3-cbr.csiro.au --s3-access-key-id ABCDEFGHIJK123456789 --s3-secret-access-key Aa1234+567/BbCcDcEeFfGgHhIiJjKkLlMmNnOoP --transfers 10 --multi-thread-cutoff 10M --multi-thread-streams 10 copy :s3:/dapprd/000005588v002/ E:\downloads\my_download_folder
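
Before running a full download, you may want to check what is on the remote.  The sketch below uses rclone’s standard lsd and size commands with the same example endpoint and non-working keys as above; substitute your own values:

```shell
# List the top-level folders of the remote directory
# (example access keys shown here will not work)
rclone lsd --s3-provider Other --s3-endpoint s3-cbr.csiro.au \
  --s3-access-key-id ABCDEFGHIJK123456789 \
  --s3-secret-access-key Aa1234+567/BbCcDcEeFfGgHhIiJjKkLlMmNnOoP \
  :s3:/dapprd/000005588v002/

# Report the total number and size of files before committing to the download
rclone size --s3-provider Other --s3-endpoint s3-cbr.csiro.au \
  --s3-access-key-id ABCDEFGHIJK123456789 \
  --s3-secret-access-key Aa1234+567/BbCcDcEeFfGgHhIiJjKkLlMmNnOoP \
  :s3:/dapprd/000005588v002/
```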

Configuring rclone

Rclone allows you to store credentials in a number of ways, e.g. environment variables.  You can use whatever works for you; the following is just one approach.  Refer to https://rclone.org/s3/ for more options.  In this example we will be configuring a download of:

Gallant, John; Austin, Jenet (2012): Topographic Wetness Index derived from 1″ SRTM DEM-H. v2. CSIRO. Data Collection. https://doi.org/10.4225/08/57590B59A4A08

First you need to request the S3 access details for the DAP collection you want (follow the steps listed up to but not including where you click “Open S3 Client”).


You can click the “copy” icons to copy the values you need to your clipboard.

Run the rclone config command to set up a new connection:

rclone config

The number of prompts you receive will vary depending on the version of rclone you are using.  The below is based on rclone 1.62.2.

Select “n” for “New Remote”

Name: Give your connection a name, e.g. since this example is for a connection to “Topographic Wetness Index” we will use “twi”.

Storage Type: Choose “s3” for “Amazon S3 Compliant Storage Providers including AWS…”.

S3 Provider: Choose “Other”; note that on some systems this is case sensitive, so ensure you use the upper case first letter “O”.

Get AWS credentials from runtime: press <Enter> for the default “false”.  If you’d prefer to use this method, refer to the rclone documentation.

AWS Access Key ID: Copy the “Access Key” value from the “Download Information” section of the DAP collection.

Note that this screen shot contains an example access key that will not work.

AWS Secret Access Key: Copy the “Secret Access Key” value from the “Download Information” section of the DAP collection.

Note that this screen shot contains an example secret access key that will not work.

Region: Press <Enter> to select the default which is blank.

Endpoint: Copy the “Server” value from the “Download Information” section of the DAP collection, i.e. “s3.data.csiro.au”.

Location Constraint: Press <Enter> to select the default which is blank.

Canned ACL: Press <Enter> to select the default which is blank.

Edit Advanced Config: Press <Enter> to select the default of “No”.

Confirm Configuration: Review what you have entered.  If everything is correct, press <Enter> to select the default of “Yes”.

Note that this screen shot contains example access keys that will not work.

Current Remotes: You will be presented with a list of your configured connections.

Enter “q” to quit when you are finished.
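
The saved remote ends up in rclone’s configuration file.  As a rough sketch (the exact path and layout may vary by version and platform; the keys shown are the non-working examples from above):

```shell
# Print where rclone stores its configuration
rclone config file

# The "twi" remote section in that file looks roughly like:
#
#   [twi]
#   type = s3
#   provider = Other
#   access_key_id = ABCDEFGHIJK123456789
#   secret_access_key = Aa1234+567/BbCcDcEeFfGgHhIiJjKkLlMmNnOoP
#   endpoint = s3.data.csiro.au
```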

Proceed to the next section of this page to see how to download the files using the connection you have configured.

Downloading using a configured rclone connection

Refer to the rclone general documentation and the rclone S3 documentation for a full list of commands available.

In the above section we configured a connection called “twi” for “Topographic Wetness Index”.  We will use this with the rclone commands provided below as examples.  The following arguments used in the examples are optional.  The default values are listed in the template and modified values in the example.

Argument Notes
-P Or --progress.  Displays progress on the downloads.
--transfers The number of files to download simultaneously.  Default is 4.  Increasing this to something like 8 or 10 can improve speeds.
--multi-thread-cutoff Default 250M (i.e. 250MiB).  Files over this size will be downloaded in multiple parts, which can improve speeds.  Consider lowering this from the default, although reducing it below 10M can be detrimental to speeds.
--multi-thread-streams Default 4.  The number of simultaneous parts for a single file download when a file’s size exceeds --multi-thread-cutoff.  Consider increasing this number when downloading larger files.
Unix/MacOS – download all files using rclone
# Template
rclone -P \
  --transfers 4 \
  --multi-thread-cutoff 250M \
  --multi-thread-streams 4 \
  copy {connection name}:/{Remote Directory} {local download folder}

# e.g.
rclone -P \
  --transfers 10 \
  --multi-thread-cutoff 10M \
  --multi-thread-streams 10 \
  copy twi:/dapprd/000005588v002/ ~/Downloads/my_download_folder

Windows Powershell – download all files using rclone
# Template
rclone -P `
  --transfers 4 `
  --multi-thread-cutoff 250M `
  --multi-thread-streams 4 `
  copy {connection name}:/{Remote Directory} {local download folder}

# e.g.
rclone -P `
  --transfers 10 `
  --multi-thread-cutoff 10M `
  --multi-thread-streams 10 `
  copy twi:/dapprd/000005588v002/ E:\downloads\my_download_folder
Windows Command Line – download all files using rclone
REM Template
rclone -P --transfers 4 --multi-thread-cutoff 250M --multi-thread-streams 4 copy {connection name}:/{Remote Directory} {local download folder}

REM e.g.
rclone -P --transfers 10 --multi-thread-cutoff 10M --multi-thread-streams 10 copy twi:/dapprd/000005588v002/ E:\downloads\my_download_folder

Downloading a sub-set of files

The examples on this page all provide commands to download all files of a collection.  If you only want to download specific files you will need to use rclone’s filtering options.  Since the rclone documentation covers all the features, this page will only provide some examples.

By default rclone will transfer all files and folders beneath the path you request.  In order to only download specific files you should use the --include, --exclude, or --filter options.  If you need to combine --include and --exclude options the rclone documentation recommends you only use the --filter option instead.
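
As a sketch of what an ordered filter list looks like, the two patterns used in the simple example on this page can be expressed as --filter rules.  This assumes a configured connection called “twi” (as set up in the Configuring rclone section); rules are evaluated top to bottom, and the final “- *” rule excludes everything that did not match an earlier rule:

```shell
# Equivalent of --include {metadata**,**TopographicWetnessIndex_3_arcsecond_resolution**}
# expressed as ordered --filter rules.  First matching rule wins.
rclone -P copy twi:/dapprd/000005588v002/ . \
  --filter "+ metadata**" \
  --filter "+ **TopographicWetnessIndex_3_arcsecond_resolution**" \
  --filter "- *"
```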

Imagine that you want to download some files from the Topographic Wetness Index derived from 1″ SRTM DEM-H collection.

Simple example

Let’s say that you are only interested in “TopographicWetnessIndex_3_arcsecond_resolution” folder.

This example assumes you have configured an rclone connection called “twi”:

# Using two patterns, which you do by putting the comma separated list inside braces, e.g.
#
#    {metadata**,**TopographicWetnessIndex_3_arcsecond_resolution**}
#
# Quote the braces so your shell does not expand the list itself:

rclone -P copy twi:/dapprd/000005588v002/ . --include "{metadata**,**TopographicWetnessIndex_3_arcsecond_resolution**}"

** matches any character including forward slash.  The patterns are relative to the path defined in the source, in this example /dapprd/000005588v002/.

Pattern: metadata**
Explanation: Any file that has a path starting with /dapprd/000005588v002/metadata, i.e. we’re looking in the folder /dapprd/000005588v002/ and anything below it that begins with “metadata”.
Example of file that matches: /dapprd/000005588v002/metadata/dublincore-000005588v002.xml

Pattern: **TopographicWetnessIndex_3_arcsecond_resolution**
Explanation: Any file where TopographicWetnessIndex_3_arcsecond_resolution appears anywhere in the path, including the file name.
Example of file that matches: /dapprd/000005588v002/data/TopographicWetnessIndex_3_arcsecond_resolution/dem_h_twi_3s_metadata.doc

Advanced example

This section assumes you are familiar with regular expressions.

This particular collection contains spatial data in both mosaic and tile format.  Let’s say you only wanted to download tile data covering Tasmania.

The folder structure for the tiled data uses the following pattern:

  • TopographicWetnessIndex_1_arcsecond_resolution
    • tiles
      • longitude folder in 1 degree increments, e.g. “e146”
        • latitude folder in 1 degree increments, e.g. “s44”
          • folder with longitude and latitude, e.g. “e146s44”
          • info

For the main island of Tasmania we might want the following folders with their contents:

  • e144
    • s40
    • s41
    • s42
  • e145
    • s41
    • s42
    • s43
    • s44
  • e146
    • s40
    • s41
    • s42
    • s43
    • s44
  • e147
    • s40
    • s41
    • s42
    • s43
    • s44
  • e148
    • s40
    • s41
    • s42
    • s43
    • s44

You can use regular expression patterns if you enclose them in two sets of braces, e.g. --include {{pattern}}

A pattern that matches the above folder structure is:

{{.*/e14[4-8]/s4[0-4]/.*}}

To break the pattern down:

Section of pattern Explanation
{{ Instruction to rclone that what follows is a regular expression.
.* The path of the file can start with any number of any characters (including no characters).
/e14[4-8]/ The path must contain a section with a slash, followed by “e14”, then a fourth character that is between 4 and 8 inclusive, and a slash immediately after this, i.e. one of:
  • /e144/
  • /e145/
  • /e146/
  • /e147/
  • /e148/
s4[0-4]/ The previous must be followed by “s4”, then a third character that is between 0 and 4 inclusive, then another slash, i.e. one of:
  • s40/
  • s41/
  • s42/
  • s43/
  • s44/
.* The path can contain any characters after the previous.
}} Instruction to rclone that this is the end of the regular expression.
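
Before handing a pattern to rclone, you can sanity-check it locally: the text between the double braces is an ordinary regular expression, so a quick sketch with grep -E against some sample paths (invented here for illustration) shows what it would and would not match:

```shell
# Sample paths: two that should match the Tasmania tile pattern, one that should not.
paths='data/tiles/e146/s44/e146s44/file.tif
data/tiles/e148/s40/info/readme.txt
data/tiles/e150/s38/e150s38/file.tif'

# Keep only paths matching the regular expression used inside {{ }}.
# The first two lines are printed; the e150/s38 path is filtered out.
printf '%s\n' "$paths" | grep -E '.*/e14[4-8]/s4[0-4]/.*'
```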

Note that this pattern would match sections in both the 1 arcsecond folders and the 3 arcsecond folders.  To only download one of these we need to extend the regular expression, e.g.

{{.*/TopographicWetnessIndex_1_arcsecond_resolution/.*/e14[4-8]/s4[0-4]/.*}}

So to download just those folders you could use a command like the following (quoting the pattern so your shell does not interpret it):

rclone -P copy twi:/dapprd/000005588v002/ . --include "{{.*/TopographicWetnessIndex_1_arcsecond_resolution/.*/e14[4-8]/s4[0-4]/.*}}"

Improving download speeds

If your download speed using rclone is not as fast as you would expect, here are some options you can consider:

Rclone argument Notes
--transfers The default is 4.  When you are downloading a collection with many files, increasing this to something like 8 or 10 should improve speeds.

Increasing this to a very large number, however, will often reduce your download speeds.

If you are downloading thousands of smaller files, e.g. all smaller than 10MiB, it is occasionally useful to increase this number above 10.  You may need to experiment to see what works best given your network conditions.

--multi-thread-cutoff The default is 250M (i.e. 250MiB).  Files over this size will be downloaded in multiple parts simultaneously.  Lowering this figure can improve speeds in some circumstances.

Setting this value below 10M is often detrimental to download speeds.

The rclone documentation notes for Windows users that multi-thread downloads cause the resulting file to be “sparse”, which can have disadvantages.  Refer to the rclone documentation for further information if this is causing problems for you.

--multi-thread-streams The default is 4.  When you are downloading files larger than the --multi-thread-cutoff value, increasing this number to something like 8 or 10 can improve speeds.

The rclone documentation (external site) states:

To calculate the number of download streams Rclone divides the size of the file by the --multi-thread-cutoff and rounds up, up to the maximum set with --multi-thread-streams.

So if --multi-thread-cutoff 250M and --multi-thread-streams 4 are in effect (the defaults):

  • 0..250 MiB files will be downloaded with 1 stream
  • 250..500 MiB files will be downloaded with 2 streams
  • 500..750 MiB files will be downloaded with 3 streams
  • 750+ MiB files will be downloaded with 4 streams

For example, if you are downloading many files that are around 1GiB, you might consider using something like --multi-thread-cutoff 100M and --multi-thread-streams 10 in order to increase the number of streams that each file is downloaded with.
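
The quoted calculation can be sketched as a small shell function, streams = min(ceil(size ÷ cutoff), --multi-thread-streams), with a floor of one stream.  This helper is illustrative only and is not part of rclone:

```shell
# Number of download streams rclone would use for a file of a given size (MiB),
# given --multi-thread-cutoff (MiB) and --multi-thread-streams.
streams_for() {
  size=$1 cutoff=$2 max=$3
  n=$(( (size + cutoff - 1) / cutoff ))   # ceil(size / cutoff)
  [ "$n" -gt "$max" ] && n=$max           # cap at --multi-thread-streams
  [ "$n" -lt 1 ] && n=1                   # at least one stream
  echo "$n"
}

streams_for 200 250 4    # prints 1
streams_for 600 250 4    # prints 3
streams_for 2000 250 4   # prints 4 (capped at --multi-thread-streams)
```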

If you are downloading very large files it can sometimes be useful to decrease --multi-thread-cutoff below the default and raise --multi-thread-streams above 10.  You may need to experiment to see what works best given your network conditions.