Rclone download via S3
Overview | Installing rclone | Download using the pre-built command | Downloading using rclone without saving a configuration | Configuring rclone | Downloading using a configured rclone connection | Downloading a sub-set of files | Improving download speeds
Overview
Rclone is a command line application that supports a number of protocols. It is available for a number of operating systems, including Windows, MacOS and various Unix-like OS distributions. It supports parallel transfers (i.e. more than one file at a time) as well as concurrent multi-part transfers (i.e. splitting a large file into multiple parts and transferring them concurrently). Depending on the number and size of the files you are transferring, this can improve transfer speeds. We recommend using version 1.60.1 or above.
Installing rclone
Refer to the rclone documentation (external site) for instructions on installing the software. Older versions may not work properly when downloading data from the Data Access Portal (DAP), so we recommend using version 1.60.1 or above.
Many Unix-like distributions package an older version of rclone that may not work well with the DAP. Refer to your operating system documentation for information on supported packages, or see the rclone documentation for upgrading to a newer version.
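If you are unsure which version you have, the following bash sketch compares the output of `rclone version` with the recommended minimum of 1.60.1. It assumes a `sort` command that supports `-V` (version sort), as GNU coreutils does. `version_ge` is a hypothetical helper name, not part of rclone.

```bash
#!/usr/bin/env bash
# Sketch: check the installed rclone against the recommended minimum version.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.

min_version="1.60.1"

# version_ge A B: succeeds (exit status 0) if version A >= version B.
# A plain string comparison would wrongly treat 1.9.0 as newer than 1.60.1.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v rclone >/dev/null 2>&1; then
  # The first line of `rclone version` looks like "rclone v1.62.2".
  installed="$(rclone version | head -n 1 | awk '{print $2}')"
  installed="${installed#v}"   # strip the leading "v"
  if version_ge "$installed" "$min_version"; then
    echo "rclone $installed meets the recommended minimum ($min_version)"
  else
    echo "rclone $installed is older than $min_version; consider upgrading"
  fi
else
  echo "rclone was not found on your PATH"
fi
```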
Note for Windows users
The rclone download for Windows is a portable file, meaning you can put the program anywhere on your system and run it. You may wish to add the location to your “path” environment variable to make it easier to run.
Download using the pre-built command
This is the simplest option if you want to use rclone with a single command to download all files in a DAP collection.
After selecting the option to “Download files via S3 Client” and agreeing to the licence, you will be presented with three tabs:
- S3 Client
- rClone
- AWS CLI
Click the “rClone” tab (or press the <TAB> key until the “rClone” tab is selected and press <ENTER>).
You will be presented with a pre-built command for downloading all files from the collection to your present working directory.
Note that the above screen shot contains example access keys that do not work.
Older versions of rclone do not support some of the options that have been added to this pre-built command, e.g.:
- --multi-thread-cutoff 250M
- --multi-thread-streams 4
We recommend using rclone version 1.60.1 or above. See the rclone documentation for details on updating to a newer version.
If you’re unable to update your version of rclone, try deleting these options from the command and then running it.
In your terminal, navigate to the directory (or folder) that you want to download the files to. You may want to create a new folder for the download. The examples below assume that a folder called “C:\mydownloadsfolder” exists on Windows and “~/my/downloads/folder” exists on macOS and Linux. The examples navigate to this folder, create a sub-folder called “topographic_wetness_index” and then navigate to the newly created folder:
Windows:

```bat
cd c:\mydownloadsfolder
md topographic_wetness_index
cd topographic_wetness_index
```

macOS and Linux:

```bash
cd ~/my/downloads/folder
mkdir topographic_wetness_index
cd topographic_wetness_index
```
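On macOS and Linux, the same steps can be combined into one sketch using `mkdir -p`, which also creates any missing parent folders and does not fail if the folder already exists. `DOWNLOAD_ROOT` is a hypothetical variable that defaults to the example folder used in this guide; adjust it to suit.

```bash
#!/usr/bin/env bash
# Sketch: create the download folder (and any missing parents), then move into it.
# DOWNLOAD_ROOT is a hypothetical variable defaulting to this guide's example folder.
DOWNLOAD_ROOT="${DOWNLOAD_ROOT:-$HOME/my/downloads/folder}"

# -p: create parent folders as needed; no error if the folder already exists
mkdir -p "$DOWNLOAD_ROOT/topographic_wetness_index"
cd "$DOWNLOAD_ROOT/topographic_wetness_index" || exit 1
pwd
```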
On the DAP collection web page, click the “clipboard” icon to copy the command to your clipboard (or press the <TAB> key until the icon is selected and press <ENTER> to copy it).
Paste the command into your terminal. How you paste depends on your operating system and terminal: many let you right-click with your mouse, some use <CTRL> + <V>, and others use <CTRL> + <SHIFT> + <V>. Since there are many possible configurations, the details are outside the scope of this guide.
Before running the command, review the options that have been used. The command includes some rclone options set to their default values, but you may wish to change them depending on the size and number of the files you want to download. For example, you might want to increase the --transfers and --multi-thread-streams values to something like 10 each. See the section on Improving download speeds for more information.
Once you have set the options you like, press <ENTER> to run the command.
Downloading using rclone without saving a configuration
This section covers creating a single rclone command to download a DAP collection. The above section, Download using the pre-built command, gives you the same command that you can simply copy and paste. This section is useful if you need to type in a command manually and are not able to paste the pre-built command into your terminal.
Every time you request access to a DAP collection via S3 you will receive a different set of credentials that only apply to that collection. These credentials only remain valid for around 48 hours. Depending on your needs you may not want to go through the process of running “rclone config” (see the Configuring rclone section if you would prefer to configure a connection).
You can run a single command like the below example to download all files from a DAP collection.
First you need to request the S3 access details for the DAP collection you want (follow the steps listed up to but not including where you click “Open S3 Client”).
Note: The Download Information is unique to a download request for a collection. It is provided here to illustrate the download process and copying this information exactly will cause an error.
You can click the “copy” icons to copy the values you need to your clipboard.
Use these values to create a command like the template examples below, replacing the values in curly braces (e.g. “{Server}”) with the values for the collection you want to download.
The following arguments used in the examples are optional. The default values are listed in the template and modified values are used in the example that follows the template.
Argument | Notes |
---|---|
-P | Or --progress. Displays progress on the downloads. |
--transfers | The number of files to download simultaneously. Default is 4. Increasing this to something like 8 or 10 can improve speeds. |
--multi-thread-cutoff | Default 250M (i.e. 250MiB). Files over this size will be downloaded in multiple parts, which can improve speeds. Consider lowering this from the default, although reducing it below 10M can be detrimental to speeds. |
--multi-thread-streams | Default 4. The number of simultaneous parts for a single file download when a file’s size exceeds --multi-thread-cutoff. Consider increasing this number when downloading larger files. |
Linux and macOS (bash):

```bash
# Template
rclone -P \
  --s3-provider Other \
  --s3-endpoint {Server} \
  --s3-access-key-id {Access Key} \
  --s3-secret-access-key {Secret Access Key} \
  --transfers 4 \
  --multi-thread-cutoff 250M \
  --multi-thread-streams 4 \
  copy :s3:/{Remote Directory} {local download folder}

# e.g. with example access keys that will not work
rclone -P \
  --s3-provider Other \
  --s3-endpoint s3-cbr.csiro.au \
  --s3-access-key-id ABCDEFGHIJK123456789 \
  --s3-secret-access-key Aa1234+567/BbCcDcEeFfGgHhIiJjKkLlMmNnOoP \
  --transfers 10 \
  --multi-thread-cutoff 10M \
  --multi-thread-streams 10 \
  copy :s3:/dapprd/000005588v002/ ~/Downloads/my_download_folder
```

Windows PowerShell:

```powershell
# Template
rclone -P `
  --s3-provider Other `
  --s3-endpoint {Server} `
  --s3-access-key-id {Access Key} `
  --s3-secret-access-key {Secret Access Key} `
  --transfers 4 `
  --multi-thread-cutoff 250M `
  --multi-thread-streams 4 `
  copy :s3:/{Remote Directory} {local download folder}

# e.g. with example access keys that will not work
rclone -P `
  --s3-provider Other `
  --s3-endpoint s3-cbr.csiro.au `
  --s3-access-key-id ABCDEFGHIJK123456789 `
  --s3-secret-access-key Aa1234+567/BbCcDcEeFfGgHhIiJjKkLlMmNnOoP `
  --transfers 10 `
  --multi-thread-cutoff 10M `
  --multi-thread-streams 10 `
  copy :s3:/dapprd/000005588v002/ E:\downloads\my_download_folder
```

Windows Command Prompt:

```bat
REM Template
rclone -P --s3-provider Other --s3-endpoint {Server} --s3-access-key-id {Access Key} --s3-secret-access-key {Secret Access Key} --transfers 4 --multi-thread-cutoff 250M --multi-thread-streams 4 copy :s3:/{Remote Directory} {local download folder}

REM e.g. with example access keys that will not work
rclone -P --s3-provider Other --s3-endpoint s3-cbr.csiro.au --s3-access-key-id ABCDEFGHIJK123456789 --s3-secret-access-key Aa1234+567/BbCcDcEeFfGgHhIiJjKkLlMmNnOoP --transfers 10 --multi-thread-cutoff 10M --multi-thread-streams 10 copy :s3:/dapprd/000005588v002/ E:\downloads\my_download_folder
```
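If you need to type this command more than once, a small bash script can hold the values from the Download Information in variables. This is a minimal sketch assuming bash. The variable names and the `RUN` switch are hypothetical, and the values are the non-working examples from above. By default the script only prints the command.

```bash
#!/usr/bin/env bash
# Sketch: assemble the one-off rclone command from the DAP "Download Information".
# Variable names and the RUN switch are hypothetical; the example values will not work.
set -euo pipefail

server="s3-cbr.csiro.au"                               # "Server"
access_key="ABCDEFGHIJK123456789"                      # "Access Key"
secret_key="Aa1234+567/BbCcDcEeFfGgHhIiJjKkLlMmNnOoP"  # "Secret Access Key"
remote_dir="dapprd/000005588v002/"                     # "Remote Directory"
local_dir="$HOME/Downloads/my_download_folder"

# Keeping the arguments in an array passes any special characters in the
# keys through to rclone unchanged, without extra escaping.
cmd=(rclone -P
  --s3-provider Other
  --s3-endpoint "$server"
  --s3-access-key-id "$access_key"
  --s3-secret-access-key "$secret_key"
  --transfers 10
  --multi-thread-cutoff 10M
  --multi-thread-streams 10
  copy ":s3:/$remote_dir" "$local_dir")

# Print the command; set RUN=1 to actually start the download.
printf '%q ' "${cmd[@]}"; echo
if [ "${RUN:-0}" = "1" ]; then
  "${cmd[@]}"
fi
```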
Configuring rclone
Rclone allows you to store credentials in a number of ways, e.g. environment variables. Use whichever approach works for you; the following is just one. Refer to https://rclone.org/s3/ for more options. In this example we will be configuring a download of:
Gallant, John; Austin, Jenet (2012): Topographic Wetness Index derived from 1″ SRTM DEM-H. v2. CSIRO. Data Collection. https://doi.org/10.4225/08/57590B59A4A08
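Environment variables are one alternative to the interactive steps that follow. Rclone can read a remote's settings from variables named RCLONE_CONFIG_<NAME>_<OPTION>. The bash sketch below defines the “twi” remote this way, using the non-working example values. The remote exists only while the variables are set, which suits the short-lived DAP credentials. `RUN` is a hypothetical switch.

```bash
#!/usr/bin/env bash
# Sketch: define a remote called "twi" through environment variables instead of
# saving it with `rclone config`. Example values will not work; use your own.
export RCLONE_CONFIG_TWI_TYPE=s3
export RCLONE_CONFIG_TWI_PROVIDER=Other
export RCLONE_CONFIG_TWI_ENDPOINT=s3-cbr.csiro.au   # your "Server" value
export RCLONE_CONFIG_TWI_ACCESS_KEY_ID=ABCDEFGHIJK123456789
export RCLONE_CONFIG_TWI_SECRET_ACCESS_KEY='Aa1234+567/BbCcDcEeFfGgHhIiJjKkLlMmNnOoP'

# The remote only exists while these variables are set (e.g. in this shell
# session). RUN is a hypothetical switch: set RUN=1 to start the download.
if [ "${RUN:-0}" = "1" ]; then
  rclone -P copy twi:/dapprd/000005588v002/ .
fi
```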
First you need to request the S3 access details for the DAP collection you want (follow the steps listed up to but not including where you click “Open S3 Client”).
You can click the “copy” icons to copy the values you need to your clipboard.
Run the rclone config command to set up a new connection:
rclone config
The number of prompts you receive will vary depending on the version of rclone you are using. The steps below are based on rclone 1.62.2.
Select “n” for “New Remote”
Name: Give your connection a name. Since this example is for the “Topographic Wetness Index” collection, we will use “twi”.
Storage Type: Choose “s3” for “Amazon S3 Compliant Storage Providers including AWS…”.
S3 Provider: Choose “Other”. Note that on some systems this is case sensitive, so ensure you use an upper-case first letter “O”.
Get AWS credentials from runtime: press <Enter> for the default “false”. If you’d prefer to use this method, refer to the rclone documentation.
AWS Access Key ID: Copy the “Access Key” value from the “Download Information” section of the DAP collection.
Note that this screen shot contains an example access key that will not work.
AWS Secret Access Key: Copy the “Secret Access Key” value from the “Download Information” section of the DAP collection.
Note that this screen shot contains an example secret access key that will not work.
Region: Press <Enter> to select the default which is blank.
Endpoint: Copy the “Server” value from the “Download Information” section of the DAP collection, e.g. “s3.data.csiro.au”.
Location Constraint: Press <Enter> to select the default which is blank.
Canned ACL: Press <Enter> to select the default which is blank.
Edit Advanced Config: Press <Enter> to select the default of “No”.
Confirm Configuration: Review what you have entered. If everything is correct, press <Enter> to select the default of “Yes”.
Note that this screen shot contains example access keys that will not work.
Current Remotes: You will be presented with a list of your configured connections.
Enter “q” to quit when you are finished.
Proceed to the next section of this page to see how to download the files using the connection you have configured.
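The prompts above can also be answered in a single step with `rclone config create`, which takes the remote name, the storage type, and then pairs of option names and values. This is a minimal sketch assuming bash and the same non-working example values. `RUN` is a hypothetical switch; by default the script only prints the command it would run. Replace the endpoint with the “Server” value shown for your collection.

```bash
#!/usr/bin/env bash
# Sketch: create the "twi" remote non-interactively instead of answering the
# `rclone config` prompts. Example values will not work; replace them.
set -euo pipefail

cmd=(rclone config create twi s3
  provider Other
  access_key_id ABCDEFGHIJK123456789
  secret_access_key 'Aa1234+567/BbCcDcEeFfGgHhIiJjKkLlMmNnOoP'
  endpoint s3-cbr.csiro.au)   # your "Server" value

printf '%q ' "${cmd[@]}"; echo
# RUN is a hypothetical switch: set RUN=1 to save the remote to your rclone config.
if [ "${RUN:-0}" = "1" ]; then
  "${cmd[@]}"
fi
```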
Downloading using a configured rclone connection
Refer to the rclone general documentation and the rclone S3 documentation for a full list of commands available.
In the above section we configured a connection called “twi” for “Topographic Wetness Index”. We will use this with the rclone commands provided below as examples. The following arguments used in the examples are optional. The default values are listed in the template and modified values in the example.
Argument | Notes |
---|---|
-P | Or --progress. Displays progress on the downloads. |
--transfers | The number of files to download simultaneously. Default is 4. Increasing this to something like 8 or 10 can improve speeds. |
--multi-thread-cutoff | Default 250M (i.e. 250MiB). Files over this size will be downloaded in multiple parts, which can improve speeds. Consider lowering this from the default, although reducing it below 10M can be detrimental to speeds. |
--multi-thread-streams | Default 4. The number of simultaneous parts for a single file download when a file’s size exceeds --multi-thread-cutoff. Consider increasing this number when downloading larger files. |
Linux and macOS (bash):

```bash
# Template
rclone -P \
  --transfers 4 \
  --multi-thread-cutoff 250M \
  --multi-thread-streams 4 \
  copy {connection name}:/{Remote Directory} {local download folder}

# e.g.
rclone -P \
  --transfers 10 \
  --multi-thread-cutoff 10M \
  --multi-thread-streams 10 \
  copy twi:/dapprd/000005588v002/ ~/Downloads/my_download_folder
```

Windows PowerShell:

```powershell
# Template
rclone -P `
  --transfers 4 `
  --multi-thread-cutoff 250M `
  --multi-thread-streams 4 `
  copy {connection name}:/{Remote Directory} {local download folder}

# e.g.
rclone -P `
  --transfers 10 `
  --multi-thread-cutoff 10M `
  --multi-thread-streams 10 `
  copy twi:/dapprd/000005588v002/ E:\downloads\my_download_folder
```

Windows Command Prompt:

```bat
REM Template
rclone -P --transfers 4 --multi-thread-cutoff 250M --multi-thread-streams 4 copy {connection name}:/{Remote Directory} {local download folder}

REM e.g.
rclone -P --transfers 10 --multi-thread-cutoff 10M --multi-thread-streams 10 copy twi:/dapprd/000005588v002/ E:\downloads\my_download_folder
```
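Before a large download, you may want to check how much data is involved. The bash sketch below uses `rclone size`, which reports the number and total size of the files under a path, and the `--dry-run` flag, which makes `copy` report what it would transfer without transferring anything. It assumes the “twi” connection configured above. `RUN` is a hypothetical switch; without it the script only prints the commands.

```bash
#!/usr/bin/env bash
# Sketch: preview a download with the configured "twi" remote before running it.
# RUN is a hypothetical switch: set RUN=1 to actually call rclone.
set -euo pipefail

src="twi:/dapprd/000005588v002/"
dest="$HOME/Downloads/my_download_folder"

if [ "${RUN:-0}" = "1" ]; then
  rclone size "$src"                       # file count and total size
  rclone copy --dry-run -P "$src" "$dest"  # list what would be copied
else
  echo "Would run: rclone size $src"
  echo "Would run: rclone copy --dry-run -P $src $dest"
fi
```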
Downloading a sub-set of files
The examples on this page all provide commands to download all files of a collection. If you only want to download specific files you will need to use rclone’s filtering options. Since the rclone documentation covers all the features, this page will only provide some examples.
By default rclone will transfer all files and folders beneath the path you request. To download only specific files, use the --include, --exclude or --filter options. If you need to combine --include and --exclude options, the rclone documentation recommends that you use only the --filter option instead.
Imagine that you want to download some files from the Topographic Wetness Index derived from 1″ SRTM DEM-H collection.
Simple example
Let’s say that you are only interested in the “TopographicWetnessIndex_3_arcsecond_resolution” folder, plus the collection metadata.
This example assumes you have configured an rclone connection called “twi”:
```
# Using two patterns, which you do by putting the comma separated list inside braces, e.g.
#
#   {metadata**,**TopographicWetnessIndex_3_arcsecond_resolution**}
#
# Quote the pattern so the shell passes it to rclone unchanged.
rclone -P copy twi:/dapprd/000005588v002/ . --include "{metadata**,**TopographicWetnessIndex_3_arcsecond_resolution**}"
```
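The pattern needs quotes because some shells treat braces specially. In bash, for example, an unquoted {a,b} is expanded by the shell itself into two separate arguments, so rclone never sees the pattern as written. The bash-only sketch below counts the arguments the shell passes in each case. `count_args` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: why the brace pattern must be quoted in bash. Unquoted, bash's own
# brace expansion splits it into two words before rclone ever sees it.
set -f  # disable filename globbing so the ** in this demo is left alone

count_args() { echo "$#"; }

unquoted=$(count_args {metadata**,**TopographicWetnessIndex_3_arcsecond_resolution**})
quoted=$(count_args "{metadata**,**TopographicWetnessIndex_3_arcsecond_resolution**}")

echo "unquoted: $unquoted argument(s)"   # brace expansion produced 2 words
echo "quoted:   $quoted argument(s)"     # 1 word, passed through unchanged
set +f
```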
The ** wildcard matches any sequence of characters, including forward slashes, whereas a single * stops at a forward slash. The patterns are relative to the path defined in the source, in this example /dapprd/000005588v002/.
Pattern | Explanation | Example of file that matches |
---|---|---|
metadata** | Any file with a folder or file name beginning with “metadata” below /dapprd/000005588v002/, which includes everything in the top-level metadata folder. (A pattern that does not start with / can match at any depth.) | /dapprd/000005588v002/metadata/dublincore-000005588v002.xml |
**TopographicWetnessIndex_3_arcsecond_resolution** | Any file where TopographicWetnessIndex_3_arcsecond_resolution appears anywhere in the path, including the file name. | /dapprd/000005588v002/data/TopographicWetnessIndex_3_arcsecond_resolution/dem_h_twi_3s_metadata.doc |
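The same selection can be written with --filter rules, which is the approach the rclone documentation recommends when you need to mix includes and excludes. Rclone checks the rules in order and applies the first one that matches, so a final "- **" rule excludes everything not already included. In this bash sketch, excluding .doc files is purely an illustrative extra rule. `RUN` is a hypothetical switch; by default the script only prints the command.

```bash
#!/usr/bin/env bash
# Sketch: --include example rewritten as ordered --filter rules (first match wins).
# The "- *.doc" rule is only an illustration; RUN is a hypothetical switch.
set -euo pipefail

filters=(
  --filter "- *.doc"                                              # skip .doc files anywhere
  --filter "+ metadata**"                                         # keep metadata
  --filter "+ **TopographicWetnessIndex_3_arcsecond_resolution**" # keep 3 arcsecond data
  --filter "- **"                                                 # drop everything else
)

cmd=(rclone -P copy twi:/dapprd/000005588v002/ . "${filters[@]}")
printf '%q ' "${cmd[@]}"; echo
if [ "${RUN:-0}" = "1" ]; then
  "${cmd[@]}"
fi
```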
Advanced example
This section assumes you are familiar with regular expressions.
This particular collection contains spatial data in both mosaic and tile format. Let’s say you only wanted to download tile data covering Tasmania.
The folder structure for the tiled data uses the following pattern:
- TopographicWetnessIndex_1_arcsecond_resolution
  - tiles
    - longitude folder in 1 degree increments, e.g. “e146”
      - latitude folder in 1 degree increments, e.g. “s44”
        - folder with longitude and latitude, e.g. “e146s44”
        - info
For the main island of Tasmania we might want the following folders with their contents:
- e144
  - s40
  - s41
  - s42
- e145
  - s41
  - s42
  - s43
  - s44
- e146
  - s40
  - s41
  - s42
  - s43
  - s44
- e147
  - s40
  - s41
  - s42
  - s43
  - s44
- e148
  - s40
  - s41
  - s42
  - s43
  - s44
You can use regular expression patterns if you enclose them in two sets of braces, e.g. --include {{pattern}}
A pattern that matches the above folder structure is:
{{.*/e14[4-8]/s4[0-4]/.*}}
To break the pattern down:
Section of pattern | Explanation |
---|---|
{{ | Instruction to rclone that what follows is a regular expression. |
.* | The path of the file can start with any number of any characters (including no characters). |
/e14[4-8]/ | The path must contain a slash, followed by “e14”, then a fourth character between 4 and 8 inclusive, then a slash immediately after this, i.e. one of /e144/, /e145/, /e146/, /e147/ or /e148/. |
s4[0-4]/ | The previous section must be followed by “s4”, then a third character between 0 and 4 inclusive, then another slash, i.e. one of s40/, s41/, s42/, s43/ or s44/. |
.* | The path can contain any characters after the previous section. |
}} | Instruction to rclone that this is the end of the regular expression. |
Note that this pattern would match folders in both the 1 arcsecond and the 3 arcsecond data. To download only one of these, extend the regular expression, e.g.:
{{.*/TopographicWetnessIndex_1_arcsecond_resolution/.*/e14[4-8]/s4[0-4]/.*}}
So to download just those folders you could use a command like the following:
```
rclone -P copy twi:/dapprd/000005588v002/ . --include "{{.*/TopographicWetnessIndex_1_arcsecond_resolution/.*/e14[4-8]/s4[0-4]/.*}}"
```
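Before downloading, you can sanity-check a regular expression against a few paths. The bash sketch below uses grep -E as a stand-in for rclone's own regular expression engine; the pattern uses only syntax that both understand. For this check the match is anchored to the whole path. The sample paths and the file name “example_file” are hypothetical, modelled on the folder structure described above.

```bash
#!/usr/bin/env bash
# Sketch: test the regular expression against hypothetical sample paths using
# grep -E as a stand-in for rclone's regex engine (anchored to the whole path).
pattern='.*/TopographicWetnessIndex_1_arcsecond_resolution/.*/e14[4-8]/s4[0-4]/.*'

samples=(
  "data/TopographicWetnessIndex_1_arcsecond_resolution/tiles/e146/s44/e146s44/example_file"
  "data/TopographicWetnessIndex_1_arcsecond_resolution/tiles/e150/s34/e150s34/example_file"
  "data/TopographicWetnessIndex_3_arcsecond_resolution/tiles/e146/s44/e146s44/example_file"
  "data/TopographicWetnessIndex_1_arcsecond_resolution/tiles/e144/s43/e144s43/example_file"
)

matches=()
for path in "${samples[@]}"; do
  if printf '%s\n' "$path" | grep -Eq "^${pattern}\$"; then
    matches+=("$path")
    echo "include: $path"
  else
    echo "skip:    $path"
  fi
done
```

The last sample, for e144/s43, is included even though that folder is not in the list for the main island. The character ranges select every combination of e144 to e148 with s40 to s44, so the download will contain a few tiles beyond that list.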
Improving download speeds
If your download speed using rclone is not as fast as you would expect, here are some options you can consider:
Rclone argument | Notes |
---|---|
--transfers | The default is 4. When you are downloading a collection with many files, increasing this to something like 8 or 10 should improve speeds. Increasing this to a large number will often reduce your download speeds. If you are downloading thousands of smaller files, e.g. all smaller than 10MiB, it is occasionally useful to increase this number above … |
--multi-thread-cutoff | The default is 250M (i.e. 250MiB). Files over this size will be downloaded in multiple parts simultaneously. Lowering this figure can improve speeds in some circumstances. Setting this value below 10M can be detrimental to speeds. The rclone documentation notes for Windows users that multi-thread downloads cause the resulting file to be “sparse”, which can have disadvantages. Refer to the rclone documentation for further information if this is causing problems for you. |
--multi-thread-streams | The default is 4. When you are downloading files larger than the --multi-thread-cutoff value, increasing this number to something like 8 or 10 can improve speeds. The rclone documentation (external site) states: … For example, if you are downloading many files that are around 1GiB, you might consider using something like … If you are downloading very large files it can sometimes be useful to decrease … |