AWS CLI download via S3

Overview

AWS CLI is a command line application that you can use to download files from the Data Access Portal (DAP).  It is available for Windows, MacOS and Linux.
AWS CLI (external site)

Installing AWS CLI

Refer to the AWS CLI documentation for installation (external site).

Many Unix-like distributions will have a package that you can install.  Refer to your operating system’s documentation for guidance on how to install via your package manager.
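
Once installed, a quick way to confirm that your shell can find AWS CLI is to ask it for its version. The sketch below assumes a Unix-like shell; the aws_status variable is a hypothetical name used only for illustration.

```shell
# Check whether the aws executable is on the PATH before going further.
if command -v aws >/dev/null 2>&1; then
    aws_status="found"
    aws --version   # prints the installed AWS CLI version
else
    aws_status="missing"
    echo "aws was not found on your PATH; check the installation steps." >&2
fi
```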

Configuring AWS CLI

If you don’t use AWS CLI for anything else, you can skip this section.

If you already use AWS CLI and have configured it for connecting to other systems, you may need to ensure your default settings do not cause conflicts.

For example, if you regularly use another third party S3 service (not Amazon Web Services) that uses self-signed certificates then you may have specified your local certificate bundle in your default configuration.  Your configuration files are stored at the following location:

Operating System    Path
Linux/Unix/MacOS    ~/.aws/
Windows             %USERPROFILE%\.aws\

If you don’t have a .aws folder, it’s not necessary to create one when downloading from the DAP.

If you do have a file called config in your .aws folder, look at the contents to see if you have any default settings, e.g.

Unix/MacOS – Edit AWS Config File
# Or use your text editor of choice
vi ~/.aws/config
Screenshot of text editor with following lines:
[default]
ca_bundle = ~/.aws/example-org-chain.pem
Windows – Edit AWS Config File
notepad %USERPROFILE%\.aws\config
Screenshot of text editor with following lines:
[default]
ca_bundle = ~/.aws/example-org-chain.pem

Anything you have in your [default] profile could cause problems when downloading from the DAP.  You should consider configuring specific profiles for the different connections you wish to make.  Refer to the AWS CLI documentation (external site) for guidance on how to use configuration files.
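
As a rough aid for the check described above, the following sketch prints only the [default] section of your config file so you can see what would apply to a DAP download. It assumes a Unix-like shell with awk available; print_default_section is a hypothetical helper name, and AWS_CONFIG_FILE is the environment variable AWS CLI uses to override the config file location.

```shell
# Path to the config file; AWS_CONFIG_FILE overrides the usual location.
config_file="${AWS_CONFIG_FILE:-$HOME/.aws/config}"

# Hypothetical helper: print only the [default] section of a config file.
print_default_section() {
    # A line starting with "[" opens a new section; keep printing only
    # while the current section header is exactly [default].
    awk '/^\[/ { in_default = ($0 == "[default]") } in_default { print }' "$1"
}

if [ -f "$config_file" ]; then
    print_default_section "$config_file"
else
    echo "No config file found at $config_file"
fi
```

If the output lists settings such as ca_bundle, moving them under a named profile and selecting that profile with the --profile option for your other connections should keep them out of your DAP downloads.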

Downloading using AWS CLI without saving a configuration

Every time you request access to a DAP collection via S3 you will receive a different set of credentials that only apply to that collection.  These credentials only remain valid for around 48 hours. Depending on your needs you may not want to go through the process of saving a configuration.

Downloading files without saving a configuration involves three steps:

  • Retrieving the connection information.
  • Entering access keys as environment variables.
  • Running a download command.

First you need to request the S3 access details for the DAP collection you want (follow the steps listed up to but not including where you click “Open S3 Client”).

Note: The Download Information is unique to each download request for a collection. The values shown here only illustrate the download process; copying them exactly will cause an error.

You can click the “copy” icons to copy the values you need to your clipboard.

Save the Access Key and Secret Access Key as environment variables:

Unix/MacOS – save environment variables
# These are example access keys that will not work
export AWS_ACCESS_KEY_ID=ABCDEFGHIJK123456789
export AWS_SECRET_ACCESS_KEY=Aa1234+567/BbCcDcEeFfGgHhIiJjKkLlMmNnOoP

Windows PowerShell sets environment variables differently from Windows Command Line.

Windows PowerShell – save environment variables
# These are example access keys that will not work
$env:AWS_ACCESS_KEY_ID = "ABCDEFGHIJK123456789"
$env:AWS_SECRET_ACCESS_KEY = "Aa1234+567/BbCcDcEeFfGgHhIiJjKkLlMmNnOoP"
Windows Command Line – save environment variables
REM These are example access keys that will not work
SET AWS_ACCESS_KEY_ID=ABCDEFGHIJK123456789
SET AWS_SECRET_ACCESS_KEY=Aa1234+567/BbCcDcEeFfGgHhIiJjKkLlMmNnOoP
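
Before running a download in a Unix/MacOS shell, it can help to confirm that both variables are set in the current session. The sketch below does this without printing the key values; check_dap_credentials is a hypothetical helper name, not part of AWS CLI.

```shell
# Hypothetical helper: report whether both variables are set,
# without echoing their values to the terminal.
check_dap_credentials() {
    missing=0
    if [ -z "${AWS_ACCESS_KEY_ID:-}" ]; then
        echo "AWS_ACCESS_KEY_ID is not set" >&2
        missing=1
    fi
    if [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo "AWS_SECRET_ACCESS_KEY is not set" >&2
        missing=1
    fi
    return "$missing"
}

check_dap_credentials && echo "Both access key variables are set."
```

Variables set with export only last for the current shell session, which suits credentials that expire after a couple of days.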

An optional step is to create the folder that you want to download the files to.  If you skip this step and specify a folder that does not exist yet, AWS CLI will create the folder for you.  In the example below, change the path to an appropriate place on your file system.

Unix/MacOS – create folder to download to
mkdir ~/Downloads/my_dap_download_example

Note that if your local folder has spaces in it you will need to enclose the local folder path in quotes.
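
A minimal sketch of the quoting mentioned above, for a Unix/MacOS shell. The folder name is a hypothetical example. Note that ~ is not expanded inside double quotes, so the sketch uses $HOME instead.

```shell
# Hypothetical folder name containing spaces; quote it wherever it is used.
download_dir="$HOME/Downloads/my dap download example"

# -p creates any missing parent folders and does not fail if the folder exists.
mkdir -p "$download_dir"
```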

Windows – create folder to download to
md %USERPROFILE%\Downloads\my_dap_download_example

Use the Endpoint URL and the Remote Directory values to run a download command, e.g.

Unix/MacOS – download to folder
# aws s3 --endpoint-url {endpoint url value} --recursive cp s3://{remote directory value} {local path} 
aws s3 --endpoint-url https://s3.data.csiro.au --recursive cp s3://dapprd/000005588v002/ ~/Downloads/my_dap_download_example/

Note that if your local folder has spaces in it you will need to enclose the local folder path in quotes.

Windows – download to folder
aws s3 --endpoint-url https://s3.data.csiro.au --recursive cp s3://dapprd/000005588v002/ %USERPROFILE%/Downloads/my_dap_download_example/
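
If you want to preview a download before transferring anything, the sketch below wraps the command in a small function and adds the --dryrun option, which makes aws s3 cp list the operations it would perform without performing them. The function name dap_download is hypothetical, and the options are written after the paths as in the AWS CLI reference, which is assumed to be equivalent to the ordering used above.

```shell
# Hypothetical helper: preview a recursive download from the DAP.
# Assumes AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are already exported.
dap_download() {
    endpoint_url="$1"   # the Endpoint URL value
    remote_dir="$2"     # the Remote Directory value
    local_dir="$3"      # local folder; quoting keeps paths with spaces intact
    aws s3 cp "s3://$remote_dir" "$local_dir" \
        --recursive \
        --endpoint-url "$endpoint_url" \
        --dryrun
}

# Example call using the illustrative values from this page:
# dap_download https://s3.data.csiro.au dapprd/000005588v002/ "$HOME/Downloads/my_dap_download_example/"
```

Removing the --dryrun line turns the preview into a real download.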

Improving download speeds

AWS CLI does not have many options that will help you optimise the speed of your downloads.  If you are on a fast network connection and feel you should be able to achieve faster speeds, rclone is a command line tool you can use that allows you to download multiple files concurrently, which can improve speeds.
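
As a rough illustration only, an rclone download might look like the sketch below. It has not been verified against the DAP. It assumes a recent rclone version that supports connection strings, that the access keys are already exported as environment variables (env_auth=true tells rclone to read them), and that the provider setting Other is suitable for this endpoint. The function name dap_rclone_copy and the value of 8 concurrent transfers are hypothetical choices.

```shell
# Hypothetical helper: copy a DAP remote directory with rclone.
# Assumes AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are already exported.
dap_rclone_copy() {
    endpoint_url="$1"   # the Endpoint URL value
    remote_dir="$2"     # the Remote Directory value
    local_dir="$3"      # local folder to download into
    # The endpoint is quoted inside the connection string because it contains ":".
    rclone copy \
        ":s3,provider=Other,env_auth=true,endpoint=\"$endpoint_url\":$remote_dir" \
        "$local_dir" \
        --transfers 8 \
        --progress
}

# Example call using the illustrative values from this page:
# dap_rclone_copy https://s3.data.csiro.au dapprd/000005588v002/ "$HOME/Downloads/my_dap_download_example/"
```

The --transfers option controls how many files rclone copies at the same time; raising it may help on a fast connection.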