Using CM2 suites in Rose and Cylc

Rose is a GUI for managing and altering UM-based suites and experiments, with an SVN repository wrapper called Rosie. Cylc is a job scheduler designed for cycling workflows such as climate models (but it can be used for any scheduling/cycling application). Rose and Cylc are used together to run recent versions of the UM.

In this tutorial, we will create an ACCESS-CM2 suite, explore the many setup options, learn how Cylc schedules the various tasks, and finally run a test of an altered historical experiment.

A recording is available in which this tutorial material is demonstrated (5th February, 2021): https://www.youtube.com/watch?v=tw05r9-o_SI

Part I: Copying ACCESS-CM2 suites with Rosie

A main product of our CMIP6 participation was the release of three standardised ACCESS-CM2 suites: u-br565 (pre-industrial control), u-bx616 (historical), and u-bn157 (amip; atmosphere-only). In this tutorial, we will use an updated version of u-bx616 that has been prepared especially for this training (u-cb905).

  1. Open a terminal and log into accessdev using your NCI credentials, and authenticate the GPG agent with your UKMO credentials (see Setting up for ACCESS-CM2 for troubleshooting):
    Note: if using a VDI, get connected to it first, and open a terminal inside.

    $ ssh -X USER_ID@accessdev.nci.org.au
    $ mosrs-auth    
    # you will be asked for UKMO credentials
  2. We can ‘checkout’ a Rose suite using Rosie, creating a local copy from the UKMO repository where all ACCESS-CM2 suites are stored:

    $ cd ~/roses    
    # this is the default location of all local suites
    $ rosie checkout u-cb905
    [INFO] u-cb905: local copy created at /home/599/$USER/roses/u-cb905
    $ cd u-cb905
    $ ls
    app/  meta/  ozone.rc  rose-suite.conf  rose-suite.info  suite.rc
    # app -- configuration files for the various tasks within the suite
    # meta -- GUI metadata
    # rose-suite.conf -- the main suite configuration file
    # rose-suite.info -- suite information
    # suite.rc -- the Cylc control script (Jinja2 language)
  3. Checking out a local copy is good for tests and for examining existing suites, but ultimately we will create a new suite in the UKMO repository that is backed up and can be checked out on other accounts; that is, a full suite copy rather than just a local copy:

    $ cd ~/roses
    $ rosie copy u-cb905    
    # this will open a vim instance where top-level suite descriptors can be set. We will use the default.
    $ :q <Enter>    
    # the vim exit command
    Copy "u-cb905/trunk@180955" to "u-?????"? [y or n (default)]
    $ y <Enter>    
    # to accept the rosie copy; a unique suite name will be generated (example below)
    [INFO] u-cb956: created at https://code.metoffice.gov.uk/svn/roses-u/c/b/9/5/6
    [INFO] u-cb956: copied items from u-cb905/trunk@18145
    [INFO] u-cb956: local copy created at /home/599/$USER/roses/u-cb956
    $ cd u-[suite_name]    
    # in my case this was u-cb956
    $ ls
    app/  meta/  ozone.rc  rose-suite.conf  rose-suite.info  suite.rc
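
The [INFO] line printed by 'rosie copy' shows how a suite ID maps onto its repository location: each character of the 5-character index becomes one path component. The short Python sketch below reproduces that mapping for the example ID above. It is inferred only from that single log line, and the function name 'rosie_suite_url' is hypothetical (it is not part of Rose or Rosie).

```python
def rosie_suite_url(suite_id: str,
                    base: str = "https://code.metoffice.gov.uk/svn") -> str:
    """Build the repository URL for a Rosie suite ID such as 'u-cb956'.

    Assumption: the layout is inferred from the 'created at' message
    printed by 'rosie copy' in the example above.
    """
    prefix, sep, index = suite_id.partition("-")
    if sep != "-" or not prefix or len(index) != 5 or not index.isalnum():
        raise ValueError(f"unexpected suite ID: {suite_id!r}")
    # Each character of the 5-character index becomes one directory level.
    return f"{base}/roses-{prefix}/" + "/".join(index)


print(rosie_suite_url("u-cb956"))
```

This is only a convenience for locating a suite in a browser; 'rosie checkout' and 'rosie copy' remain the way to obtain or create suites.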

Part II: The Rose GUI and top-level suite configuration

We will now explore the main suite settings and some common suite alterations. There are two ways to view suite settings and make changes: i) using the Rose GUI, and ii) directly viewing/editing the configuration files using your favourite text editor (in this tutorial I use less/nano to view/edit, but you can use whatever you prefer).

  1. Launch the Rose GUI from inside the relevant suite directory (e.g. /home/599/$USER/roses/u-[suite_name]) and inspect the suite information:

    $ rose edit &
    OR
    $ less rose-suite.info

    You will see the top-level descriptors from when we copied the suite, which can be changed at any time.

  2. Inspect the main settings, called the suite configuration:
    Note: Rose settings are separated into several categories and sub-categories (e.g. ‘Build and Run’, ‘Domain Decomposition → Atmosphere’), while the single configuration file contains all of the related settings.

    Rose: 
        i) expand the 'suite config' index (click the small arrow/triangle next to 'suite config' in the left-hand window)
       ii) navigate to 'suite config -> Build and Run'
           # this page contains the broadest suite settings, including switching on and off the major components of ACCESS-CM2, and the option for the automatic conversion of model output files into netCDF4 format. We will leave it with its default settings.
    OR
    $ less rose-suite.conf
  3. For our first suite alteration, we will change some Runtime options, which control the Gadi resources being used:
    Note: if you are using an NCI project other than p66 for compute resources, this is controlled here.

    Rose:
        i) navigate to 'suite config -> Machine and Runtime Options'
       ii) set 'NCI core type' to "Broadwell"     
           # Broadwell is the old Raijin queue, and costs fewer SUs to use than Gadi's Cascade Lake
      iii) set 'NCI Queue' to "Express"    
           # the Express queue has shorter queue times, but costs additional SUs; recommended for short tests only
       iv) [optional] set 'Compute project' to your NCI project 
           # e.g. v45. The default is p66, the ACCESS-CM2 development project
        v) click 'Save' in the top toolbar under Edit (the typical "control + s" also works)
    OR
    $ nano rose-suite.conf
    # edit the following variables:
    CORE='broadwell'    
    NCI_QUEUE='express'
    [optional] PROJECT='[project]'    
    # e.g. v45; depending on the compute resources that you have permission to use
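
The same text edit can be scripted when several suites need the same change. The Python sketch below is a minimal, hypothetical helper that rewrites KEY=value lines in the text of a rose-suite.conf. The sample file contents, including the '[jinja2:suite.rc]' header and the starting values 'cascadelake' and 'normal', are placeholders for illustration and are not taken from the suite.

```python
def set_conf_values(text: str, updates: dict) -> str:
    """Return 'text' with the KEY=value line replaced for every key in 'updates'.

    Raises KeyError if a requested key is not present, so that a mistyped
    variable name is caught instead of being silently ignored.
    """
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0]
        if "=" in line and key in remaining:
            line = f"{key}={remaining.pop(key)}"
        out.append(line)
    if remaining:
        raise KeyError(f"not found in file: {sorted(remaining)}")
    return "\n".join(out) + "\n"


# Placeholder file contents (assumed layout, not the real suite defaults).
sample = """[jinja2:suite.rc]
CORE='cascadelake'
NCI_QUEUE='normal'
PROJECT='p66'
"""

edited = set_conf_values(sample, {"CORE": "'broadwell'", "NCI_QUEUE": "'express'"})
print(edited)
```

Note that the quotes are part of the value, matching the way the variables appear in the file.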
  4. Since we will only be running a test of this suite, we want it to only run for a short time. Therefore, we need to alter the cycling settings:
    Note: some restart files from bi889 (CMIP6 piControl) and bj594 (CMIP6 historical-r1i1p1f1) are stored in /g/data/access/projects/access/access-cm2/cmip6_restarts
    Note: you can perform a ‘warm restart’ (i.e. branch from an existing simulation) from any ACCESS-CM2 experiment, as long as you have all of the restart files for the desired year.
    Note: you can also perform a ‘cold restart’ where the model will be reconfigured and initialised using a default state (requires significant spin-up to reach equilibrium). We will only cover warm restarts in this tutorial.

    Rose:
        i) navigate to 'suite config -> Run Initialisation and Cycling'
       ii) set 'Total Run length' to "P2M"    
           # ACCESS-CM2 will now only run for two simulation months in total
      iii) set 'Cycling frequency' to "P1M"    
           # Cylc will now perform any post-processing tasks and resubmit the suite to Gadi once every simulation month (you will thus get to see two full cycles in this example)
       iv) set 'Warm restart date' to "10500101"    
           # the suite will now use year 1050 of bi889 (our official CMIP6 piControl experiment) as its initial state
        v) [optional] set 'Warm restart directory' to your directory containing the restart files of your parent simulation at the desired year    
           # the directory/filename structures must be the same as the default location (restart/atm/bi889a.da09500101_00, etc)
           # more on restart files in Part VII (next tutorial)
       vi) [optional] set 'Warm restart run ID' to the 5-character suite ID of your parent simulation    
           # e.g. bi889 (for suite u-bi889)
      vii) remember to save!
    OR
    $ nano rose-suite.conf
    # edit the following variables:
    RUNLEN='P2M'
    RESUB='P1M'
    WARM_RESTART_DATE='10500101'
    [optional] WARM_RESTART_DIR='[path/to/parent/restart/files]'
    [optional] WARM_RESTART_RUNID='[parent_suite_ID]'
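
RUNLEN and RESUB are ISO 8601 durations, and together they determine how many cycles the suite will run. The Python sketch below works this out for year/month durations only. It is a simplification for illustration: Cylc does its own date arithmetic, and the function names here are hypothetical.

```python
import re


def duration_months(duration: str) -> int:
    """Convert a year/month ISO 8601 duration (e.g. 'P2M', 'P1Y6M') to months."""
    match = re.fullmatch(r"P(?:(\d+)Y)?(?:(\d+)M)?", duration)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {duration!r}")
    years, months = (int(group or 0) for group in match.groups())
    return 12 * years + months


def cycle_count(runlen: str, resub: str) -> int:
    """Number of cycles when the run length is a whole multiple of the cycle length."""
    total, step = duration_months(runlen), duration_months(resub)
    if step <= 0 or total % step:
        # Assumption of this sketch: uneven run lengths are simply rejected.
        raise ValueError("run length must be a whole multiple of the cycle length")
    return total // step


print(cycle_count("P2M", "P1M"))  # the settings used in this tutorial: 2 cycles
```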

Part III: Component configurations and science settings

Now that the top-level suite settings are the way we want, it's time to get further into the component settings, primarily the UM component, where most of the climate forcings and input files are defined along with a great many other settings. We won’t make a lot of changes here, as they are heavily dependent on your use-case and science goals, but you will see where many changes can be made.

  1. Let's start with the UM:
    Note: $CMIP6_ANCILS = /g/data/access/TIDS/CMIP6_ANCIL/data/ancils

    Rose:
        i) expand the 'um' index (click the small arrow/triangle next to 'um', at the bottom of the left-hand window)
       ii) ?? navigate to 'um -> env -> Runtime Controls -> Atmosphere only'
      iii) ?? [optional] set 'PRINT_STATUS' to "Extra diagnostic messages"    
           # this will help with debugging if necessary. Save!
       iv) ?? navigate to 'um -> namelist -> Top Level Model Control -> Model Domain and Timestep'
           # here you can switch to a Single Column Model if your science goals require such a setup
        v) save!
    OR
    $ nano app/um/rose-app.conf
    # edit the following variables:
    [optional] PRINT_STATUS=PrStatus_Diag
    [optional] model_type=5
    # for switching to a Single Column Model, which is not part of this tutorial
    
  2. Now we are going to alter some atmospheric forcings:
    Note: for changes to time-varying gas MMR values, we suggest creating your list of values/years first, then copy/pasting the data into the configuration file directly, instead of one-by-one into the Rose GUI which can be quite tedious and error-prone.

    Rose:
        i) navigate to 'um -> namelist -> Reconfiguration and Ancillary Control -> Configure ancils and initialise dump fields -> 24d9c434'    
           # this is the Ozone field, which is provided through an input ancillary file
       ii) set 'ancilfilename' to "$CMIP6_ANCILS/n96e/timeslice_1850/OzoneConc/v1/mmro3_monthly_CMIP6_1850_N96_edited_ancil_2anc"    
           # in this experiment we will fix Ozone to 1850 levels. Save!
      iii) navigate to 'um -> namelist -> UM Science Settings -> Section 01 - 02 - Radiation -> Gas MMRs'    
           # here you can change & set MMR values for experiments with fixed gas concentrations (e.g. piControl)
       iv) navigate further down to 'Gas MMRs -> Varying gas MMRs'
           # the 'l_clmchfcg' variable switches on/off time-varying gas MMRs
        v) navigate further down to 'Gas MMRs -> Varying gas MMRs -> Varying CO2 MMR'    
           # here you can alter the time-varying gas MMR settings (CO2 in this example)
           # 'clim_fcg_levls_co2' are the MMR values for each year; 'clim_fcg_years_co2' are the years respective to 'clim_fcg_levls_co2'
    OR
    $ nano app/um/rose-app.conf
    <Ctrl> + w; 24d9c434; <Enter>
    # to search the file in nano; <Alt> + w repeats the search to cycle through hits
    # edit the variable:
    ancilfilename='$CMIP6_ANCILS/n96e/timeslice_1850/OzoneConc/v1/mmro3_monthly_CMIP6_1850_N96_edited_ancil_2anc'
    # in this experiment we will fix Ozone to 1850 levels
    $ <Ctrl> + x; y; <Enter>
    # to save and exit nano
    $ less app/um/rose-app.conf
    $ /namelist:run_radiation
    # here you can change & set MMR values for experiments with fixed gas concentrations (e.g. piControl)
    $ /namelist:clmchfcg
    # here time-varying gas MMRs can be changed (or turned off via the variable 'l_clmchfcg')
    # 'clim_fcg_levls_co2' are the MMR values for each year; 'clim_fcg_years_co2' are the years respective to 'clim_fcg_levls_co2'
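
As suggested in the note above, it is easier to build the year and MMR lists first and paste them into the configuration file. The Python sketch below shows one way to do that for CO2. Everything in it is an assumption for illustration: the concentrations are round placeholder numbers (not CMIP6 data), the molar masses are approximate, and the comma-separated layout should be checked against the existing 'clim_fcg_*' lines in app/um/rose-app.conf before pasting.

```python
CO2_MOLAR_MASS = 44.01      # g/mol
DRY_AIR_MOLAR_MASS = 28.97  # g/mol (approximate)


def ppm_to_mmr(ppm: float) -> float:
    """Convert a CO2 volume mixing ratio in ppm to a mass mixing ratio (kg/kg)."""
    return ppm * 1e-6 * CO2_MOLAR_MASS / DRY_AIR_MOLAR_MASS


def namelist_line(name: str, values, fmt: str) -> str:
    """Format 'name=v1,v2,...' ready to paste into the configuration file."""
    return f"{name}=" + ",".join(format(value, fmt) for value in values)


# Placeholder inputs only; replace with your own years and concentrations.
years = [1850, 1900, 1950]
co2_ppm = [280.0, 300.0, 320.0]

print(namelist_line("clim_fcg_years_co2", years, "d"))
print(namelist_line("clim_fcg_levls_co2", [ppm_to_mmr(p) for p in co2_ppm], ".4e"))
```

Keeping the two lists in one script makes it harder for the years and levels to fall out of step, which is the main risk when entering them one-by-one.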
  3. Our final change to UM settings is the STASH, which controls the fields that are saved to the model output, with many optional configurations:
    Note: the STASH can be changed by editing the configuration file; however, most of the useful information about each field (including the name!) is only visible through Rose, so we strongly advise using the Rose method to edit STASH requests.
    Note: for more detail on individual STASH fields, see https://reference.metoffice.gov.uk/um/stash

    Rose:
        i) navigate to 'um -> namelist -> Model Input and Output -> STASH Requests and Profiles -> STASH Requests'
           # changing the 'Group' setting (top left of the page, next to 'Filter') to "isec" sorts the fields into sections (i.e. STASH categories), making it much easier to navigate
           # the fields that are visible on the page are already loaded into the suite, and can be individually enabled/disabled by ticking/unticking the 'Incl?' box (the exclamation point next to the name also indicates deactivation)
       ii) open the 'Packages' menu in the top-right, select 'Package: HistExtra', and "Enable all". Save to see changes in STASH
           # output fields can be labelled with a 'Package' ID, allowing multiple fields to be enabled/disabled collectively
           # the 'HistExtra' package includes several 3D daily and sub-daily fields that are required for CMIP6. Enabling these will greatly increase the size of the model output data, and they are not necessary for most experiments
      iii) open the 'New +' window (next to Packages), expand section '0: Primary fields', double-click item 24 'SURFACE TEMPERATURE AFTER TIMESTEP', select OK, and close the 'New' window
           # this will add an additional surface temperature field to the STASH list, however first you must provide Rose with details about how we want the field managed
       iv) find the new entry in the STASH list
           # look for the line with red crosses against it, and an index value '1'
        v) set 'dom_name' to "DIAG", 'tim_name' to "T6HR", 'use_name' to "UP7"
           # this tells the suite to save this 2D field (DIAG) at 6-hourly intervals (T6HR) to the 6-hourly output stream (UP7)
           # field 0,24 is now in the STASH 3 times: monthly means (TDMPNM), daily means (TDAYM), and 6-hourly instantaneous values (T6HR)
       vi) save!
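
A STASH request is essentially a record combining a field (section and item) with a domain, time and usage profile, so the same field can appear several times with different time profiles. The Python sketch below models this with plain dictionaries and lists the time profiles requested for field 0,24 after the step above. The key name 'item' is an assumption, and the domain and usage profiles of the two pre-existing requests are not given in this tutorial, so they are left out.

```python
from collections import defaultdict


def time_profiles_by_field(requests):
    """Map (isec, item) to the sorted list of time profiles requested for it."""
    table = defaultdict(set)
    for request in requests:
        table[(request["isec"], request["item"])].add(request["tim_name"])
    return {field: sorted(names) for field, names in table.items()}


# Field 0,24 (surface temperature) as described in the step above.
requests = [
    {"isec": 0, "item": 24, "tim_name": "TDMPNM"},  # monthly means
    {"isec": 0, "item": 24, "tim_name": "TDAYM"},   # daily means
    {"isec": 0, "item": 24, "tim_name": "T6HR",     # the new 6-hourly request
     "dom_name": "DIAG", "use_name": "UP7"},
]

print(time_profiles_by_field(requests)[(0, 24)])
```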
  4. A number of macros can be accessed through the STASH Requests page, designed to validate the current STASH request, and attempt to correct identified issues.
    Note: The results must be taken with a grain of salt, and many errors/warnings given by these macros can be safely ignored, but the macros can be very useful in identifying potential issues prior to running a suite.

    Rose:
        i) On the STASH Requests page, open the 'Macros' menu (top left corner of the page, under the 'STASH Requests' tab and above the 'Group' selection)
           # there are 7 macros in total, in 2 categories:
           # 4x '...Validate' macros (symbol: '?' in a blue diamond) are used to check various aspects of the STASH Request
           # 2x 'TidyStashTransform...' and 1x '...RemoveUnused' (symbol: blue pyramid -> red sphere) will attempt to rectify potential issues
       ii) Run the 4 'Validate' macros one-by-one
           # the only macro to flag any issues should be 'TidyStashValidate', with two errors:
           # Error: namelist:streq(1) -- 'Wrong index: 1 should be 00024_...' 
           # this is a simple metadata issue that is always introduced when adding new fields, and can easily be fixed with another macro
           # Error: namelist:time(t3hmn_039ecafe) -- 'Identical sections: namelist:time(...'
           # we will not worry about this error; identical sections are usually ok, and in this case they are expected due to our specific setup
      iii) Run the 'TidyStashTransform' macro, and select 'Apply'
           # this will correct the metadata of our new field (6-hourly surface temperature), adding the correct 'Index' string
       iv) save!
           # you may be left with some red error symbols ('!' in red triangles) from the Validation macros
           # you can refresh the Rose GUI with the 'Metadata -> Refresh Metadata' option from the main toolbar at the top
  5. We can also explore the namelists, and many of the settings, of the other main components of ACCESS-CM2: MOM5 (ocean), CICE5 (sea-ice) and OASIS3-MCT (coupler):
    Note: we will not be adjusting any of these settings in this tutorial.

    Rose:
        i) navigate to 'mom -> namelist -> auscom_ice_nml'
           # on this page and the *_nml pages under it, you can see and alter some of the ways in which MOM behaves and couples to the UM
           # output fields from MOM are not controlled through the Rose suite, but by an input file called the 'DIAG_TABLE'; we do not cover this here
       ii) navigate to 'cice -> namelist -> icefields_nml'
           # this is the CICE version of the STASH, where diagnostic fields designated 'm' are saved monthly, 'd' are saved daily, and 'x' are not saved
           # items listed as .true. or .false. contain grid-level information and should not be altered
           # other CICE behaviours can be controlled through the other pages within the cice namelist
      iii) navigate to 'coupled -> file'
           # the OASIS3-MCT coupler is controlled by this list of input files, which can be changed here
    OR
    $ less app/mom/rose-app.conf 
    $ /namelist:auscom_ice_nml
    # here you can see and alter some of the ways in which MOM behaves and couples to the UM
    $ less app/cice/rose-app.conf
    $ /namelist:icefields_nml
    # diagnostic fields designated 'm' are saved monthly, 'd' are saved daily, and 'x' are not saved
    $ less app/coupled/rose-app.conf
    # the OASIS3-MCT coupler is controlled by this list of input files, which can be changed here
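
To get a quick overview of which CICE diagnostics are saved at which frequency, the 'm'/'d'/'x' flags described above can be grouped with a few lines of Python. This is only a sketch: the sample text and the particular flag assignments are illustrative placeholders rather than the contents of the suite's app/cice/rose-app.conf, and logical (.true./.false.) entries are deliberately skipped because they should not be altered.

```python
import re
from collections import defaultdict


def group_by_frequency(namelist_text: str) -> dict:
    """Group icefields_nml entries by their output flag ('m', 'd' or 'x')."""
    groups = defaultdict(list)
    for line in namelist_text.splitlines():
        match = re.fullmatch(r"\s*(\w+)\s*=\s*'([mdx])'\s*,?\s*", line)
        if match:  # section headers and logical entries do not match
            name, flag = match.groups()
            groups[flag].append(name)
    return dict(groups)


# Illustrative sample (assumed values, not the suite's actual settings).
sample = """[namelist:icefields_nml]
f_aice='m'
f_hi='d'
f_tmask=.true.
f_snoice='x'
"""

for flag, names in group_by_frequency(sample).items():
    print(flag, names)
```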

Part IV: Run suite & auxiliary Cylc tasks

With our suite modifications complete, we are now ready to run a test of our suite, and see if any errors arise.

  1. The Rose package is typically used to install and initiate the running of a suite:
    Note: ‘installing’ consists of several small tasks, including preparing the necessary output directories (most notably cylc-run which we will explore in the next tutorial), preparing environments, and loading suite settings into Cylc.

    $ rose suite-run
    [INFO] export CYLC_VERSION=7.8.3
    ...
    [INFO] install: suite.rc
    [INFO] REGISTERED u-cb956 -> /home/599/USER_ID/cylc-run/u-cb956
    ...
    # You will now see the Cylc GUI, which will run its predetermined series of tasks, according to the main Cylc script (suite.rc)
    # Some of these are simple tasks that are performed locally on accessdev (e.g. checking out code repositories), but most are submitted to the PBS job queue on Gadi
  2. There are many auxiliary tasks performed by the ACCESS-CM2 suite during the first cycle and prior to the coupled model running:
    1. fcm_make(2)_um: controlled by the BUILD_UM switch in the suite configuration, the UM vn10.6 code is checked out from the UKMO repository, and compiled within a PBS job. The specific repository branch used is defined in the fcm_make_um configuration file, which can also be accessed from the Rose GUI.
    2. fcm_make(2)_drivers: controlled by the BUILD_DRIVERS switch in the suite configuration, this task checks out a set of scripts (including the MOM & CICE compilers, netCDF post-processing, warm restart, and many others) from the NCI svn repository for use throughout the model run (https://trac.nci.org.au/svn/access_tools/access-cm2-drivers/trunk/)
    3. make(2)_cice: controlled by the BUILD_CICE switch in the suite configuration, CICE is checked out from the NCI svn repository, and compiled in a PBS job.
    4. make(2)_mom: controlled by the BUILD_MOM switch in the suite configuration, MOM is checked out from the NOAA-GFDL Github repository, and compiled in a PBS job.
    5. install_ancil: a reference file on Gadi (~access/data/ancil/access_cm2_n96e/O1/ancils_GA7.1_PD) is sourced, providing the paths to many input ancillary files.
    6. install_warm: controlled by the WARM_RESTART switch in the suite configuration, the shell script ‘warm_restart.sh’ (from fcm_make_drivers) is run, which loads the initial climatological state of the experiment, which we set in Part II, task 4.
  3. Similarly, there are a few tasks performed in each cycle, after the coupled model is run:
    1. filemove: copies restart files and UM history files (model output data) to the $ARCHIVEDIR (default: /scratch/[project]/[USER_ID]/archive/[suite_name]).
    2. housekeep: initiates the ‘rose_prune’ script (tarring log files & other cylc-run directory cleaning) on jobs 3 cycles previous (where applicable).
    3. history_postprocess: post-processing and compression of ocean and sea-ice model output data.
    4. netcdf_conversion: post-processing of atmospheric model output data from UM-based ‘.pp files’ (UKMO-proprietary) into netCDF4 format.
    5. retrieve/redistribute ozone: Ozone data from the previous cycle is retrieved and re-applied to the model. This is due to a UM error in the way the ozone field is passed between cycles.
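
Two of the per-cycle tasks above follow simple rules that can be written down in a few lines of Python: where 'filemove' puts its output, and which earlier cycle 'housekeep' acts on. The sketch below is illustrative only. The function names are hypothetical, the user ID is a placeholder, and the text above does not say whether [suite_name] includes the 'u-' prefix, so the value passed in the example is an assumption.

```python
from pathlib import PurePosixPath


def archive_dir(project: str, user_id: str, suite_name: str) -> PurePosixPath:
    """Default $ARCHIVEDIR layout: /scratch/[project]/[USER_ID]/archive/[suite_name]."""
    return PurePosixPath("/scratch") / project / user_id / "archive" / suite_name


def cycle_to_prune(cycles: list, current: str, offset: int = 3):
    """Return the cycle 'offset' cycles before 'current', or None if there is none."""
    index = cycles.index(current) - offset
    return cycles[index] if index >= 0 else None


print(archive_dir("p66", "abc123", "u-cb956"))  # placeholder user ID and suite name

cycles = ["c1", "c2", "c3", "c4", "c5"]         # generic cycle labels
print(cycle_to_prune(cycles, "c5"))             # three cycles back
print(cycle_to_prune(cycles, "c2"))             # nothing to prune yet
```

In a two-cycle test like the one in this tutorial, the second function returns None for both cycles, which matches the "where applicable" caveat above.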
  4. These auxiliary tasks are of course defined within the suite. Some of these can be viewed and altered through the Rose GUI, however some settings (e.g. the MOM/CICE compiler branches) are hard-coded into the Cylc suite script (suite.rc), which is written in Jinja2, a templating language for Python. We will not discuss the Cylc scripts any further, however if you desire, you can view the suite.rc file in the top suite directory, read about Cylc, and follow the Cylc tutorial.
  5. By now, your ACCESS-CM2 suite should have failed. The task ‘coupled’ is red, its state is listed as ‘failed’, and you will likely have an email in your inbox with a few details of where in the suite the failure occurred.
    DO NOT WORRY! This is because of purposeful typos inserted into these instructions, and will be resolved shortly.
  6. In the next tutorial, we will learn where to look for suite and error logs, some basic suite debugging, how to get your job back on track with more rose and cylc commands, and finally examine the model output data from our run.
    Next tutorial: