Meeting notes 2023

Notes from regular and irregular meetings in reverse chronological order.

Items to follow up:

AWAP/AGCD data requests – rr5, zv2 (official but research only), rr8 Seasonal prediction – Morwena, rr7 – Aurel. Also BAWAP on CSIRO server. ua6 copy (formerly under /g/data1a/ua6/): /g/data/xv83/from_ua6/processed/staging/users/cxh599/GlobalObs_and_Reanalysis/processed/native/AWAP/mon/pr/pr_AWAP_mon_native_1900.nc. A CSIRO climate data portal is in preparation and will also support a commercially licensed AWAP product (but not at NCI). A Zarr copy made from the commercially licensed dataset at CSIRO is in xv83. Requested a new NCI project for a CSIRO researchers’ copy of AGCD.

ua6 decommissioning – process, other storage availability, quotas etc.
Claire Trenham to revisit meeting invitee list and contact people about joining.
Backup strategy for published data at NCI (CMIP5, CMIP6, etc)

14 Dec 2023 climate data weekly meeting 34 2023

Paola, Thomas, Claire. Apologies: Chloe

Paola battling libraries
ILAMB/ESMValTool training (NRI)
- Great that they got the resources together
- Hard to pitch it right to user needs
- Enough to get started using the tool
- A confusing point in doco was raised and addressed
- No time for active demos
- Ensure no accidental messaging that implies it’s ARE versus ESMValTool
- Power in community agreement for how to calculate the various metrics, so everyone uses the same algorithm
- Still not easy to bring new datasets (e.g. CCAM, BARPA)
CMIP7 hackathon in Aspendale next year
- It’s not “exploring” CMIP7, that doesn’t exist yet, it’s looking at how to work with the data when it comes and what is needed.
- ESMValTool hackathon
- ESMValTool enables reproduction of IPCC plots
  - But scientists need more flexibility and to be designing the analyses they need for CMIP7.
- Hopefully that is the goal of the workshop?
Inline analysis
- Rhaeger was looking at how to go straight from ACCESS output to ESMValTool – but need to post-process – MOPPER, ACCESS Archiver
- Growing understanding of data workflows/processes.
NRI to support ACCESS publication
- But staff still getting up to speed with post-processing tools
- Paola will push ahead and publish AUS2200 processed with MOPPER
- Post-processing is never just push a button, need to understand the configuration and variable calcs may be different to CMIP6, mapping may need to change, etc.

30 Nov 2023 climate data weekly meeting 33 2023

Paola, Chloe, Alicia, Claire, Hannes, Thomas.

CLEX workshop
- workshop was last week in Brisbane.
- Romain, Andy and others from NRI were there
- Still some uncertainty about data after CLEX ends
- CIs’ meeting – very supportive of Paola’s CMS pitch/work
- New CMS team will be down to 3 people – Melbourne, ANU and UNSW, called “CDMS”? Don’t know who, people will have to reapply for fewer jobs. Melbourne Uni position will go.
  - Expectation that the unis support the data and software
- Reassuring postdocs and students they’ll be okay but not admin and technical staff
NRI
- Can request additional funds each year on top of 5-yearly funds
- Have received funding for a new team to explore new software
- Possibly looking at Machine Learning
- No data team
- Expectation is that data would be done by the model evaluation team
  - May not be the best fit for that team on top of their existing work
- Really need a data team somewhere
- 53TB of AUS2200 data produced and ready for NRI storage
- Use of DOIs – needs to be different for each dataset
- Unrealistic expectations around Aus2200 handover and doco?
- COSIMA code and data standards not necessarily aligned with wider community expectations/standards
- Growing understanding of importance of data curation – but not yet at the point of funding it
- Waiting for NRI data custodian??
  - Claire C taking care of the storage but that’s more a quota-ing thing
- Community advocacy for data curation/custodianship needs might help
- Romain remains our point of truth
- Newsletter – https://www.access-nri.org.au/2023-research-infrastructure-investment-plan-funding-for-access-modelling-announced/
  - New software team, and storage funding?
  - possible coastal team (???? but no engagement with any of the coastal modelling/infrastructure community I know?)
ERA5
- What post-processing does NCI do?
  - None – so interested to know about chunking issues.
  - Claire to connect Hannes with Matt Chamberlain
- What is the workflow that triggers new data download?
  - Happens automatically but sometimes slowdowns delay it for months – unreliable servers with bad bandwidth
  - If something is urgent, query Syazwan
- Any interest in model levels?
  - Not sufficient across the community
Seminars/workshops
- Seminar to demo the post-processor tool (ACCESS-Mopper)
  - Worked for AUS2200
  - Still need help on the ocean side
- CLEX growing interest in ML
- Nectar have a new service to help people scale GPU ML workflows
  - Upcoming workshops to train people in scaling ML workflows
  - NCI also have training in them
  - https://docs.mlerp.cloud.edu.au
- NRI training next week in iLAMB and ESMValTool (Thurs & Fri)
  - Cancel this meeting next week in favour of iLAMB training
Funding schemes don’t match reality of data storage needs
Observation: There’s not as much desire to be in data as in science, maybe the funding and job availability reflects that even if it doesn’t match the actual need…

23 Nov 2023 climate data weekly meeting 32 2023

Thomas, Claire. Apologies: Paola, Chloe, Hannes

LLMs
- Thomas using Notion to help code – feed the LLM the relevant Python documentation and ask questions; over time it combines what it already knows from the internet with the specifics you’ve fed it, improving its answers to coding questions
- Ben L has been doing a similar thing of feeding an LLM model manuals and then making queries about model configuration files.
Paola and Hannes both at workshops today.

16 Nov 2023 climate data weekly meeting 31 2023

Chloe, Thomas, Hannes, Claire?. Apologies: Paola

Workshops
- Thomas attended NCI Dask-ML workshop
- Found it helpful
- Technical stuff – need people signed up for projects before it starts. Limited by the slowest person, and approval wait time.
- Use Zoom breakout rooms for those who are behind instead of holding the class up (e.g. Carpentries)
NCI reference data
- Onboarding new large climate datasets
- Often badly chunked(?)
- Thomas – chunking is literally the hardest thing to sort out (Pangeo!)
- Discovery, exploration – hard to find the balance between spatial and temporal chunking, and there are lots and lots of files to open
- Make temporary analysis-ready datasets, don’t alter the authoritative collections
  - ARD: clean data (fix standards), make chunking consistent across data and rechunk for analysis
  - E.g. BRAN2020 – save as 2 different versions on top of the authoritative – once for space, once for time. Takes a few hours to rechunk and write to zarr, but then it’s there for the team – use /scratch.
  - Boutique things – chunking and use of flox (with xarray) and zarr. Get things from 50min down to 2min.
  - It’s fine to be on /scratch, can recreate in a few months if needed – may need more scratch space than default but it’s still more cost effective to take this approach
  - No need for NCI to hold two versions of the dataset for both chunking approaches
- Is it worth having an ‘optimal’ chunking that balances time and space?
- There is no optimal chunking (Dougie presentation from a few years ago?)
- Memory limitations
- BoM are trying to deliver a lot of data with limited data stewardship resources
- Awareness of chunking issues, there’s only so much we can do though.
- Trying to move into a direction where NCI can “fix” some of the problematic datasets – tools to have the capability to fix things so the contributors don’t have to.
- The users can fix the data, what we want to avoid is multiple users creating the same intermediate data – important to facilitate comms so that they’re all aware of issues and intermediate data products
- Open science is still a “new thing” – younger scientists will increasingly learn that careers won’t be advanced by being private in their work – there’s more value in releasing the code/data.
  - e.g. how do the bureau create their lagged ensemble, specifically?
- Want a platform where users can share strengths, limitations, required corrections – e.g. OneClimate comments? other? risks to making this info semi-public to organisations but improved science for researchers.
  - NEED A CODE OF CONDUCT
  - Limit it – e.g. only NCI/registered users can see and interact. Secure.
  - Be conscious of how orgs cover their butts in letting us have these conversations. Moderation?
- Maybe nothing more than discussion is needed, don’t need additional copies of data? See what emerges.
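
The "no optimal chunking" point above can be illustrated with simple arithmetic. The sketch below counts how many chunks have to be opened for a single-point time series versus a single-time map under two layouts. The grid size and chunk shapes are made-up illustrative numbers, not those of BRAN2020 or any NCI collection, and the helper `chunks_touched` is hypothetical.

```python
import math


def chunks_touched(shape, chunks, selection):
    """Count the chunks a read has to open.

    shape and chunks map dimension name -> total size / chunk length.
    selection maps dimension name -> number of contiguous elements read;
    dimensions left out are read in full. Reads are assumed to start on
    a chunk boundary, which keeps the arithmetic simple.
    """
    count = 1
    for dim, size in shape.items():
        length = selection.get(dim, size)
        count *= math.ceil(length / chunks[dim])
    return count


# Hypothetical dataset: 3600 time steps on a 1500 x 3600 grid.
shape = {"time": 3600, "lat": 1500, "lon": 3600}

# Two layouts with the same number of elements per chunk (5.4 million).
chunked_for_space = {"time": 1, "lat": 1500, "lon": 3600}
chunked_for_time = {"time": 3600, "lat": 30, "lon": 50}

point_series = {"lat": 1, "lon": 1}   # full time series at one grid point
single_map = {"time": 1}              # full map at one time step

for name, layout in [("space", chunked_for_space), ("time", chunked_for_time)]:
    print(
        f"chunked for {name}: "
        f"series reads {chunks_touched(shape, layout, point_series)} chunks, "
        f"map reads {chunks_touched(shape, layout, single_map)} chunks"
    )
```

With equal chunk sizes, each layout is ideal for one access pattern and worst-case for the other, which is why keeping two temporary rechunked copies on /scratch can be cheaper than searching for a single compromise.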

09 Nov 2023 climate data weekly meeting 30 2023

Claire, Hannes, Thomas. Apologies: Paola, Chloe

ua6 clean up
- synda_test good to be deleted, hard to know how to get appropriate approval to remove it.
Career options discussions
- Is there a career in data? Kind of, but I do mostly science. Library services are dedicated data/metadata specialists.
- Hannes is interested in machine readability
  - PIDs, data context
  - Metadata standards
- If Hannes was in charge… – despite interest in ML, would probably put less effort in that and more in data rescue
  - Especially historical and even more recent Antarctic observational data.
NCI structures
- “live” page is manually updated
- Systems team don’t control that page, NCI comms aren’t our problem but good to know that different teams are responsible for different parts of the system
To discuss next week:
- Backup of published data at NCI (CMIP specifically)
- Policy – should we back up massive datasets if they’re published even if it’s expensive?
- Could copy to tape archive but it’s big enough that that’s a significant investment, how do we fund it?
- The idea that it’s cheaper to reproduce stops being appropriate when systems change and it’s no longer possible to recreate with the old versions etc

02 Nov 2023 climate data weekly meeting 29 2023

Paola, Hannes, Claire, Thomas. Apologies: Chloe, Gen

Hannes is a data manager at NCI and part of the data services/data publishing team
- Emerging interest in vocabularies
ua6 queries
- synda_test directory – can be deleted
- 20th Century Reanalysis (older versions) not being used so removed from ua8
- Make space for LME data that Claire backed up to xv83 but we’ll put it in ua8 for common access
Data packing
- Talk to BoM about packing data to reduce resolution of variables (reduce precision)
- To be done with CORDEX BARPA data
- Being done with ERA5
  - Should be lossless, reduce file size by 50%
  - Shuffle – data should be pretty contiguous so can leverage this to further reduce file size
  - Compression will help too
- Internal compression can make a lot of difference, especially when converting GRIB to netCDF e.g.
- CMIP5/6 require no compression and double precision in their standards – being revisited
  - CMOR can compress!
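
A minimal sketch of the packing and shuffle ideas above, using only the Python standard library so the mechanics are visible. It packs floats into 16-bit integers with a scale factor and offset (the CF-style `scale_factor`/`add_offset` approach), which halves the size relative to float32, and then byte-shuffles before deflate. The data are synthetic and the function names are hypothetical; this is not the BoM, NCI or CMOR implementation. Note that this kind of packing is accurate to within half a quantisation step rather than bit-for-bit lossless.

```python
import math
import struct
import zlib


def pack_int16(values):
    """Pack floats into 16-bit integers with a scale factor and offset."""
    vmin, vmax = min(values), max(values)
    # 2**16 - 2 steps leaves one integer free for a missing-value flag.
    scale_factor = (vmax - vmin) / (2**16 - 2) or 1.0
    add_offset = (vmax + vmin) / 2
    packed = [round((v - add_offset) / scale_factor) for v in values]
    return packed, scale_factor, add_offset


def unpack(packed, scale_factor, add_offset):
    """Recover approximate float values from packed integers."""
    return [p * scale_factor + add_offset for p in packed]


def shuffle(raw, itemsize):
    """Group the i-th byte of every element together (byte shuffle)."""
    return b"".join(raw[i::itemsize] for i in range(itemsize))


def unshuffle(raw, itemsize):
    """Invert shuffle(), restoring the original byte order."""
    n = len(raw) // itemsize
    out = bytearray(len(raw))
    for i in range(itemsize):
        out[i::itemsize] = raw[i * n:(i + 1) * n]
    return bytes(out)


# Synthetic, smoothly varying "temperature" series (made-up values).
values = [280.0 + 10.0 * math.sin(i / 50.0) for i in range(10_000)]

packed, scale_factor, add_offset = pack_int16(values)
restored = unpack(packed, scale_factor, add_offset)
max_error = max(abs(a - b) for a, b in zip(values, restored))

raw_float32 = struct.pack(f"<{len(values)}f", *values)
raw_int16 = struct.pack(f"<{len(packed)}h", *packed)
shuffled = shuffle(raw_int16, 2)

print("float32 bytes:", len(raw_float32))
print("int16 bytes:", len(raw_int16))
print("int16 + deflate:", len(zlib.compress(raw_int16)))
print("int16 + shuffle + deflate:", len(zlib.compress(shuffled)))
print("max round-trip error:", max_error, "step:", scale_factor)
```

How much shuffle and deflate help depends on how smooth the field is, so the compressed sizes printed here say nothing about real ERA5 or BARPA files.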
CF convention, CVs
- Standard names are a controlled vocabulary that’s part of the CF convention but it’s about the only thing in CF that is controlled by a vocabulary.
- There are other fields that have CVs, e.g. cf_role, cell_methods, coverage_type (but these are more ACDD than CF)
- CMIP provides additional vocabularies, e.g. frequency tables
- Mappings from ERA variable names to CMIP – bespoke
  - Need to standardise these across the community, align with NRI work too
  - https://github.com/ACDguide/ClimMappings
- ACDGuide Guidelines book – Introduction — Climate Data Guidelines (acdguide.github.io)
  - FYI – metadata portal to search across orgs – https://oneclimate.acdguide.cloud.edu.au
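
To make the ERA-to-CMIP variable mappings mentioned above concrete, here is a minimal sketch of what a shared mapping table could look like. The three entries and the dictionary layout are illustrative assumptions and are not taken from the ClimMappings repository; the precipitation factor assumes hourly ERA5 accumulations in metres of water.

```python
# Hypothetical mapping: ERA5 short name -> CMIP name, target units, factor.
ERA5_TO_CMIP = {
    "2t": {"cmip_name": "tas", "units": "K", "factor": 1.0},
    "msl": {"cmip_name": "psl", "units": "Pa", "factor": 1.0},
    # Assumed: metres of water accumulated over one hour, converted to a
    # flux in kg m-2 s-1 (1000 kg per cubic metre, 3600 s per hour).
    "tp": {"cmip_name": "pr", "units": "kg m-2 s-1", "factor": 1000.0 / 3600.0},
}


def to_cmip(era_name, values):
    """Return (cmip_name, units, converted_values) for an ERA5 variable."""
    try:
        entry = ERA5_TO_CMIP[era_name]
    except KeyError:
        raise KeyError(f"no CMIP mapping defined for {era_name!r}") from None
    return entry["cmip_name"], entry["units"], [v * entry["factor"] for v in values]


name, units, converted = to_cmip("tp", [0.0036])
print(name, units, converted)
```

Keeping the name, units and conversion together in one shared table is what allows different tools to agree on the same mapping instead of each maintaining a bespoke one.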

19 Oct 2023 climate data weekly meeting 28 2023

Claire, Paola, Thomas. Apologies: Chloe

ACCESS MOPper
- Paola and Sam making developments
- Code standards are something to aspire to
- Use complicated design so it can be flexible across CMIP and other experiments
- NRI still seem to be trying to use APP4, need collaboration to move from APP to MOPper
- CMOR is a pain but need to use it for CMIP submission to ensure standards
- Ben Schroeter pushing use of axiom but probably not fit for purpose in this context, and doesn’t leverage CMOR directly.
Gadi outage 18/10
- Gadi was inaccessible to gdata1a and gdata5-affiliated users yesterday
- No updates from NCI
- Live status eventually showed the gdata outages but no indication that logins or jobs were affected
- Help tickets said there was a problem and it was with storage team but no feedback about lack of live status monitoring
- Thomas will escalate with Steve
CMIP7
- Oceanographers think there is an announcement coming?
- NRI to work on CMIP7 officially probably?
- Thomas, Claire and Michael G have provided feedback to the research director of the ACCESS team illustrating downstream use and impacts, this will be taken to the business unit leadership to ask for a formal commitment to CMIP7.
- Chloe leading the data request working group at the international level
Ethics approval for Enabling access report
- In theory we need to contact every workshop participant, but we didn’t use their contributions directly so seems more sensible to interpret this as the use case providers.
- However there’s a real concern that this means we have to consider all workshops to be ‘human research’, can we give feedback about this?
- Should have been more explicit in seeking consent in the start, but it’s still unclear whether we need ethics approval every time we have a workshop on any topic where we’ll ask for opinions?

12 Oct 2023 climate data weekly meeting 27 2023

Romain, Claire, Sam, Paola, Thomas. Apologies: Chloe

NRI update
- Post on hive trying to identify key datasets people want supported (very broad)
  - https://forum.access-hive.org.au/t/reference-datasets-needs-fy23-24/1417 (and asked in each NRI working group)
- Still trying to figure out what NRI storage needs are
- Have created NCI data collections for obs datasets to support iLAMB and ESMValTool (ct11)
- Some datasets have very restrictive licences so can’t have everything
- What datasets should NRI be managing?
- Who is to manage ERA5 going forward and how? If NRI are to look after it it will use all their storage, but then where does that leave the Clex LIEF grant?
- Plan to join ESMValTool consortium, including discussion of what reference datasets are needed, and what needs CMORising
  - Work with training and outreach team regarding CMORising datasets
  - What needs to happen with CMORising, data directories (DRS) and also filenames, structures etc?
  - There’s a bit of flexibility in the DRS now, but you do need to rewrite the files to fix metadata (variable names etc) and potentially change filenames.
  - Want to avoid two copies of the data – use native output and do “live CMORisation” to add required metadata on the fly but it’s far from operational.
  - ACCESS tool could be adapted to things other than ACCESS potentially, could make configuration files for any data to rewrite.
  - Romain trying to avoid reinventing the wheel! Start with APP4 or MOPPer? Work with Paola and Sam.
ACCESS MOPper
- A few stages – configuration file for mapping variables, database of mappings, tool to run conversion (extract, cmorise)
- Might be valuable to separate mappings into a different tool that could also underpin other things like ESM conversion
- Trying to work out details for a module.
- Tool creates a list of variables in output file, map them to standard variable names, give model dimensions, frequency, type (e.g. point), positive dim if needed, table, model version, data type (float32), size, variable names, variables to use in derived values.
- The tool ultimately can output whatever you want – new netCDF, data cube, xr dataArray etc.
- Paola walked us through the process of setting up and running the tool.
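
The kind of mapping record described above could be sketched as follows. This is not MOPPER's actual schema: the class, field names, model variable names and entries are all hypothetical, chosen only to mirror the items listed in the notes (standard name, input variables, derived calculation, dimensions, frequency, point/mean sampling, positive direction, table, model version, data type).

```python
from dataclasses import dataclass


@dataclass
class VariableMapping:
    cmor_name: str            # standard (CMOR/CMIP-style) variable name
    input_vars: list          # model output variables needed
    calculation: str = ""     # expression for derived values; empty = copy
    dimensions: tuple = ()    # model dimensions
    frequency: str = ""       # e.g. "mon", "day", "1hr"
    cell_methods: str = ""    # e.g. "time: point" or "time: mean"
    positive: str = ""        # "up" or "down" where a direction is needed
    cmor_table: str = ""      # table the variable is written against
    model_version: str = ""
    dtype: str = "float32"


def resolvable(mappings, output_vars):
    """Return the mappings whose inputs all exist in the model output."""
    available = set(output_vars)
    return [m for m in mappings if set(m.input_vars) <= available]


# Hypothetical mapping database entries.
mappings = [
    VariableMapping("tas", ["surface_temp"], frequency="mon",
                    cell_methods="time: mean", cmor_table="Amon"),
    VariableMapping("sfcWind", ["u_wind", "v_wind"],
                    calculation="sqrt(u_wind**2 + v_wind**2)",
                    frequency="mon", cmor_table="Amon"),
]

# Variables found in a (hypothetical) output file.
found = ["surface_temp", "u_wind"]
print([m.cmor_name for m in resolvable(mappings, found)])
```

Keeping the records separate from the conversion step is in line with the suggestion above that the mappings could live in their own tool and underpin other conversions.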
Pangeo
- Thomas attended Pangeo meeting this morning
- Talk from NOAA about pulling data from around the place and combining
- Should there be a standard or advice on chunking (e.g. for CMIP7)
- CMOR can’t handle (define) chunking.
- Should chunking be added to the CMIP7 guidance? (this question may be of interest to Chloe Mackallah from a data request perspective)
- Something the global community is talking about. Julius Busecke will present on this soon.
- If you know the chunking ahead of time does it help – currently we end up storing multiple zarrs of datasets on /scratch to permit different chunkings for different researchers
- Trying to work it out would make CMOR less rigid.
- CMOR pre-processes but the PREPARE tool is what actually checks that all standards are followed (in ESGF publication stream)
ua6 cleanup
- Hannes looking to finish ua6 cleanup and decommissioning
- Claire had copied the data from ua6 to xv83
- Paola had downloaded additional data to ua8
- Combine all data in ua8 at least temporarily (after retiring C20C), work out if researchers still need it to determine if it can live in xv83 or should move to ia39 for broader access?
CMIP7
- ACCESS (CSIRO CAIO programme) looking for impact stories from CMIP6 ACCESS models

05 Oct 2023 climate data weekly meeting 26 2023

Claire, Paola. Apologies: Romain, Chloe, Thomas

NRI engagement/future plans
- Romain was going to attend today but had to push back to next week
- ACCESS MOPper
- ESMValTool
- ACCESS Hive is hard to monitor; we rely on digests, but sometimes people who should be involved in conversations may not know they’re happening.
- CABLE redevelopment is very active but maybe not everyone in the community is being heard or involved.
IMAS datasets
- They’re not exactly Clex researchers so not entitled to access Clex resources at NCI
- How to share data other than via IMAS GeoNetwork/THREDDS? How to get storage on NCI/Pawsey?
- ACCESS 1/30 OM – produced at Pawsey but no available storage at NCI through CLEX (John R’s scheme)
- LIEF grant is full and the data storage wasn’t necessarily well managed so need to recover what can be – consider what needs to persist and what can be retired?
- Need a stronger culture around cleaning up data at project end or before leaving a job.
ACDGuide – Governance
- Paola did a big edit of the Create section. Reviewed and can be finalised at next meeting.
- Improve discussion of tools and tips for data creation/metadata editing.

07 Sep 2023 climate data weekly meeting 25 2023

Chloe, Thomas, Claire. Apologies: Gen, Paola

CMIP7
- Julie Arblaster and Rachel Law spoke at the ACCESS-NRI workshop
- CSIRO apparently will contribute “some” strategic FTE to work on CMIP7, with NRI leading
  - But NRI don’t run the models
  - NRI provide model evaluation tools but don’t have the expertise to do model tuning
  - Data mgmt is outside NRI scope
  - CSIRO will have to provide hands on effort, not only the NRI secondments
- NRI workshop was hit-and-miss for hybrid – Claire and Thomas noticed positive experiences, Chloe had a bad time in the atmos/cmip7 breakout where online people were largely ignored
- Chloe: Data Request and how CMIP7 planning works
  - Various teams, central office to help comms between each area.
  - 3 levels: core variables (required for ‘all’ MIPs, comes from WCRP level), harmonised variables (community-driven standard vars in 6 themes, managed at WIP level), and unharmonised variables (individual MIPs)
- Ability of young and emerging researchers to contribute
  - “New eyes on CMIP” is biggest CMIP7 working group!
  - Pivot to open, we can be open and still protect our IP (Netflix and AirBnB manage!), things can be reasonably locked down but still visible
  - Inline model evaluation during runs, visible online would be ideal to help model verification and tuning
  - Pangeo model – goal isn’t career advancement it’s problem solving

31 Aug 2023 climate data weekly meeting 24 2023

Chloe, Paola, Thomas, Claire. Apologies: Gen

CMIP7
- Chloe busy with data request WG
- Unclear yet if Australia will be contributing
  - Andy Pitman pushing hard for an Australian contribution to happen
  - Needed for best quality data for Australia
  - Trying to put together a group to put in a model – can’t just be ACCESS-NRI.
  - CSIRO not yet committed either way?
  - Government may perceive reputational damage if we don’t
  - Lots of assumptions that it’ll just happen?
- There is an argument that CMIP7 isn’t even needed
  - No new physics, large compute (=> environmental) cost
  - Focus on regional modelling?
  - But we still need involvement!!
- What do we need to do to champion our needs?
- CMIP7 looking to enable a range of models including ML models
- Ensure models remain intercomparable as model complexity grows
- Chloe Mackallah to present current status of CMIP7 data request
Statistical investigation of models
- Terry O’Kane, Mark Collier etc., using Bayesian techniques on CMIP6 data to investigate ENSO and other cycles to look at whether the models are representing appropriate frequency and intensity to provide feedback to model developers
ACCESS-NRI workshop
- Next week – training day Monday (in person)
- Main workshop can be attended remotely Tues-Wed ACCESS Community Workshop 2023 Program – ACCESS-NRI
  - Claire will go to high res modelling breakout, unclear if we’ll have representation in the ESM/CMIP7 breakout
- Working groups day Thursday
  - Paola will attend but Clex main focus is more on regional modelling and machine learning

24 Aug 2023 climate data weekly meeting 23 2023

Claire, Thomas, Paola. Apologies: Gen

Data backups for published data
- NCI gdata is not backed up, researchers are responsible for making their own archival copies
  - Unknown status for ACCESS-ESM – Tilo and Matt C are both not aware of anything, which means there may be some risk there (also for BRAN2020?)
  - Paola archiving data at the moment to MDSS
  - Paola is archiving processed data, raw data to MDSS to free up space, and making a copy of the backup copy
  - Should CSIRO formally decide whether to archive the contents of fs38 to /datastore? (Chloe Mackallah to consider from a data stewardship perspective?)
- Freeing up data storage to run post-processor for AUS2200
  - CMOR3 is confusing! C and Fortran being called directly instead of python wrappers.
  - Difficult also when you’re mixing point and mean time sampling of variables (cell methods)
NCI connections unstable – regular broken pipes and ARE connection problems
- ARE dask instabilities?
- Sam tested and couldn’t reliably reproduce
- Dale was experimenting with default cluster settings in hh5, might be deployed if successful
ACDGuide WGs
- Busy with other things but will get back to governance and OneClimate soon
ACCESS-NRI workshop
- Not attending training due to lack of clarity over schedule and speakers
- Interested in Romain’s work on ESMValTool
- Check with Romain that there’s no unnecessary download duplication, maybe can leverage ia39 copy if it is re-downloading the same data, but not if it’s different data, also is re-processing happening on the fly or are there two copies?
- Clarify roles of NRI and CMS for things like Grafana in decommissioning accessdev

10 Aug 2023 climate data weekly meeting 22 2023

Paola, Thomas, Claire.

Data retirements
- Paola decommissioning ua8
- Archiving to MDSS
- CMIP data not being backed up by NCI
  - does CSIRO hold an archive copy of ACCESS?
  - Maybe Chloe Mackallah would know if there is a backup on MDSS or /datastore?
  - Thomas Moore to check with Tilo if he gets an opportunity re ACCESS-ESM1.5
- Does anyone use the 20th Century Reanalysis?
  - Claire to check with Kathy
  - Need to free up space, that’s 80TB that could be released, and it’s too much to archive – move to ia39 MDSS if needed?
- LIEF grant storage management
  - Get people to estimate how much they need and clear up after use
- Jason Evans’ CORDEX data still taking up storage space (not in an ESGF project??)
Github Actions to replace accessdev Jenkins
- Working to get data
- Not yet committing repos back to remote automatically – need token?
- Request service user so we don’t have to use Paola’s username and ssh keys
OneClimate catalogue
- No further work done but lots of records ingested ready to be allocated and tidied
- Claire identified a list of additional records to work on
- Confusion between “location” and “access information” (the latter being terms of use)
- Paola has added records for the data stored in ia39 (barring OISST and HadISST), need to check the data from ua8 as well
Enabling Access
- We can make recommendations but at the end of the day people can’t run their models on Pawsey if their forcing data is on NCI

03 Aug 2023 climate data weekly meeting 21 2023

Paola, Thomas, Chloe, Claire.

xMIP
- Paola put together a modified tutorial that works with our intake catalogues
- Taimoor trying to use it on Gadi, contact via ACCESS-Hive
- Add a category in Intake catalogue flagging that preprocessing is needed – but unclear how many people are using it
- NRI promoting their own catalogues which are built on NCI’s not hh5’s, and less flexible so won’t have these fields.
- A lack of dedicated data managers who really take responsibility for catalogues and functionality, but conversations/decisions are happening at the senior level not between technical staff.
- Intake catalogue building is not trivial, does require effort. E.g. sometimes preprocessors are needed even for well structured datasets like BRAN2020 and it’s not obvious, needs to be documented.
- Datasets with very many files (daily or monthly instead of annual, decadal) add a heap of dask overhead, catalogues are slow – ideally data would be concatenated on the filesystem but NCI want to just host a direct replica.
- COSIMA Cookbook virtually impossible to update now too for the same reason – database not well structured to manage this, hopefully move to Intake at some point?
Model stability
- How do you determine when the models are stable particularly for all biogeochem variables?
- Mostly models assume spin up done when sst is stable
- These discussions still happening for ACCESS toward CMIP7
- Andy Pitman raised the need to promote CMIP7 participation at government level to fund it.
- There’s a lot of assumptions that someone else (unis, NRI, CSIRO) will somehow make it happen, really need community coordination and funding
ACCESS post processor
- Sam and Paola putting in many hours because they see it as useful but the effort is really disproportionate.
- Sam using it now for CM2 runs.
- The hard bit is the variable mapping. Paola built a database of ACCESS variables and what they correspond to in CMIP. Two mappings – one shared in a database, one passed where you add information like filesize, grid etc that are specific to the user. This is important for AUS2200 but consistent between runs. You have your own template which is used preferentially to existing mappings if set.
- Resampling. Sometimes stash code changes for the same variable and frequency.
- Run through modified CMOR tables
- For a new user, start from CM2 or ESM2 and then build on that for specific run.
- Two tables: CMOR variables (multiple resolutions), custom mapping.
- A lot of effort put in for AUS2200. Files with up to 7 different time axes! Adapting to new models is always hard. Might impose directory structures. Need to add deflate level for netCDF in the CMOR table, it’s done per variable! (same with shuffle)
- Paola working on documentation, might have a meeting for interested parties?
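
The "own template is used preferentially" behaviour described above can be sketched with a layered lookup. This is an assumption about the general idea only, not the post-processor's actual code: the dictionaries, keys and values below are hypothetical, and a user entry here replaces the shared entry for that variable wholesale rather than merging field by field.

```python
from collections import ChainMap

# Shared mappings, e.g. loaded from the common database (hypothetical).
shared_mappings = {
    "tas": {"source": "shared", "input_vars": ["surface_temp"]},
    "pr": {"source": "shared", "input_vars": ["precip_rate"]},
}

# User template with run-specific details such as grid and per-variable
# compression settings (hypothetical keys and values).
user_template = {
    "tas": {
        "source": "user",
        "input_vars": ["surface_temp"],
        "grid": "run_specific_grid",
        "deflate_level": 4,
        "shuffle": True,
    },
}

# ChainMap searches user_template first, then falls back to shared_mappings.
mappings = ChainMap(user_template, shared_mappings)

for name in ("tas", "pr"):
    print(name, "->", mappings[name]["source"])
```

Variables absent from the user template fall through to the shared table, so a new user only has to describe what is specific to their run.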
Single access WG
- Focus on getting additional records that are on Clex wiki – MERRA2 and other reanalyses
- Discuss how to manage datasets that are multiple data types

27 Jul 2023 climate data weekly meeting 20 2023

Paola, Claire. Apologies: Chloe

Big data WG could use attention.
- Restructure done but still a bunch of different approaches, should it have more details in terms of practical guidance.
- Need some content on ML
- Link in recent CMS blogs on dask etc – pre-processing, extracting regions from shapefiles, some that are Gadi focussed but may be relevant.
Enabling Cross-Institutional access WG
- Claire, Paola and Ian to discuss status of final summary report
- Ready to circulate?
Data Governance WG
- Transition leadership from Chloe to Paola once Enabling access is off our plates
- Paola has put in PR
- Still need the creating datasets section, but it doesn’t need to be too thorough; more important to get something out.
Single Access WG
- 90 records currently published
- There are a few cases of publications that would be best represented by a single entry pointing to NCI or CSIRO DAP, e.g. RV Investigator voyages.
- Claire to create a demo record for RV Investigator
New CLEX datasets published through NCI
- AWAP indices
- ACCESS-OM run for IAMIP (Ice algae ocean model intercomparison)
  - Previously published forcing, now publishing historical and maybe other runs
Automated processes: Substitute Jenkins so we’re ready for accessdev removal
- ACS reference data downloads move to GitHub actions in ACS repo that execute on Gadi login
- Github manages secrets quite securely but the action is tied to Paola’s ident which isn’t ideal.
- Need to revisit MERRA2 collection?
ACCESS-NRI workshop
- Training day on Sept 4, unclear what actual content is – e.g. small item for ESMValTool despite all Romain’s efforts, so likely to not cover much?
- One line might be a lot or a little, hard to tell? Don’t need time spent on ARE and climate environments compared to how to use the UM and ESMValTool
ESMValTool
- Romain very active getting recipes working at the moment
CWS-help
- Still unclear what is to happen, NRI’s position is the helpdesk should close, but need community discussion around this.

13 Jul 2023 climate data weekly meeting 19 2023

Claire, Paola. Apologies: Chloe

ACDG WGs
- No guidelines meeting for ages but there’s one next week
- No Big Data for ages, Paige on holiday
- Enabling access report coming together
- Publishing metadata records to the single access catalogue, need reminders!
ACCESS post-processor
- CMOR step is often oddly slow – why?
- Code is python but C under the hood so should be fast
- Capture failures – segfaulting?
  - Try a model level only extracting
- Tracking ID coming out bad
- Most are writing in just a few seconds but just some are slow and bad.
- MOPPER is the name of the new tool
- Paola to write an explanation of what CMOR does, what it’s for, limitations of CMIP vs other runs, what you need to provide etc.
New datasets to publish
- Indices based on AWAP-AGCD
- Will also do CORDEX
- YA-MIP (Antarctic stuff run with ACCESS-OM)
- John Reilly MOM6 1/30 degree publication plan?
CMS wind up
- Need to decommission jenkins
- No clear plan for data and governance
  - importance of the ACDG Guidelines book!

06 Jul 2023 climate data weekly meeting 18 2023

Romain, Paola, Chloe, Claire

Vocab mappings
- Paola creating tables to map ACCESS outputs for new version of APP
- Set up a repo for vocab mappings https://github.com/ACDguide/ClimMappings
  - Includes ACCESS, ERA/ECMWF, ETCCDI – open to additional contributions
  - Includes the vocab mappings used in Invenio (metadata portal for climate data unifying NCI, CSIRO DAP, AODN etc) – utilises existing mappings from NASA etc where possible.
    - Need to note sources of each vocab
- Romain working on an ERA5-CMOR mapping for ESMValTool
- You can make new vocabs that aren’t CMIP6-based (CMIP7 planning to be more flexible/extensible), but when calling CMOR you still have to call the vocab cmip6_cv !? Even for other known datasets like obs4mips. Bug in CMOR?
- Chloe active in working on CMIP7 particularly the data request WG
- Hopefully in CMIP7, decouple CMOR from PREPARE – so CMOR should be more flexible, less hard-coding, and any aberrations corrected at the PREPARE step for CMIP publishing.
- APP – working on variables, some need to check with community to ensure specific things are being calculated correctly. Some things are hard-coded for specific grids. Methods for selecting – nearest-neighbour not appropriate near land e.g. for ocean variables.
- Idea for a name for the new tool is “mopper” (ACCESS-MOPPer)
ACCESS
- One issue is there are so many different versions and implementations of ACCESS – UM-based (BoM), ACCESS-OM, GCM/ESM, etc., with different component models, so there’s no consistency to start with, and NRI is not supporting all of them – so we do need to leverage international standards where possible. Difficult feedback loop between “the user community” and “ACCESS modellers” because the two are not clear cut.
- ACCESS-NRI workshop coming up in September
- Dale to deliver Aus-2200 training but why? Not affordable or viable for anyone else to run.
- Monday and Friday will both have ACCESS training
- Chloe not involved but do they want a CMIP7 update??
- DCCEEW relying on CSIRO to miraculously pull off CMIP7 as they did CMIP6 again? Too risky. Martin raised this at last workshop.
  - High level discussion about responsibility for CMIP7?
- Everything needs to be discussed on the NRI forum (Access-hive – https://forum.access-hive.org.au/), not sure if this is the right spot for CMIP7 at this stage?
- ACCESS-NRI original scope did not include data (so Paola did not apply) but they now say they have a data person? where does that sit?
- New CMS doesn’t have a data position either
National Committee for Earth Systems Science – Academy of Science https://www.science.org.au/supporting-science/national-committees-science/national-committee-for-earth-system-science. Andy Pitman has just become chair. Committee refresh underway and decadal plan will be made.

29 Jun 2023 climate data weekly meeting 17 2023

Paola, Thomas, Claire, Gen

Intake-ESM catalogues
- Thomas trying to make use of the NCI CMIP6 catalogue
- Hard to make sense of versions in the catalogue
- To get a single variable for a specific time period (ie multiple files) for all ensembles can require mixing a bunch of different versions
- When intake combines files (is this using open_mfdataset or something deeper?) it starts its own scheduler even if you already have one. This triggers a warning, which suggests something is wrong.
- Intake works well when the data itself is organised very well – so for CMIP may need to add extra information.
  - May need to use XMIP to fix some models e.g. coordinates, and remove it from the selection
  - Need a pre-processing column to exclude some variables that require additional processing before they can be used in an ensemble.
  - Have to manually figure out which files are “bad”
  - Documentation hard to find things in, e.g. how to exclude things from selection
  - Can work in dataframe that comes out for selection, then rebuild the keys of the actual files needed – this is complicated for us, terribly opaque for more novice users
  - Even ACCESS-S2 doesn’t work out of the box with the catalogue
- Dougie circulated doco on how to use his catalogues, we should try that and provide feedback
  - There are so many potential problems that probably everything will have issues somewhere
- Using pre-processing can help reduce dask overheads for memory etc.
  - Especially an issue with COSIMA data – see recent CMS Blog post on pre-processing with xarray.
- Intake-ESM devs have moved on or are moving on, so there is possibly a support risk
- Few of us have time to build and document the underlying infrastructure and processes needed to make data accessible and usable by researchers who are not data experts.
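One way to approach the version problem noted above is to work on the catalogue table directly and keep only the newest version of each dataset before loading anything. The following is a minimal sketch under stated assumptions: the column names follow the common CMIP6 intake-esm layout (the NCI catalogue may use different ones), and `latest_versions` is a hypothetical helper, not part of intake-esm.

```python
from collections import defaultdict

# Assumed column names (typical CMIP6 intake-esm layout); adjust to the real catalogue.
DATASET_COLUMNS = ("source_id", "experiment_id", "member_id",
                   "table_id", "variable_id", "grid_label")


def latest_versions(rows, key_cols=DATASET_COLUMNS, version_col="version"):
    """Keep only the rows (files) that belong to the newest version of each dataset."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in key_cols)].append(row)

    selected = []
    for members in groups.values():
        # Version strings such as "v20191108" sort correctly as plain text.
        newest = max(m[version_col] for m in members)
        selected.extend(m for m in members if m[version_col] == newest)
    return selected
```

The rows could come from `csv.DictReader` over the catalogue CSV, or from `cat.df.to_dict("records")` on an intake-esm catalogue. This sketch does not cover the harder case raised above, where one time period is only complete when files from several versions are mixed.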
Helpdesk future model
- NRI support model
  - Put questions in hive
  - Specific about what is in and out of scope – this means anything they don’t answer falls back to CMS
  - Focus is on ACCESS model
  - Support for other climate models and data is unclear
  - Who supports running ACCESS in slightly different configurations?
  - There will be a list of tools, data and models they support
  - Questions will be posted publicly with a “support” label, and the community is given a few hours to respond. If there is no response, someone from NRI triages the post, classifies it as in or out of scope and, if in scope, finds someone to support it via a private Slack channel – which could include CMS people
    - Increased overhead of having to monitor Discourse
    - Most people rely on email summaries
    - ~30 staff triaging the post – limited opportunity to build familiarity with users and issues
- CMS in next CoE won’t exist in the same way as under CLEX
- Sometimes what people appear to be asking turns out to be different from the actual problem
- NCI helpdesk (cws-help) won’t close (note that also contains CSIRO and BoM staff)
New Data publications
- AGCD snapshot
  - zv2 has a new data release up to end of 2022.
- PMIP data from CLEX
  - Being published now
- Climpact indices to be published soon
  - Related to ACS indices? will these be published?
  - follow how climpact was published in Europe?

15 Jun 2023 climate data weekly meeting 16 2023

Paola, Claire, Thomas. Apologies: Gen

CMS futures
- Not looking good for Paola – no chance of being continued at UTas
- Data not a focus of the next CoE
- Helpdesk will likely transition to NRI, just focus on blogs and tangible outputs (ACDG books, metadata search etc). NRI will provide support for ACCESS modelling, ACCESS-supported datasets, analysing ACCESS-supported data – what is in and out of scope there?
- Need to eliminate confusion – end slack support, helpdesk support, move all enquiries to ACCESS Hive. How to ensure clarity of who is offering support where significant support effort is required?
Intake catalogues
- Paola will be able to focus more on intake-esm catalogues
xarray issues
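- Lots of chunking errors
- Need to preprocess to extract the slices you need
- dask config option split_large_chunks (array.slicing.split_large_chunks) sometimes helps
- partial from functools can be useful, e.g. for passing extra arguments to a preprocess function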
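The points above fit together roughly as follows. This is a sketch under stated assumptions, not a recommended recipe: the coordinate names `lat` and `lon` (stored in ascending order) and both helper functions are illustrative. `functools.partial` is needed because `open_mfdataset` calls `preprocess` with the dataset as its only argument.

```python
from functools import partial


def select_region(ds, lat_range, lon_range, variables=None):
    """Cut one file down to the slice that is actually needed."""
    if variables is not None:
        ds = ds[variables]
    # Assumes coordinates named "lat"/"lon" in ascending order.
    return ds.sel(lat=slice(*lat_range), lon=slice(*lon_range))


def open_subset(paths, lat_range, lon_range, variables=None):
    """Open many files lazily, subsetting each one before they are combined."""
    # Imported here so select_region can be reused without dask/xarray loaded.
    import dask
    import xarray as xr

    # Let dask split oversized chunks produced by slicing instead of warning.
    dask.config.set({"array.slicing.split_large_chunks": True})

    # partial() fixes the extra arguments, leaving a one-argument callable.
    preprocess = partial(select_region, lat_range=lat_range,
                         lon_range=lon_range, variables=variables)
    return xr.open_mfdataset(paths, preprocess=preprocess,
                             combine="by_coords", parallel=True)
```

Subsetting inside `preprocess` means each file is reduced before concatenation, which tends to keep the dask graph and chunk sizes smaller than subsetting the combined dataset afterwards.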

08 Jun 2023 climate data weekly meeting 15 2023

Claire, Chloe, Thomas, Gen, Paola

State of the Climate
- Too much work for a small group of under-allocated people, support with eResearch Project?
Science and Technology Australia
- Chloe did a Queers in Science session with them last week
- Survey about STEM careers open at the moment – Research.net Powered Online Survey
ARDC NeCTAR cloud survey – https://www.surveymonkey.com/r/2023ARDC
ACCESS post-processor
- The code used for some more complicated ocean variable calculations (e.g. transport vars) was developed by Andrew King and COSIMA people for CMIP5 and reused in CMIP6.
- Need NRI to support and provide these formulae
- Want to change some controlled vocabularies for non-CMIP processing, but currently that causes CMOR to segfault
- Some things in CMOR are currently hard coded but hard to know what is
  - Reach out to CMOR github for support? https://github.com/PCMDI/cmor
ACCESS-NRI support
- Claire C says they’re going to come up with a support model for people – try to align with Clex/CMS to work out collisions
- Currently support is channelled through the Hive
- Sometimes community provides support through the Hive but currently no clear path to tickets for direct individual support – if NRI start offering this, where does that leave CMS? What about things that fall outside NRI’s direct remit?
- Note CSIRO currently have a permanent data mgmt job ad open!!
Mapping approach
- Would be good if we had a common approach to vocabulary mapping
  - Intake catalogues
  - ERA5
  - metadata portal
  - APP
- Consolidate on a central github?
- There will be new vars in CMIP7 as well – fire modelling, ice sheets etc
  - Australia’s contribution to CMIP7 as yet unclear.

01 Jun 2023 climate data weekly meeting 14 2023

Romain, Thomas, Claire, Gen, Paola. Apologies: Chloe

gdata3 migration
- hh5 is currently being migrated so low functionality today
- Gadi being a bit flakey, e.g. post-processor fell over, not sure if related?
hh5 container environment
- hh5 have a “conda concept” container environment
- File lock problems seem to be resolved after a change on NCI’s side
- Romain has deployed one for NRI to work with ESMValTool and it’s working very well
- Hard to know if problems are related to the container or general instability of the system – report any issues to Paola to catalogue and report as scheme leader.
- Ron (Claire’s TL) used the container env with great success on ARE
Intake catalogues and CVs
- Thomas is using these as a basis for his minimum viable product for ACS CCAM demo
- Gen has looked through his scripts and notebooks but hasn’t implemented yet in lp01(?)
- Probably will need a lot of mapping to merge more heterogeneous data like obs in ACS reference data
- Controlled Vocabularies are the elephant in the room.
  - Community developed translator tool?
  - Romain working on live CMORiser to convert on the fly, but you must write the conversion functions which is not a small task! E.g. APP4 gives a mapping but it’s a huge task.
  - Paola has got APP4 to the point where a new component does the mapping, expanding on the CMIP-based ACCESS mappings. It looks up variables it already knows in the database, produces an example file to check the mapping is correct, and lets you add things the system doesn’t know before running the post-processor in full. ACCESS Archiver needs to change to capture realm, frequency etc consistently. Names have to be made up where variables fall outside of CMIP requests – look for similar variables, but this is still a bit subjective. Need to meet with others working on this in the community, maybe NRI can endorse/support it?
    - How to support future releases of the model? Common vocabulary?
    - Still need to deal with legacy data and map things, this will continue for quite a while
    - Stash codes can be used differently depending on frequency – e.g. a few stash codes that are all surface temperature
    - Need to demonstrate you can use the tool to still follow CMIP rules or process a different model run outside of these rules and produce more generic output
    - How to trick CMOR?! Hopefully works
    - Chloe is the only person who knows what some calculations were meant to do!
  - Ben L had the idea that you could use LLMs (e.g. ChatGPT) to help build translators
    - Someone in NRI tried this with some stash codes and it wasn’t perfect but good start
    - Problem is you have to get the mapping perfect
    - Remember you can never assign a standard_name unless it really has one!
    - Translator must be built by someone with a deep knowledge of the model and what the outputs really are (e.g. mean vs RMS vs 50th percentile for ‘mean’)
  - How to progress this?
    - Database tables commonly accessible to all ACCESS modellers
    - Be careful about defining vocabularies that differ from CMIP, may be more sensible but may mean extra mappings are needed (e.g. dai instead of day for daily!)
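To make the mapping discussion above concrete, a shared lookup table could key on both stash code and frequency, and leave `standard_name` empty where no real one applies. This is only a sketch: the stash codes and the `custom_diag` name are placeholders invented for illustration, and the table layout is an assumption, not the APP4 database schema.

```python
from typing import NamedTuple, Optional

# CMIP-style frequency labels; inventing new ones ("dai") forces extra mappings later.
CMIP_FREQUENCIES = {"fx", "mon", "day", "6hr", "3hr", "1hr"}


class VariableMapping(NamedTuple):
    cmor_name: str
    standard_name: Optional[str]  # None when no CF standard name really applies
    cell_methods: str


# HYPOTHETICAL entries: these stash codes are placeholders, not real ACCESS codes.
MAPPINGS = {
    ("m01s99i901", "mon"): VariableMapping("tas", "air_temperature", "time: mean"),
    ("m01s99i902", "3hr"): VariableMapping("tas", "air_temperature", "time: point"),
    ("m01s99i903", "mon"): VariableMapping("custom_diag", None, "time: mean"),
}


def lookup(stash_code, frequency):
    """Return the mapping for (stash code, frequency), or None if not yet known."""
    if frequency not in CMIP_FREQUENCIES:
        raise ValueError(f"unknown frequency label: {frequency!r}")
    return MAPPINGS.get((stash_code, frequency))
```

Returning `None` for an unknown pair, rather than guessing, matches the point that the mapping has to be exactly right and reviewed by someone who knows the model output.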

25 May 2023 climate data weekly meeting 13 2023: Feature topic INTAKE CATALOGUES

Claire, Paola, Thomas, Dougie, Blake Seers, Paul Branson, Sam Green. Apologies: Chloe, Ben L, Dirk S

Claire is recording this meeting – available to CSIRO staff here.

Building, nesting, and working with intake catalogues
COSIMA cookbook used an SQL database to track netCDF files but it has become unwieldy and now takes days to rebuild.
Intake, Intake-ESM are fit for purpose for model and obs data, so we want to build catalogues for our datasets.
Use intake-dataframe-catalog to build a catalogue of catalogues of our many disparate data sources
ACCESS-NRI github documentation should be enough to get started
- https://github.com/ACCESS-NRI/intake-dataframe-catalog
- Can run the “quickstart” demo notebook in Binder (though it sometimes struggles for resources)
- Utilise intake-esm datastores from pangeo etc, and a global mean temp table from NOAA as csv via intake.
  - Can save the dataframe catalogue and share it with others to get your catalogue of catalogues. They can then perform searches on these catalogues. Search is a lot like intake-esm with a few additional features, including reducing output to only show what matched the search exactly.
- NRI meta-catalogue aims to leverage existing catalogues that others have built, e.g. NCI’s CMIP6 catalogue, CLEX-CMS ERA catalogue etc.
  - Can add descriptions and metadata to help interpretability
  - Currently contains 41 “sources”, that is catalogues, some are huge, e.g. CMIP6, some are individual ACCESS model runs.
  - From here can access and load about 3PB of data! (presumably need access to relevant NCI projects)
- Paul: https://github.com/xarray-contrib/datatree/issues/134 – where does Datatree fit into this picture?
  - Can load datasets into datatrees using intake-esm, but probably cooler things that could be done with it
- Majority of work in catalogue creation is transforming datasets into standard vocabularies
  - Goal is for people to run experiments, build own catalogue and submit PR to dataframe catalogue.
  - Variable names could well be the hardest work – try to aim for CMIP6 naming convention – e.g. how to map in BARRA?
  - CMS try to use controlled vocabularies, e.g. CMIP, but this doesn’t cover all variables that we might see. NRI leave variables as modeller chose to call them, but build a ‘suggestor’ tool to help users search for variables like this name (e.g. “sst” might also be called “tos” etc).
  - Can’t enforce use of naming conventions but can recommend a CV
  - Paola building on the ACCESS post-processor to make suggestions on naming for stash codes – community agreement on these recommendations would be good.
  - The NRI work presented here and transformers etc are not “general” tools, they are specifically for the NRI catalogue, but could still form a basis that others can build on for their uses
  - Plan is to release to “trusted friends” with low expectations to get feedback, e.g. from the people here.
  - Thomas – utopian future would be intake and associated tools built on this foundation, translation tool that we all work on to merge together to the CV, lookup tables for all possibilities. That’s a lot of work – CMIP put in a lot of effort and even that is not fit for purpose for everything. Get model developers engaged in translation tool ideally.
    - Paola – metadata search portal – end up with GDM(?) metadata CVs but it’s still difficult to make fit.
    - CMIP6 are able to be very strict but their rules can’t and don’t apply more generally. How to approach compromises??
  - Blake – Metacat object is actually a pandas dataframe – so you can do more complex searches on the pandas dataframe.
    - So how to go from querying metacat to the catalogue itself?
    - Search dataframe to find which catalogues you want, to_source takes you to intake-esm catalogue and can use with to_dask or to_dict etc to get to the data.
      - to_source is limited to the sort of queries that work with intake-esm in the downstream catalogues
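The ‘suggestor’ idea mentioned above (searching for variables that may go by another name) can be sketched with the standard library alone. This is not the NRI tool: the alias table is a small illustrative sample, with only the “sst”/“tos” pair taken from the discussion, and `suggest` is a hypothetical helper.

```python
import difflib

# Illustrative alias table: controlled-vocabulary name -> names modellers might use.
ALIASES = {
    "tos": {"sst", "sea_surface_temperature", "surface_temp"},
    "tas": {"t2m", "air_temperature", "temp_2m"},
    "pr": {"precip", "precipitation", "rain"},
}


def suggest(name, aliases=ALIASES, cutoff=0.6):
    """Return controlled-vocabulary names that the given name probably refers to."""
    name = name.lower()
    exact = [cv for cv, alts in aliases.items() if name == cv or name in alts]
    if exact:
        return exact

    # Fall back to fuzzy matching so near-misses and typos still find something.
    candidates = {alt: cv for cv, alts in aliases.items() for alt in alts | {cv}}
    suggestions = []
    for match in difflib.get_close_matches(name, list(candidates), n=3, cutoff=cutoff):
        if candidates[match] not in suggestions:
            suggestions.append(candidates[match])
    return suggestions
```

For example, `suggest("sst")` returns `["tos"]`. A community-maintained alias table would carry most of the value here; the fuzzy fallback only helps with spelling variations.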

11 May 2023 climate data weekly meeting 12 2023

Thomas, Chloe, Paola, Gen, Claire

Workshops
- 2 day NESP workshop last two days, good to see people
- NRI ACCESS workshop in Canberra later in the year, are people attending? CMS may align with team get together but then it’s a lot of people-ing!
Intake catalogues
- Paola pointed Thomas to a relevant issue on nested intake catalogues
- Turned out to just be a package update problem!
- Thomas now has a working demonstrator for ACS of how to handle multiple downscaled regional models in nested catalogues
- How to make as transparent as possible – still need to know where dataframes are, what keys are called, etc.
- Works well for common intake catalogues, ie where all datasets are GCM-like, harder to merge with obs (Dougie may have some experience as he’s working on a similar thing)
- intake-esm is very powerful but seems to lack good documentation
- Can pass arguments to intake, e.g. use cftime to open ACCESS data, but how to pass that argument is not obvious.
- CLEX example – https://github.com/coecms/xmip_nci
- When you request a dataframe you break away from the intake object – need pre-processing functions and it seems hard to manage
Gadi issues?
- Tasks occasionally appear to get stuck, unclear why – sometimes file locks but sometimes things seem to just stop/pause?
- Maybe related to ARE errors but happening on Gadi too?
- Maybe a gdata problem?
- NCI now have an unscheduled outages page, it’s still limited and retrospective but it’s a step forward! Communication is vital!
- CSIRO Petrichor has had really big problems in the last couple of weeks but communication has been clear despite it really affecting our ability to work.
Paola on leave next Thursday.
Dougie has been doing work from the ACCESS-NRI point of view and like all his work it looks very useful, clean, and well documented.
- https://intake-dataframe-catalog.readthedocs.io
- https://github.com/ACCESS-NRI/intake-dataframe-catalog
This is now public and available on conda as an intake plugin.

$ conda install -c accessnri intake-dataframe-catalog

I think this addresses one of the issues we were discussing?

“Intake already provides the ability to nest catalogs and search across them. However, data discoverability is limited in the case of very large numbers of nested catalogs, and the search functionality does not readily provide the ability to execute complex searches on nested catalog metadata. intake-dataframe-catalog aims to provide a very simple catalog of subcatalogs that emphasises subcatalog search and discoverability.”

We should ask Dougie to give us a demo on the ACCESS-NRI intake-dataframe-catalog.

04 May 2023 climate data weekly meeting 11 2023

Claire, Romain, Paola. Apologies: Chloe

Dask slowness problems resolved, NCI have fixed network configuration – still blocked to 1MB/s to outside world but internal communications fast again.
COSIMA file performance problems
- One file locked – too many simultaneous users?
- Many files need to be opened for operations as they contain multiple variables – lots of file contention.
- Be good to post-process out the individual variables
- Romain will check with Dougie
APP5 development – Sam and Paola continuing work on python3 branch
NRI update
- “Just to let you know that we are looking into making ESMValTool and ILAMB compatible with native ACCESS outputs.
  Our approach is to use a live cmoriser that reads the outputs and only converts what is needed for the evaluation.
  It’s still very much in a proof of concept stage. ESMValTool has already limited support for native CESM and other ESMs.
  Rhaegar is working on ILAMB and I am doing ESMValTool. One approach could be to use APP4 as a dependency but there are other options.”

27 Apr 2023 climate data weekly meeting 10 2023

Paola, Chloe, Gen, Claire, Thomas

Introduce Gen to Chloe – working with Francois, being catalogue/data manager, projects looking at fire weather nationally and rainfall in Vic
Dask issues in ARE, incredibly slow, memory errors
- ARE nodes opened to the internet, but in doing so TCP/IP traffic has been restricted, which is the protocol dask uses by default – believe this is the cause of dask issues.
- Can force dask to use UCX (infiniband) instead of TCP but unclear if this is the solution
- Dale (Clex) to meet with NCI to figure it out
CREATE-IP
- New replica will be less piecemeal, qv56 data to be removed so there’s not duplication
- Unclear why it’s in a new project
- MERRA2 still not resolved – NCI downloading in CREATE-IP but that is only a small subset of the MERRA2 collection held in ua8/ia39(?), so they may remove that from CREATE-IP, acknowledged the CREATE-IP one is smaller/redundant.
CMIP pages at NCI update/Clef
- Pages have been edited to remove reference to Clef for finding/requesting data. They describe it as obsolete/deprecated and recommend Intake catalogues instead, but that neglects the key Clef use case of data requests – comparing local data to ESGF data
- Will MAS updates be discontinued?
- I guess that gives Clex freedom to move Clef to using Intake as a backend instead of MAS (somewhat ironic that they were using MAS for collaboration reasons!)
- MAS enables checksum comparisons which we can’t readily do with Intake
- Once ACCESS post-processor uplift is done, Paola could modify Clef to use NCI Intake instead of MAS, prepare for CMIP7 assuming an intake catalogue only approach
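If checksum comparison has to be done outside MAS, the check itself is simple; the hard part is having the published checksums available. A minimal sketch, assuming the published value is a SHA-256 hex digest (an assumption about the metadata on hand) and using hypothetical helper names:

```python
import hashlib


def sha256sum(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file without reading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published(path, published_checksum):
    """True if the local file's checksum equals the published one."""
    return sha256sum(path) == published_checksum.strip().lower()
```

An Intake-based workflow would still need a catalogue column, or a separate lookup, holding the published checksum for each file.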
Intake catalogues
- NCI has intake catalogues but there’s also hh5, and Gen is building catalogues in lp01 (CMIP regridded project). Dougie is also building catalogues (see 13/4 notes). Thomas is making catalogues for ACS P3 in xv83 for internal ACS use: CCAM output catalogues plus a master catalogue linking them, as a demo of the “nested catalogues” approach.
- ACS catalogues approach – https://github.com/AusClimateService/data-catalogue/blob/main/Aims-Building-ACS-P3-data-catalogue.md
- Demo notebook
CMIP7 data request
- Upcoming community engagement in June
- https://wcrp-cmip.org/event/variables-drop-in-session/
- Modelling centre – in Australia is that NCI, CSIRO/CLEX, ACCESS-NRI? All? ACCESS Oversight Committee
- data request team working on a fairly tight timeline, decisions needed by late 2024.
- Do these plans affect work on the ACCESS post-processor? Harvest information, fill out template, populate database for on-use. Table system is quite cumbersome.
- CMOR not designed to be flexible, meant to be very rigid to ensure delivery of CMIP6. You can fake it out by forcing it to look like CMIP6 to process variables/frequencies that aren’t part of the core request if it’s related to the parameters listed.

13 Apr 2023 climate data weekly meeting 9 2023

Present: Paola, Thomas, Gen, Sam and Yiling. Apologies: Claire

Dougie has shared his work on intake catalogues:
- “WIP catalog is here: https://github.com/ACCESS-NRI/nri_intake_catalog. It uses this intake plugin: https://github.com/ACCESS-NRI/intake-dataframe-catalog”
- We discussed the option of having a future meeting dedicated to intake to share information on different intake catalogues at NCI. Yiling said some information on the way the NCI intake catalogues are structured and created should be available online.
Yiling also pointed out this paper (and possibly a spreadsheet but we couldn’t locate that yet) as an example of an initiative trying to create a list of available climate data portals.
Yiling also shared the release of the CREATE-IP collection now available: https://geonetwork.nci.org.au/geonetwork/srv/eng/catalog.search#/metadata/f2806_0537_4223_2257 in qu79 project and that the older version of CREATE-IP will be removed from qv56. In terms of CREATE-IP not having any more support, Yiling pointed out that they noticed some datasets being updated, but probably by request from users, as updates are not regular.
- Claire questions for next week – what are the differences between the qu79 version and qv56 version? Same datasets available? Also question on comms – can we be consulted on these changes in advance next time possibly? Why the new project instead of updating in-situ in qv56? Move from synda to esgf-download?

06 Apr 2023 climate data weekly meeting 8 2023

Paola, Chloe, Claire. Apologies: Thomas

APP4
- New starter Zhao-Hui Wang in ACCESS team now being trained in using APP4 (Arnold)
- Use the ACCESS-NRI Hive version
- Updating CMOR version not an issue
- Pretty much working in Python3 now
- A lot of prep work in APP4 in creating time axes etc will be replaced by xarray
- Some bits of code may not be used currently
- JSON definitions
- Database(s) okay – master CSV (mapping file)
- CABLE variables are tricky, rely on stash codes, the data itself isn’t self-describing – JULES can have similar issues too though.
- Zhao-Hui should connect with Paola and Sam re APP4 changes
- Sam keen to introduce classes – should reflect the database
- Dale trying to integrate Archiver into the model runs and improve dask efficiency

30 Mar 2023 climate data weekly meeting 7 2023

Paola, Thomas, Romain, Gen, Claire (temporarily)

Welcome new people!
- Round table to introduce ourselves and what we do
- Discussion of catalogues for CMIP and other climate data (ACS catalogue, hh5)
- note: Claire also invited Yiling to resume NCI’s participation but no response yet

23 Mar 2023 climate data weekly meeting 6 2023

Claire, Paola, Chloe

New meeting notes home
- NCI cleaned up Opus and our access was removed, Ben E exported the notes and they are now hosted in the CSIRO ACCESS Confluence space (Meeting notes – ACCESS – Confluence (csiro.au))
- Currently public, should we restrict access?
- Revisit invitation list – add Romain, Yiling?, Jen (BoM)?
- Current list: Paola, Claire, Francois, Sugata, Tim, Kelsey, Thomas, Chloe, Alicia, Sam
- Reach out to NRI (Romain – model evaluation), Francois (who from BoM) and Yiling and invite to this as a regular data-focussed informal catch up.
Enabling Cross-institutional Access WG workshop
- Tomorrow 10:30-2:30
- Discuss report, have breakouts to focus on specifics
- Aim to endorse report so it can be sent to higher level people (Enabling access – Report – Google Docs)
APP4
- Ported to Python3, use xarray
- Paola and Sam have done a bunch of work in branches (https://github.com/ACCESS-Hive/APP4)
- Clean-up of libraries called/used.
- xarray permits simplifying a lot of time functions (e.g. monthly to yearly is just a resample now)
- Need appropriate output from Archiver (or manually processed from the UM output)
- BARPA use axiom but that’s called after the BoM ACCESS post-processing suite is run and just does the metadata rewriting.
- Some confusing standard name usage (e.g. soil moisture content), mean and regular fields of a particular variable have different standard names. Coming from JULES (note ACCESS CMIP used CABLE)
ACS
- Mark Hemer no longer heading CSIRO’s contribution, he’s moved to the Climate Innovation Hub – where is the CLEX MOU?
CMS staffing in new centre
- Paola’s position likely to be discontinued at UTas!!!

09 Mar 2023 climate data weekly meeting 5 2023

Paola, Alicia, Claire. Apologies: Chloe

APP4 (ACCESS CMORiser)
- Romain has created a new development version at https://github.com/ACCESS-Hive/APP4
- Existing table not fit for the new high res grid
- New variables

Chloe published a handle in DAP
Creating DOI with snapshot version in Clex Zenodo
Hand over to ACCESS-NRI to maintain (Romain)
Clex (Sam) making modifications – port to Py3, drop CDMS2 (switch to xarray), add unit testing
Sam has created a branch from Chloe’s repo on NCI GitLab.
Can eliminate a bunch of repetition with different logic, probably
Modify CMOR tables as needed? Use drsreq ?
ACCESS Archiver will need a lot more work – parallelise, introduce xarray

ACS data

BARRA2 data is in a different project (BoM owned) – are we allowed to access it there or will ACS data be accumulated in ia39?

STRESS2023

CLEX offered 6MSU to undertake high-res Australian model on new Gadi hardware, should be able to model an entire year (possibly 2022).
Also use to train machine learning

NESP MOM6 modelling

John Riley (UTas) running 1/30 deg Australian coastal domain MOM6 model forced with ACCESS-OM2-01
Running on Setonix (Pawsey)?
Does this fall in Clex remit or NESP to provide data support? Paola and Claire to coordinate with John.

02 Mar 2023 climate data weekly meeting 4 2023

Chloe, Paola, Claire

CMIP7 workshop
- Andy Hogg gets it though, we should follow up with him

Claire, Chloe and Paola were in the “data processing/data mgmt” breakout but remotely, Yiling was in the room but Tammas took notes and reported back
Brief summary
A number of questions came up in the room, but were not well addressed
Issues around software sustainability not well appreciated, overclaim of level of support
Importance of communication to bring together groups – forcing data, ensemble analysis, end users, etc.
Should we have a taskforce? don’t know! NCPC? no, out of scope, they just see CMIP as an input dataset
Impression that everything’s under control, data at NCI, looking after APP
Andy Hogg wants a plan in the next week (!!!)
Value of contributions, who should be leading?
Hand over Archiver and APP to Romain?
Standards are vital – and wrangling data into the right forms for CMOR and ESMValTool is (surprisingly?) non-trivial
How do ACCESS Archiver and APP work, and who should maintain them in the future?

16 Feb 2023 climate data weekly meeting 3 2023

Claire, Paola

CMIP DRSv3

Emma Howard (BoM) still using this to access merged CMIP data – out of date but Tim E has set it to update regularly again

CORDEX requests

Emma H asked how to download CORDEX data to NCI, advised to use Clef to construct a help ticket to Syazwan.

ACDG governance book
- Need to add clarity around role descriptions – publishing and management
- Authorship policy

Bec Gregory has done a lot of work tidying language
Paola working on a PR to sort out a few issues/tickets

CREATE-IP

There is a replica of CREATE-IP in qv56
However Yiling informed Paola that they have downloaded CREATE-IP to qu79 but public release is on hold pending clarity around licensing (should be a non-issue as ESGF data is open)
Claire and Paola to meet with Yiling to follow up and clarify what is happening with this dataset
Note that CREATE-IP isn’t an alternative to maintaining all other reanalysis replicas – it is not continuously updated and doesn’t contain all frequencies that users need…
Duplication of MERRA2??

MERRA2

Paola has circulated a usage survey to ua8, will do rr7 soon.

02 Feb 2023 climate data weekly meeting 2 2023

Claire, Chloe

CSIRO Environment

New Digital Lead Simon Barry (from Data61) starts today

Ocean data survey

Query about ocean data platforms in Australia – should we include AODN? Probably. To follow up with Edward King.

05 Jan 2023 climate data weekly meeting 1 2023

Paola, Thomas, Claire.

ARE/dask
- Here’s an example that runs in 5 min for user A on Gadi but user B finds it times out on ARE
- Bespoke Pangeo Singularity (BenL/Dougie) vs ARE performance (Ben zarred data in 2.5 hours that took Claire 10hrs in ARE)
- Seeing the same questions over and over.

Paola finding she’s getting kernel errors with notebooks that were working before Christmas?
Node congestion issues?
Runs okay in the queue
Need good examples of using dask in complex scenarios utilising dask delayed on large and complicated workflows, e.g.
There seem to be stability and speed issues but they’re not exactly reproducible.
OOD is sometimes more stable?
Ensure dask workers are using $JOBFS and not using too much memory.
Note the slowdown on gdata5 that Marcus noted before Christmas, could this be a factor?
Would it be useful to have a repo documenting ARE workflow issues, linked to the ACDGuide Big Data book?
Need documentation to show complicated workflows, e.g. to stop SU wastage in requesting a whole node that’s not needed the whole time.
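For the point above about keeping dask workers on job-local storage and within memory, something like the following could be a starting point for documentation. It is a sketch under assumptions: the job scratch directory is taken from a `PBS_JOBFS` (or `JOBFS`) environment variable, the 10% memory headroom is an arbitrary choice, and both helper names are hypothetical.

```python
import os
import tempfile


def worker_settings(n_workers, job_memory_gb, headroom=0.9):
    """Build LocalCluster/Client keyword arguments sized to the job request."""
    # Assumed variable names for the job-local scratch area; fall back to the temp dir.
    scratch = (os.environ.get("PBS_JOBFS") or os.environ.get("JOBFS")
               or tempfile.gettempdir())
    # Leave some headroom so workers are not killed when the job hits its memory limit.
    per_worker_gb = job_memory_gb * headroom / n_workers
    return {
        "n_workers": n_workers,
        "threads_per_worker": 1,
        "memory_limit": f"{per_worker_gb:.1f}GB",
        "local_directory": scratch,  # where workers spill data to disk
    }


def start_client(n_workers, job_memory_gb):
    """Start a local dask cluster using the settings above."""
    from dask.distributed import Client  # imported lazily; requires dask.distributed

    return Client(**worker_settings(n_workers, job_memory_gb))
```

Sizing workers from the actual job request, instead of letting dask detect the whole node, is one way to avoid the memory errors and over-requesting described above.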

ACCESS NRI/support

Who is doing what? Where do you go for different issues?
Still unclear, despite meeting before Christmas – attended by CLEX (Paola and Dale), NRI, NCI, and Scott from BoM/NRI.
Currently a lot of cross-posting between CLEX Slack and the ACCESS-Hive forum. Users are encouraged to post to the Hive but do not always get answers there, and its scope seems to be very broad/entirely open (not limited to the ACCESS model), so it’s unclear when to use ARCCSS Slack, cws-help email or the Hive. This support is not core NRI work (ie ACCESS configurations), so there is a risk around ongoing expectations if the NRI staff themselves won’t be maintaining support there indefinitely, and the community is not built and self-sustaining like it is for COSIMA (where it is working).
All platforms have long term findability issues – Slack history expires, Discourse remains searchable but in post/response format – value to blog structure?
Future of CMS is unclear – next generation of CLEX, hosted at Monash, has a view that’s more NRI-aligned and sees less need for the CMS team as it currently is.
CMS still providing ACCESS support too, e.g. Dale’s recent ancil ticket.
Claire C at NRI is actively developing documentation (CABLE)
accessdev future – whose concern is it!?
Infrastructure planning meeting to be held monthly

CMS future

New centre starts December 2023, current CLEX runs to end of 2024
Christian Jakob is director of the new centre
View that next centre will have a CMS team of only 3 staff and wants 2 of them in Canberra co-located with NRI.