Meeting notes 2024

2024 meeting notes and ongoing actions…

21/11/2024 – weekly meeting 31 2024

Thomas, Claire, Damien, Paola

  • NCI things – Paola saw Hannes and met baby Freda 🙂
    • Things going on at executive level
    • Heather is a new person working with Jo, we think? She helped publish AUS2200
      • AUS2200 finally published, 60TB!
      • Using ACCESS-NRI storage.
    • New builder for NRI catalogue, should work on outputs produced with MOPPeR
    • Met with ACCESS-NRI re conda environments
      • ACCESS-NRI will take over management of the environments in a new project
      • Unclear who the maintainer will be
      • Will probably use the containerised Singularity version
      • Similar “kitchen sink” approach
      • Will formalise the rules around requests which will be submitted through the Hive
      • WILL NOT BE IN hh5!!!
      • Will start in December but won’t have a stable release yet.
  • Governance book
    • Updated all dapds00 references to thredds.nci links
    • Paola to meet with Claire & Sam re where it goes and also the OneClimate portal and Nectar project
  • Paola’s last day will be Dec 4th
  • MOPPeR is notionally supported by ACCESS-NRI now

7/11/2024 – weekly meeting 30 2024

Claire, Paola, Damien

  • W21C
    • re-advertising the ANU CMS position
    • Annual workshop was last year. Paola submitted a video celebrating CLEX CMS which was well received
  • MOPPeR
    • Paola found another bug – defining one particular variable and table with the arguments in the wrong order triggers it, but it can be bypassed via the config file
    • Will require another version
    • NRI support for MOPPeR issue (see https://github.com/ACCESS-NRI/MED-utils/issues/7) 
  • Analysis3 environment issue 
    • Issue for MOPPeR but affects anything using Radar data
    • Problem with py2c package in the current analysis environment but no one has complained yet
    • Could be fixed by pinning xarray but that is risky
  • Ongoing issues with gdata instability
    • Paola having problems with rsync
    • Claire's post-processing runs 40 jobs at a time; approximately 50% of jobs fail and need rerunning, so it takes a number of passes to get all years processed.
    • Is this related to OpenMPI errors Paola is seeing?
    • ARE stability seems to be improving
    • It is probably related to a Lustre bug, CSIRO seeing the same problem on /scratch3 following an update
  • Project access
    • No overarching project for W21C, hard to get access to the system for data analysis, can’t really do an NCMAS proposal for that.
  • Reference data collection
    • Any update on service account user for reference data replication? Don’t think so, ask Chloe to follow up
    • ACS budget cuts next FY, future uncertain but “keep the lights on” scenario will surely continue to at least pay for the data storage. Serious issues with future of ACS, how to deliver climate projections. NPCP??
    • ACS pivot to building a portal but CIH/CIP demonstrates this doesn’t actually work – it’s a hard problem. (also limited market for the data and mismatch of needs of industry vs consultants who would work with the data)
    • CSIRO pushing back to keep working on science and delivering intelligence in house not via a 3rd party.
  • Paola wants to switch focus to the governance book and metadata portal in the next 4 weeks.
  • Discussion about politics!

31/10/2024 – weekly meeting 29 2024

Damien, Paola, Thomas

17/10/2024 – weekly meeting 28 2024

Claire, Damien, Paola, Thomas

  • Reference climate data
    • Sam has permission to keep maintaining the datasets
    • jt48 ready to go with writer’s group
    • Paola reopened the old service account ticket but has had no response despite talking to Andrew W in person in Canberra
    • Maybe need to start a new ticket referencing the old ticket
    • Some files missing from iMERG dataset, Sam to look at fixing that.
    • Keen to move the sea level data first
  • MOPPER 
    • MOPPER updates done, want to release as a package on Conda but having some issues with automatic testing
    • Another AUS2200 run to be processed
    • Going smoothly now with the updates
  • Ocean functions package
    • Spoke to ACCESS-NRI Oceans team – they are interested but won’t happen in the near future.
    • Be good to be able to take Gibbs Sea Water (GSW) etc as a trusted package
    • Document how to add functions
    • Need to have functions curated and scientifically reviewed somewhere central
    • e.g. ESMValTool pulls together a lot of reviewed recipes, what is the process for considering calculations “blessed” for inclusion?
    • ESMValTool assumes certain output variables from CMIP are available as inputs, but MOPPER needs the functions to create those variables where they are not output by the model
    • CMOR, ESMValTool etc. are very powerful packages but they tend to be built for one purpose, so they can be quite strict on input/output standards (strict Controlled Vocabularies), which limits reusability, e.g. for general ACCESS output post-processing. ESGF also has a tool that runs after CMOR to check that the standards are respected – given this, CMOR itself could be more flexible. For MOPPER, where possible the vocabularies were changed to extend CMOR's capability to let it access a new CV, e.g.

10/10/2024 – weekly meeting 27 2024

Paola, Claire, Thomas

  • Reference climate data
    • Paola ran a survey of ua8 users to understand which datasets are still of interest
    • 6 hits for LME data
    • MERRA2 still has some users but it’s split between rr7 and ua8
    • 20CRv3 still may be used by Lisa Alexander, but currently not well curated. CDS are migrating from tape to a disk-based system so may set up proper curation and move to ia39 once that happens (a future task for Sam?)
    • CMEMS sealevel being moved to ia39 already
    • There is some confusion over ORAS5 and CMEMS
      • Available monthly from Hamburg University – Paola will review this one
      • Also available from the Copernicus Marine Data Store; some differences between MDS and CDS, though apparently the same/similar data? Paola will check the source and confirm.
  • Gadi issues
    • Inconsistent extreme slowness on login nodes
    • Ongoing issues with randomly not being able to open files
    • MOPPER testing is producing random MPI failure messages, which seem unlikely to be genuine MPI errors
      • Could be an issue in the python environment (some issue with multi-processing?) but seems to be correlated with slowdowns and only since last major outage
    • Timeouts in login windows even without running substantial processes
    • From CSIRO, Gadi seems to be more performant when you use the Canberra VPN node than Victoria. Some issues with our GlobalProtect configuration?
    • Sometimes things fail, you go away and leave it alone and run again and it works. Hard to differentiate the real issues that need debugging from the random errors.
  • MOPPER/COSIMA cookbook/ESMValTool
    • Trying to create zostoga, reached out to model developer
    • Some stuff in APP4 is really old and no longer needed – e.g. don’t need to calculate height from pressure, there’s a module for that
    • NRI is doing a similar thing. There is stuff in the COSIMA cookbook but couldn’t see documentation for it. E.g. Overturning circulation is there but are there other calculations available and undocumented?
    • This would all be better managed if there was emphasis on code and data management from the outset of project establishment and ongoing – but in practice everyone is just doing it around the edges.
    • What we need is tools that can extract processes out for generic reuse – e.g. AMOC from COSIMA, any recipes from ESMValTool (may be coming)
    • Some confusion around structure of ACCESS-NRI Ocean modelling team. Anton may or may not be in that team but he certainly knows ocean modelling well. Christopher Bull is currently leading the Ocean Team while Dougie is on parental leave.

3/10/2024 – weekly meeting 26 2024

Sam, Claire, Paola, Damien, Thomas

  • Paola aiming to finish work in the next month 🙁
    • Has to stop before December
    • Planning to also use LSL before then too
    • May balance LSL with a few hours work a day
  • Big Data book 
    • Paige on parental leave till February
    • A lot of work needs to be done on this still
    • Thomas keen to add info on Analysis Ready Data
    • May need to migrate to new github as current one is still based on Scott Wales’ config
  • Governance book
    • Could be wrapped up to maintenance mode pretty quickly
  • MOPPER
    • Instead of complex method for variable name finding, use the CMOR database
  • CMS blog
    • Lots of important content, should be maintained in new centre one way or another
  • aust-ref-clim-data
    • Unclear if Sam will be given permission to continue to support this data
    • Community need but CJ doesn’t want W21C supporting data that other orgs could do
    • Sam wants to continue to support as e.g. Lisa Alexander uses a lot of this data
    • Transfer of data from ua8 to ia39, some things will be automated but e.g. CMEMS sealevel will just be shifted across until new ECMWF portal is up and running
    • create subdirectory called pre-release for the data that is copied from ua8 but currently not automatically managed via a repository
    • Create branch for new directory for each dataset to be moved
    • Each repository will need to have new Secrets generated for the new functional account

26/9/2024 – weekly meeting 25 2024

Paola, Claire, Thomas, Damien

  • NCI issues since maintenance
    • Files exist with ls but not found in jobs or ls -l
    • Warning about OpenMPI fabric in all jobs (Paola)
    • Damien getting thrown out of ARE sessions every 20-30min!! (normalbw queue)
      • Can’t ask for anything bigger than Large – sits in queue and never runs
      • Small VDI jobs for 8hours just end within about half an hour of starting
      • Every day since maintenance
      • Paola also found she kept having to go to smaller and smaller requests to not get stuck in the queue
      • qstat -s to see why you’re queuing but it’s not because of resources
    • Thomas: “Looks like a related unscheduled maintenance event was logged 19 September – https://opus.nci.org.au/display/Help/2024+Unscheduled+maintenance+events+notifications
      According to NCI there have been no unscheduled maintenance events between 24 April 2024 and 19 September 2024.”
      • This notes that there was indeed a problem with normal queue.
    • It would be helpful if we had solid statistics of what is and isn’t working for everyone to try to diagnose where the problem is – is it with queues, gdata mounts, storage flags in jobs, MPI configuration, something else?
    • ARCCSS Slack will shut down when the centre ends; the ACCESS-NRI Hive may be the place to go, but whether there is appetite for an equivalent of the #support channel remains to be seen – the Hive isn't a place to air grievances.
  • Issues with quota use and data deletion
    • If users are not responsive and/or not removing unused data when you are at capacity, the Lead CI (only) can remove problematic users from their project, and can request NCI reassign their files to you for deletion.
  • Analysis Ready Data
    • Thomas has set up a project for people to create analysis ready data for temporary use
    • Motivated by ACS and COSIMA
    • Overlap with what Claire’s team are doing to create zarrs of coastal hindcast for validation prior to final product netCDF generation (https://github.com/AusClimateService/cchaps/tree/main/chunking/zarr)
    • Thomas will set up a code of conduct/expectations for using this project so that there is a process to prevent storage being filled up and not dealt with.
  • Reference climate data
    • Sam happy to do it but not sure if he’ll be allowed (yet)
    • jt48_w set up and we’re added
    • Open ticket for service account 
  • W21C support team
    • UniMelb CMS hire is Paul Gregory (ex-bom?)
    • Sam being pushed toward ML but he will also be saddled with a lot of support tasks.

12/9/2024 – weekly meeting 24 2024

Paola, Chloe, Damien, Thomas, Claire

  • Claire, Paola, Yiling and Jo caught up variously at the ACCESS-NRI workshop last week
  • NCI maintenance today and tomorrow, everything is offline
  • Reference data collection
    • There is a project jt48 “Climate data for CSIRO” which could be used to house reference data
    • We could transfer the aus-ref-clim-data from ia39 to jt48 which would be CSIRO funded storage but we could use as a community space
    • There is a lot of interest in this reference data from the new CoE (W21C) and likely Sam will be called on to continue to support it
    •  Need to work with NCI to create GeoNetwork records for each dataset to make it findable
    • Create a jt48_w writers group 
    • Request jt48 service user (explain use case to Andrew Wellington)
    • Update ia39 scripts to point to the new location
    • Could add a replica from jt48 to the climate data hub (af-cdp) to add to that “climate data for CSIRO” collection that Ryan is wanting to establish
  • Paola to work on Governance book
  • Paola to put some effort into updating records in the OneClimate portal
    • Claire to help adding more records
    • Rescan CSIRO DAP as there’s more recent relevant publications too
  • Talking to Paige (ACCESS-NRI training mgr) to update their materials to point to our books and things
  • ACCESS post-processing
    • You can’t make CMIP contribution data without MOPPER, it *is* a wrapper for CMOR that deals with the data that ACCESS produces
    • Important not to fiddle with variable names too much. Need frequency and date range in files to make it workable.

29/8/2024 – weekly meeting 23 2024

Chloe, Damien, Claire, Jo, Paola, Thomas

  • [ACS] Reference Climate Data focus
    • aus-ref-clim-data sits in ia39 (previously rr7 then ua8), which is ACS funded storage but it’s managed by CLEX CMS.
    • https://github.com/aus-ref-clim-data-nci
    • https://aus-ref-clim-data-nci.github.io/aus-ref-clim-data-nci/intro.html
    • Paola and Sam Green set up automated download of reference datasets (mostly precipitation and sea surface temperature)
    • Also includes reference shapefiles for the Australian region
    • ACS provide the storage for the broader community, but the storage could disappear if ACS funding were reduced
    • Ideally we wouldn’t keep it in project storage but have it in a separate project that was more appropriate to share widely
    • Sam is now at W21C and may be able to continue support but it’s rather unclear if he can
    • Sam on leave but working jointly with Clex for next few months
      • Should be able to provide ongoing maintenance.
    • Could ask NRI for storage for this?
    • NCI support? NRI support?
    • Clarity around NRI’s data offering (in support of ESMValTool) and this collection which is actively maintained for general use, but NRI don’t have a data team at this time, so no one in charge of this type of stuff.
    • [note the shared conda environment hh5 should survive – 796 active users! – but no deep expert like Dale able to sustain it now]
      • Ideally if NCI take over management they should incorporate the dask configuration and optimisation work that Dale did in hh5 as it’s really good!
    • W21C support staff are not a team like CMS so there’s not the same opportunity for coordinated work as there was in CLEX
    • The reference collection is currently 26TB so it isn't big. If nothing else we could move it to CSIRO storage, which would have more longevity than the ACS-funded storage, so we can probably secure ongoing storage without too much trouble
    • The main problem at the moment is the ongoing maintenance.
    • NRI have an Expression of Interest process to get access to NRI storage, but they don’t have a data team/focus, this collection isn’t obviously aligned with their strategy. Community expect they will support things like this but it isn’t clear that they will. 
    • Note we want to register this collection in NCI’s GeoNetwork – not publish, just get a record in the database.
    • Similar issue with MOPPER re maintenance
    • CSIRO-scheme NCI projects are increasingly being tied to research projects which does make persistency a little trickier
    • The community all see the need for these collections but there's currently no centralised solution to supporting them
    • Important to avoid replication/duplication
    • Is there any way to raise the profile/importance of these underlying support structures in W21C? 
      • Some people who might have influence include, say, Julie Arblaster and Lisa Alexander (who has a particular interest in the ACS reference data)
    • Technical skill to support things like automated collection maintenance and the hh5 environments really sits in NCI or maybe NRI. Note there’s also a storage obsolescence risk with hh5.
  • Federated Climate Data Initiative (FCDI) project led by Tom Remenyi. Lofty aims to federate data, provide routes to publishing, sharing, data access, analysis… but the research aspect of that has effectively stalled; the ARC grant hasn't obviously led to anything that's usable for the research community (though the Eratos platform seems to be useful for some commercial clients), so it doesn't look like any useful return on investment.

22/8/2024 – weekly meeting 22 2024

Thomas, Claire

  • Discuss Environment Digital review
    • Thomas assigned to work on CARS contribution
    • Severe OpEx cuts, why are we doing this, what will the tangible outcome be that helps us work?
  • Hannes on leave for months, haven’t heard from Jo again, query changing meeting time for her?
  • Should we have someone from NRI here? Romain hasn’t nominated anyone else but maybe we should reach out to whoever is working on MOPPER?

15/8/2024 – weekly meeting 21 2024

Paola, Claire

  • Discussed positions at NRI, W21C, IMOS.
  • Tidying up MOPPER with Pytest etc
  • Need to update OneClimate to production and update records
  • Review Jupyter books too

8/8/2024 – weekly meeting 20 2024

Claire, Thomas, Paola

  • ACS
    • Funding expires at end of this FY at this stage
    • A bit of uncertainty around future funding/allocations
    • NESP interaction with ACS also a bit uncertain but building interconnection
  • Eratos
    • Have opened offices in Hobart and Sydney
    • Proposed as a solution for CSIRO climate data – does not look like a good fit for massive scale data but it is nice for agricultural application of smaller (derived) variables, e.g. 
    • Not sure it’s a good solution for us but nice to be aware of/engaged with – Thomas R worked under Paola at UTas in the past.
    • Similarities to weather@home project in the past.. (Mediaflux push)
    • What about CSIRO’s previous “partnership” with MS in the climate space, did anything come of that that affects us?
    • Need a variety of platform and tools!
  • Intake
    • Misconception that Intake is basically a database – it's a Python package that catalogues data in tabular form and works with xarray (see the sketch at the end of these notes)
    • Future of Clef to use Intake
    • Saying “X now supports Intake” does not replace COSIMA Cookbook, Clef etc – they are different things, maybe we’re not communicating this well
    • NCI say “Clef is now deprecated in favour of intake” (https://opus.nci.org.au/display/CMIP/Datasets+and+Available+Variables) but they’re doing different things!
    • Similarly MOPPER and Intake are different tools – it can’t fix bad data!
    • Intake isn't magic! It can't fix poorly structured or heterogeneous data – you need methods to deal with diverse data, which is why the initial NRI catalogue was very limited in what it supported.
    • Not fit for purpose for raw model output for public use – fine for internal use to feed MOPPER, but the UM raw model output is too tricky for public/broader consumption.
    • Data publication requires a higher level of documentation
  • Risks from CLEX ending
    • Portal, data governance, hh5 conda environments, Clef, MOPPER, blogs – documentation needs constant maintenance.
    • CMS effort really not acknowledged.
  • ACCESS-NRI
    • Clare Richards appointed new Data Mgmt team lead (but role not advertised)
    • NRI workshop upcoming – Paola will attend.
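
A minimal sketch of the Intake point above (a Python package cataloguing data in tabular form, working with xarray); the catalogue path and the column names ("variable", "frequency") are illustrative and depend on the particular Intake-ESM datastore.

    import intake

    # Hypothetical Intake-ESM datastore JSON; the real path depends on the catalogue
    cat = intake.open_esm_datastore("/g/data/example/catalogue.json")

    # Search the tabular index (a pandas DataFrame under the hood)...
    subset = cat.search(variable="tos", frequency="mon")

    # ...then open the matching files lazily as a dict of xarray Datasets
    dsets = subset.to_dataset_dict()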

1/8/2024 – weekly meeting 19 2024

Paola, Thomas, Claire

  • Intermediate variables
    • You can use Intake decorators to record how to calculate intermediate variables (e.g. eddy kinetic energy from u and v) – see the sketch at the end of these notes
    • No value to keeping these easily derived variables on disk, but important to record how they are calculated for reproducibility
    • COSIMA cookbook has the record but they’re in notebooks not callable functions
    • Embedded in MOPPER too but again can’t extract them to use directly (at the moment)
      • Want to separate the functions from the tool
      • Can then register the calculation and add it to the intake catalogue
    • Can preview your data before you decide to process it
    • The decisions you make about how you calculate something you’re going to rely on in manuscript plots are very important and must be recorded clearly. There’s often a lack of consistency in how the same variables are derived.
    • Thomas gave a talk to CSIRO last week about the speed ups available by saving optimised intermediate products short term for reuse during analysis. We need to resource (FTEs) data science including data handling.
    • Duplication of effort is wasteful as is use of computational resource 
    • Need a strategic approach to get work done faster (and make our lives better!)
    • Important for funding organisations (CSIRO, BoM, unis/CoEs, NRI…) to recognise the need to build data strategies from the ground up and value expertise.
  • Post-processing model runs
    • Inconsistency between ACCESS-CM runs – different MOM6 results
    • UM is hard to interpret but stash codes can be looked up. MOM6 variables are hard to understand.
    • Importance of things like gridspec variables!!
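
A hedged sketch of the "Intake decorators" idea above, using intake-esm's DerivedVariableRegistry; the variable names, catalogue path and the EKE formula are purely illustrative.

    import intake
    from intake_esm import DerivedVariableRegistry

    registry = DerivedVariableRegistry()

    # Record how "eke" is derived from u and v; nothing extra is kept on disk
    @registry.register(variable="eke", query={"variable": ["u", "v"]})
    def calc_eke(ds):
        # Illustrative eddy kinetic energy from velocity anomalies about the time mean
        up = ds["u"] - ds["u"].mean("time")
        vp = ds["v"] - ds["v"].mean("time")
        ds["eke"] = 0.5 * (up ** 2 + vp ** 2)
        return ds

    # Searching for "eke" now loads u and v and applies the function on the fly
    cat = intake.open_esm_datastore("/g/data/example/catalogue.json", registry=registry)
    dsets = cat.search(variable="eke").to_dataset_dict()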

18/7/2024 – weekly meeting 18 2024

Paola, Jo, Claire, Thomas

  • Paola attending NCI training this afternoon
  • NCI GeoNetwork change
    • Can now add records without creating a DOI
    • This is appropriate for listing ia39 as a collection and its containing datasets
  • NRI data
    • Kelsey confirmed that a data focussed role won’t be created for a while
    • Thinking of putting someone on a contract, but the new EBA is a problem for that
    • Will hire Clare Richards (BoM retiree) as a casual to do high level data governance
    • Clare focussed at the licence level, not technical level
    • Intake catalogue builders (e.g. those built by Dougie) are quite fixed and not necessarily compatible with how other groups might do it
    • Catalogue builder needed for every version and could need a different one for ESM1.5, OM2, OM3 etc. 
    • Heavy-handed needing to build a class every time; CLEX just used a regex instead of trying to make the data more uniform (see the sketch after these notes).
    • NRI started cataloguing at an earlier level in the modelling process, which has some pros but makes it difficult for other data
    • Paola hoped to use them in MoPPER but can’t as too much additional info is needed from the user
    • Raw ACCESS output has all variables in one file (need to look for specific fields at least in the UM) – replicated files, add mapped value in Intake catalogue for MoPPER output
      • Things coming out of ARCHIVER should be okay. AUS2200 could be more difficult but they were already preprocessed out. Don’t know if it’ll work for eg ACCESS-ESM straight out of the model – but might accidentally pick up Restart files.
      • Makes more sense to operate on cleaned Archiver-type output than raw UM output.
      • One command to create mappings. From this, can post-process the data.
      • One command to create intake catalogue.
    • Getting someone seconded from elsewhere in ANU to work on Intake catalogues. Dougie is in (currently leading) the Ocean Modelling team, he just did the Intake catalogues from his own initiative.
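
A minimal sketch of the "just use a regex" approach mentioned above: parse metadata out of file names into a table rather than writing a builder class per model. The directory, filename pattern and fields are made up for illustration.

    import re
    from pathlib import Path
    import pandas as pd

    # Hypothetical filename convention, e.g. tas_AUS2200_1hr_202202.nc
    pattern = re.compile(
        r"(?P<variable>[^_]+)_(?P<model>[^_]+)_(?P<frequency>[^_]+)_(?P<date>\d{6})\.nc$"
    )

    rows = []
    for path in Path("/g/data/example/output").rglob("*.nc"):  # hypothetical root
        match = pattern.match(path.name)
        if match:
            rows.append({**match.groupdict(), "path": str(path)})

    # The resulting table is essentially the csv half of an Intake-ESM datastore
    pd.DataFrame(rows).to_csv("catalogue.csv", index=False)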

27/6/2024 – weekly meeting 17 2024

Thomas, Paola, Hannes, Jo, Claire

  • Object storage
    • Pawsey Acacia training on Tuesday was attended by Paola, Claire, Thomas and Hannes (and also Michael Sumner)
    • Introduction to how object storage works
    • Pros and cons – parallel write very attractive, not really possible with netCDF
    • Be good to collate into a CLEX blog post of info relevant to the climate community
    • Thomas follows Pangeo and developments by Ryan Abernathey & Joe Hamman closely
    • When using zarr on POSIX filesystems you should definitely use a ZipStore to reduce inode problems
    • Hard to know exactly what the problems are from a systems perspective
    • As a community develop shared experiences
    • Thomas has a poster at ACOMO-COSIMA next week and will be giving a talk to COSIMA in August (8th)
    • Conceptual understandings are often lacking in the community – 2 copies of the data if you have both zarr and netCDF
    • Paola to explore nczarr in netCDF 4.9.2 on Gadi, following Claire's notes from 4.8.0 testing in the Big Data book (the acdguide.github.io link currently returns "Page not found"?)
    • Object store is cheaper than POSIX – roughly 10% of the power consumption – so it's much more environmentally friendly and can offer much more storage
    • Understood that object metadata is a json, so surprising to learn that you have to retrieve the object to update just the metadata, and it can be hard to retain which is concerning. We want to be able to edit metadata without rewriting files.
    • Pawsey seem to be a bit ahead of NCI in terms of object storage, but it’s viewed as “warm tape”, whereas we’re keen to be hitting it and computing directly against it without needing to stage to /scratch
    • Pawsey training called in a storage expert to come and answer questions which was great. He indicated they’ll look at putting a database in front of the objects for searching and additional metadata handling
  • Welcome Jo Croucher!
    • Jo works in the Data Collections team at NCI with Hannes
    • Focus on data publishing and NCI data catalogue, and onboarding data
    • Started in health research and retrained as a librarian, worked in that area for a number of years before moving to NCI
  • Voila and remote data access
    • Portal-like jupyter lab scripts
    • Hannes wasn’t able to get working to an acceptable level for today but Nigel may join us when he gets it working well enough
    • ipy widgets with maps, look at available geophysics data, draw a polygon and identify and load data available in that region and load into a dataframe using Intake.
    • What are portals really for?! This could replace some simpler portals
    • We talked about this to ease finding satellite data on Gadi
    • You choose the collection at present
    • AODN data matching use case (model validation – Blake Seers)
      • AODN hackathon recently for CARS data, want to directly access AODN data via parquet on S3
    • The future is probably not accessing data via THREDDS/OPeNDAP – it is fundamentally very functional, but S3 offers greater performance possibilities
    • netCDF = standards compliance, FAIR; zarr -like = much more efficient for storage and retrieval for serving/remote access
      • But note that 10 years ago data was all netCDF but metadata was still super poor. Tools writing appropriate attributes directly was really the turning point.
    • We want to know full provenance in our data.

20/6/2024 – weekly meeting 16 2024

Hannes, Claire

  • Checking netCDF data
    • When NCI onboards new datasets Hannes tries to check them for CF, ACDD, CORDEX etc standards as much as possible.
    • Can you use CMOR? No
    • The IOOS compliance-checker and cfchecker both stop at CF-1.8
    • Commented on an open issue to query this – there was a ticket on 1.9 and 1.10, closed last year, saying support was coming: https://github.com/ioos/compliance-checker/issues/972
  • Data access/sharing across institutions
    • It’s difficult for various technical reasons – posix storage limitations, S3 auth limitations, THREDDS availability…
    • GA data often duplicated or apparently duplicated across storage platforms and it’s hard to be sure
    • Working on a tool (Voila?) to map/visualise data in Jupyter notebooks – to demo next week
  • Hannes on leave for 6 months, visiting Germany for 2 months, next week will be last meeting
    • Adjust meeting time so Jo can attend?

13/6/2024 – weekly meeting 15 2024

Thomas, Claire, Paola, Hannes

  • Zarr file management
    • Use of Zarr ZipStore is critical – v45 went over quota yesterday due to a zarr that was 700k inodes.
    • CMS had a difficult help ticket and worked out it was a zarr problem, but the user didn't appreciate how many files it was creating. They didn't really understand why they were doing what they were anyway – just following online examples. Thomas provided advice.
    • It's not only the use of zarr – the chunking is critical. It's very easy to end up with much smaller chunks than is efficient on memory anyway, and that creates crazy numbers of inodes for a small file (see the sketch at the end of these notes).
    • Thomas will be giving a poster at COSIMA-ACOMO and other presentations in the coming months about the importance of optimising performance through use of zip store, appropriate chunking strategies, etc.
    • Note that Zarr’s inode cost is only an issue on posix filesystems – object stores handle them very well. National HPC centres need to offer object stores then we can leverage these hybrid systems of HPC nodes with object storage.
      • ROI for NCI to offer object store – why not currently available and is it in the roadmap?
    • Pawsey is running an "Acacia Object Storage Workshop" on 25 June – a 5.5-hour online session covering the fundamentals of the Acacia object store, hands-on practice, and expert guidance on optimising data management with Acacia.
      • Paola keen to upskill ahead of the job search
      • This training looks useful for all of us to attend.
    • Paola previously used Mediaflux (object storage) for weather@home work, learning curve but made data retrieval much easier.
    • ArrayLake is where it’s at e.g. in the US, a lot of potential – Thomas: Random dump of links: 
      https://earthmover.io
      Arraylake: A Cloud-Native Data Lake Platform for Earth System Science
      https://youtu.be/tlACkUYYu7A?si=NjxHt_tBTRIgDJaq 
    • Solving the problem of cross-institutional data access by getting everything in the cloud.
    • FAIR: zarr not widely supported so not very FAIR, and lose the netCDF metadata standards etc., but making everything cloud-accessible helps enable FAIRness too
    • Zarr/object storage lets you directly access specific parts of a file.
      • Access directly from home, or spin up data-adjacent compute
    • Break away from ivory towers.
    • At NCI, legacy data is an issue.
      • That’s what kerchunk is for
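
A minimal sketch of the ZipStore and chunking points above, assuming zarr v2 and xarray (file names and chunk sizes are illustrative): the whole Zarr hierarchy lands on disk as a single file, so the inode cost is one file instead of one per chunk.

    import xarray as xr
    import zarr

    ds = xr.open_dataset("input.nc")  # hypothetical source file

    # Rechunk to something sensible first – tiny chunks are what blow out
    # inode counts (and hurt performance) in the first place
    ds = ds.chunk({"time": 120})

    # Write the whole Zarr hierarchy into a single zip file on disk
    with zarr.ZipStore("output.zarr.zip", mode="w") as store:
        ds.to_zarr(store)

    # Read it back lazily through the same store type
    ds2 = xr.open_zarr(zarr.ZipStore("output.zarr.zip", mode="r"))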

30/5/2024 – weekly meeting 14 2024

Paola, Claire, Thomas, Hannes

  • Gadi is down today
    • No unscheduled outage events noted since April… at odds with experiences with filesystems
    • Inconsistent performance from one day to another on the same filesystem/project/files
    • dask is hard to understand so it might not just be the filesystem, it is hard to understand where slowdowns are, the inconsistency is the clue.
      • Watching dask dashboard is instructive – can appear to be running but memory stalled – sometimes refreshing page makes a difference and somehow restarts
      • Sometimes just leaving it alone lets it work out – e.g. task completes but dask dashboard shows still running – not synchronised. Can’t trust the dashboard to be accurate and timely.
      • Do prototyping on ARE to tune
      • Changing timeout limits can help but it’s magic
    • Some filesystems are more stable than others, and ours seem to be more affected (gdata1a, 6, 4?)
      • Those with climate data on them seem to be less stable; that could just be bad luck for us, or maybe it's because of our access patterns.
    • Overseas people are moving to object stores, not filesystems. In the US there's a lot of use of commercial cloud. Pawsey might be a good fit via Acacia, but it's not currently an option for us.
    • Zarr just isn’t working on filesystems – Andrey says it’s not supported or recommended by NCI *despite* NCI documentation recommending it!! Sigh. 
      • NCI (and CSIRO) genuinely thought their scratch filesystems were more performant than they seem to be…
    • Gadi refresh/replacement SHOULD be going to tender. New machine must surely support object storage! When will that be?
    • Working with some datasets is difficult because the data itself has inconsistent chunking. If this could be fixed at source (ie NCI reprocess data ingested) this could resolve a bunch of issues.
      • Potentially this could happen following TDS and GeoNetwork updates, NCI (Hannes’ team) are interested in uplifting the on-disk data
    • Similar to kerchunk convo a few weeks ago, we shouldn’t have to fix these issues in software.
      • “All the technology can’t un#(%* your data” – Adam Steer
    • COSIMA data isn't good and they're not receptive to feedback 🙁
    • NCI has good staff but we need to be able to have confidence that they’re working on uplifting their infrastructure to include object storage.
    • Zarr as a back end to netCDF will be great ( https://docs.unidata.ucar.edu/netcdf-c/current/md__media_psf_Home_Desktop_netcdf_releases_v4_9_2_release_netcdf_c_docs_nczarr.html ) when it happens – we need performance, it’s not about the specific format.
    • Should talk about zarr in the Climate Data Guidelines book.
  • Thomas will present some of what he works on next time – Hannes would love to actually see it 🙂

23/5/2024 – weekly meeting 13 2024

Paola, Alicia, Thomas, Hannes, Claire. Apologies: Chloe

  • Model documentation
    • Incredibly hard to work out meaning of variables in raw model for MOM – both grid and physical vars
    • NRI could add a lot of value by supporting/documenting MOM set up – it’s really challenging for Paola and Thomas who both have strong oceanography backgrounds!
    • ACCESS-OM2 release – but didn’t we already have OM2? This is the spack version? Are component models all the same versions?
    • Role of NRI in modelling – improvements vs focus on supporting configurations. Path from model development into stable roll out
    • A lot of developments, teams and tools still in their infancy.
  • NCI THREDDS server migration
    • Moving from dapds00.nci.org.au to thredds.nci.org.au
    • Update any dependencies – need to find
    • Climatechangeinaustralia is okay

9/5/2024 – weekly meeting 11 2024

Hannes, Paola, Claire, Gen

  • Data archiving
    • Claire backing up 4PB data to CSIRO tape using Globus using Steve McMahon’s management script to release 10TB at a time, getting a throughput approaching 400MB/s.
    • Data stored on MDSS for CLEX for projects that are ending will become inaccessible as no users will belong to the project to be able to see it, so querying NCI if project assignment can be changed on MDSS without having to pull the data back off tape and re-archive.
  •  Data processing – pacemaker experiments
    • ACCESS output variables as say mol/s but CMOR wants say kg/m2/s – can we have different units or do you have to find a conversion?
      • mol/s is a recognised unit but seems like an assumption is made about the composition of “sea salt” being deposited
      • Matt Woodhouse happy with mol/s but is it okay to leave it in those units
      • MOPPER will pass it through CMOR but doesn’t have to follow CMIP6 convention if not appropriate
  • Claire and Chloe talking about “FAIRest of them all” and data stewardship at CTDIS next week
  • Paola working on a data risk assessment associated with the CoE closure
    • Lots of researchers promise to clean up their data later but then never get to it
    • Need to enforce deadlines to clean up data by!
  • lp01
    • Got a bunch of CCAM and BARPA regridded monthly to 1.5degrees, have done state and NRM region averages. Consistent with CMIP5 and CMIP6 data ready for comparison so can demonstrate value add of downscaling in CORDEX data
  • Jobs
    • Gen’s contract ends today so working on renewal but might disappear for a few weeks.
    • ACCESS-NRI will advertise a Data Steward position – TL without a team to start with
    • NCI also likely to hire a data specialist with a climate focus – to replace Yiling who is now manager.
    • W21C ('replaces' CLEX CoE) will have a position advertised in June for technical support in Melbourne, a more ML-focussed one in Sydney, and one at ANU – these will be the CMS replacement roles.
      • W21C will be smaller and have a narrower focus – more about regional modelling and weather events
    • Wilma is CLEX and, due to parental leave, will be with CLEX for longer than it exists, so Paola is arranging ongoing access for postdocs and people like her who will keep working for CLEX after it ends and still need storage and SUs
    • https://www.21centuryweather.org.au 

2/5/2024 – weekly meeting 10 2024

Paola, Thomas, Chloe, Claire

  • CLEX datasets/handover
    • 1PB of user data on /g/data
    • Another PB on MDSS but that’s more reasonable.
    • NRI are willing to look after conda environments (hh5) with NCI, e.g. manage the continuous updates for NCI
      • CLEX hh5 policy is to only install things that are available on conda/pypi, and updated in last 5 years.
    • Haven’t had a chance to revisit the datasets yet since returning from leave.
    • Many datasets may disappear other than those handed to ACS.
    • MERRA2 is the most difficult – it’s big but may not have a large user base.
    • Should do a risk assessment when you open/close a centre – who is CLEX still responsible for (e.g. students/postdocs) that may not still have access to the data they need when clex projects shut down.
    • What happens to the LIEF grant?
      • Underpins hh5 and the ERA5 downloads
      • Some storage got redistributed but what happens now? None is reserved for datasets e.g. data remaining in ua8.
    • No routine clean up and no clear deletion policy anyway
    • Storage grant may have been used for working data which is very risky as it’s impossible to know if it’s actually still needed.
    • Paola trying to get the data under control or at least a plan by the end of May.
    • So much unmanaged data it’s like trying to find a specific thing in a hoarder house (!!!)
    • Very few CMS staff will migrate from CLEX to W21C so there’s a lot of risk as well in terms of people who know the history of the existing data.
      • W21C started in Feb but have not hired any CMS people yet
    • ACCESS-NRI may hire a data person.

11/4/2024 – weekly meeting 9 2024

Hannes, Paola, Claire. Apologies: Chloe, Thomas

  • CLEX ran out of quota at NCI because they didn’t realise the CoE had been extended and quota transfers were meant to be in place.
  • CF checking
    • Hannes has found Kelsey’s old CF checking scripts and a tutorial
  • Clex datasets
    • Currently no plan for ua8, e.g.
    • Note rr7 was never finally closed, but now only contains MERRA2
    • ua8 also contains MERRA2, from when rr7 ran out of space
    • “It’s a mess” “oh gees”
      • FROGS indices needs to be published
      • SODA is probably complete
      • Some datasets are downloaded subsets on researcher request and could be deleted
      • CMEMS (older version of AVISO)
      • AUS2200 needs to be published
      • C20C mostly to be deleted but some to be published
      • Large Ensemble data (from ua6) – sync to xv83?
      • Paola to sort out as much as she can and then we’ll review in a few weeks
    • hh5 will persist beyond June 30 and has access to MDSS, so some data can be archived there
    • When a project is decommissioned, access to its MDSS is lost – even if the data is still there, it’s inaccessible because you need project membership to see it.
  • xarray 
    • May change the way they handle geospatial coordinates – storing them as floating point creates problems with raster data
    • Mike Sumner submitting issues to Ryan Abernathey
      • When subsetting by 'nearest' you can get columns of NaNs due to floating point representation (see the sketch after these notes)
      • Need projection information inherent in the dataset to allow accurate subsetting
      • But it’s sort of against what xarray stands for – it’s not simplifying, it’s more complex to accurately support geospatial rasters
  • Paola away next week, and Hannes unable to attend as acting DS mgr.
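
A small synthetic sketch of the floating point coordinate problem noted above: two nominally identical grids that differ only by rounding produce all-NaN output under exact reindexing, which is why nearest-match tolerances and proper projection metadata matter.

    import numpy as np
    import xarray as xr

    lon_a = np.arange(0, 10, 0.25)   # a 0.25-degree grid
    lon_b = lon_a + 1e-9             # the "same" grid after a float round-trip

    da = xr.DataArray(np.random.rand(lon_a.size), coords={"lon": lon_a}, dims="lon")

    # Exact-match reindexing onto the second grid silently returns all NaNs
    print(da.reindex(lon=lon_b).isnull().all().item())  # True

    # A nearest match with a tolerance recovers the intended alignment
    print(da.reindex(lon=lon_b, method="nearest", tolerance=1e-6).isnull().any().item())  # False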

28/3/2024 – weekly meeting 8 2024

Hannes, Dougie, Claire, Paola, Ben L, Paul B, Gen

  • Intake special!
    • Hannes presenting NCI’s Intake work
      • Building intake catalogues, particularly around climate data
      • Demo scripts to use Intake catalogues
      • Drivers support diverse file types, and the user doesn’t need to know about the formats
    • Dougie has written an ACCESS-NRI Intake catalogue (using a different driver again) which is a catalogue of catalogues, ie a catalogue of Intake sources (could be -ESM, -spark or any other driver, currently only Intake-ESM).
    • Intake v2 coming – Dougie watched a talk and it looks cool but it breaks things – need to pin in environments or updates will pick up the alpha version.
      • Can write transformations that add metadata, do operations on data, etc.
      • “Intake take 2”, there’s some info on Intake readthedocs
      • Paul – individual files not really visible in the catalogue anymore. Harder to read (though they were getting that way anyway). Each file represented by a sort of ID. Introspects files to add to catalogue and dynamically applies. Allows less boilerplate code and removes need for customisation. Get a version controlled file that’s like a meta language implementing functionality from generic frameworks (data sources, readers, transformations).
    • Paola – Previous limitation was choosing a single way to concatenate files, so v2 should allow people to join files differently.
      • Map files in a relational way sounds very good.
      • Move from user-focussed to managing centralised datasets.
    • Dougie – v2 utility for simple datasets is clear, but not sure how it works with more complex things like Intake-ESM which is very bespoke and handles data concatenation and stuff. Additional work will be required to port Intake-ESM to v2
    • Ben/Paul – also looking at kerchunk. Overlaps with Intake. Precomputes some work you'd need to do every time (e.g. concatenation of metadata), and has a plain xarray interface to datasets. Saves a view of where chunks are on disk. This alleviates the time needed to build mfdataset calls (see the sketch at the end of these notes).
    • Dougie – There’s some projects trying to make kerchunk better to use.
      • https://github.com/TomNicholas/VirtualiZarr 
      • https://github.com/NikosAlexandris/rekx
    • Ben – There's STAC catalogues… can build a kerchunk catalogue on that but it doesn't handle inconsistent datasets like variables changing names, coming and going, or resolution changes. How do you handle that in Intake?
    • Paola – do a search e.g. for CMIP6 data and try to load them, but if one of the matching datasets has an issue, the whole lot will fail to load.
      • Kerchunk would be very useful e.g. for 10min BoM data
      • IMOS trying to use kerchunk with their S3 storage where it’s very inefficient to retrieve all data to do a subset. But as a service provider they need to use stable libraries.
    •  Ben – issue with apparent non-deterministic behaviour (there’s formal issues around this)
      • Can handle netCDF3 and netCDF4 but can’t concatenate them together.
      • Paul – some of the issues are not kerchunks but actually dask/zarr. E.g. xarray can handle fixing individual files but zarr can’t handle non-uniform chunking along a dimension. Zarr will be changing this which will alleviate some of the challenges of poor data.
    •  Dougie – Consolidate chunks on load – i.e. define chunking and rechunk on load in order to concatenate dataset components together. Requires a good understanding of the underlying data and its issues – that's a huge undertaking for e.g. 60 NCI catalogues
      • Need to load the catalogue JSON onto every dask worker which needs a lot of memory
      • Paul – implement parquet support for indexing (convert from JSON)
      • Template repeated parts of paths to reduce JSON
      • Compression layer over text.
      • Ben – Append new data as it comes in – effectively generating a new large JSON then converting to parquet so it’s still problematic on memory
    •  Martin Durant started both kerchunk and intake!
      • Martin is focussed on Intake v2 but there’s 43 kerchunk contributors and Ryan Signall (USGS) has funding to push kerchunk development along
      • xarray will attempt to merge datasets with different chunking, it in theory helps the user experience but can slow things down so much it maybe isn’t a good thing!
  • Persistency for hh5 environments and catalogues beyond CLEX
    • NRI may take them on, but need to negotiate with NCI too, e.g. merge with dk92? Possibly a collaboration between NCI and NRI. Claire Carouge discussing with Paola: NRI would be co-responsible for the climate Python environment with NCI, removing NCI's need to maintain its own Python environment, and NRI's envs would be endorsed by NCI.
  • CF/ACDD checking
    • Hannes to add the same checker (IOOS) and wrapper that Paola uses in hh5 into dk92. Need to check with Kelsey about summary python script that NCI used to use. Does that still exist or use Paola’s?
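
A hedged sketch of the kerchunk workflow discussed above (file paths are hypothetical; assumes kerchunk, fsspec and a zarr-enabled xarray): index each netCDF4/HDF5 file once, combine the references along time, then open the whole collection as one lazy dataset without an open_mfdataset scan.

    import json
    import fsspec
    import xarray as xr
    from kerchunk.hdf import SingleHdf5ToZarr
    from kerchunk.combine import MultiZarrToZarr

    # Hypothetical netCDF4/HDF5 files forming one time series
    files = ["/g/data/example/tas_2010.nc", "/g/data/example/tas_2011.nc"]

    # 1. Index each file once, recording where every chunk sits on disk
    refs = []
    for url in files:
        with fsspec.open(url) as f:
            refs.append(SingleHdf5ToZarr(f, url, inline_threshold=300).translate())

    # 2. Combine the per-file references along the time dimension
    combined = MultiZarrToZarr(refs, concat_dims=["time"]).translate()
    with open("tas_refs.json", "w") as f:
        json.dump(combined, f)

    # 3. Open the whole collection lazily as a single dataset
    ds = xr.open_dataset(
        "reference://",
        engine="zarr",
        backend_kwargs={
            "consolidated": False,
            "storage_options": {"fo": "tas_refs.json", "remote_protocol": "file"},
        },
    )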

21/3/2024 – weekly meeting 7 2024 

Paola, Chloe, Claire

  • ACS reference data downloads
    • ia39_download user now working to download data automatically
    • However it would be wise to change ownership of everything to this user, rather than relying on the write group ACL
    • Best method is probably to make everything in the reference collection owned by the functional user, and securely share the secret to access this user with the writers group so any member of the group can access it to modify things as needed, instead of being solely dependent on Paola
    • Code sits alongside the data, but the ia39_download user needs to be able to write to the download logs
    • Downloads (including auth, logging) are handled through GitHub actions.
    • Needed to add ia39_download user to hh5 group.
    • Andrew Wellington very responsive
  • ESMValTool workshop
    • Broadly useful for participants, maybe less so for the organisers, not much focus on specific ACCESS runs
    • Claire focussed on using CORDEX data
    • Alberto working on parsing Clef or Intake searches to produce ESMValTool-ready recipe input
    • Gen and Christine working on Jupyter Notebook integration (be great to be able to use ESMValTool alongside other tools).
  • CORDEX data
    • Mixed states of publication – CCAM and BARPA funded through ACS
      • Qld DES funded to publish a small amount of vars
      • NarCLIM?
      • WA NarCLIM not supported to publish yet
  • CLEX ending at end of this year
    • Some people moving to W21C 
    • Paola can’t guarantee storage beyond the end of this year though
    • hh5 underpins so many researchers!
    • Maybe NRI might take over maintenance, but could be told to use the NCI envs instead. NCI envs do not seem to be sufficiently agile, hh5 is very responsive to new requests, problems and changes. Also support for bespoke locally developed packages.

14/3/2024 – weekly meeting 6 2024

Paola, Hannes

  • Discussed CORDEX
    • Hannes had a bunch of questions – delegated work from Yiling
    • CMORising
    • CCAM and/or NarCLIM? Other data?

29/2/2024 – weekly meeting 5 2024

Paola, Thomas, Hannes, Claire

  • Decommissioning plan for CLEX NCI facilities if CMS data support funding isn’t continued to W21C 
    • 300TB data (ua8, rr7 ongoing data mgmt)
    • hh5 conda environments and storage
    • There’s a lot of dependencies on Paola’s NCI ident but if she stops working what happens?
    • Better managed data has moved to ia39
    • Maybe NRI will give some storage for publishing e.g. AUS2200
    • Paola puts a backup of everything published to ks32 to MDSS
  • Still need to check issues with ia39 functional user for automated downloads with Github Actions (Paola)
  • AGCD Big Data
    • Mostly incomplete but it would be good to draw a line under it (Overview – Working with Big/Challenging Data Collections, acdguide.github.io)
    • Paige keen to just make clear what’s incomplete and how people can contribute.
    • Point to existing much better resources for learning Dask etc now, and make the focus Australian tools
    • Note that ACCESS-NRI Discourse exists but can be harder to find the gems among the discussion sometimes.
    • Establish moderators so people can raise issues asking for resources to be included
  • ESMValTool 
    • ACCESS-NRI hosted a community discussion but mostly attended by NRI and NCI folk, I was the only other Australian (Plus LLNL and ECCC)
    • Paige Martin now at NRI working with Romain, brings a good broad overview of tools used elsewhere
    • Gab Abramowitz to work with NRI to enhance iLAMB
    • Thomas writing his own evaluation tools with xarray and intake, xmip, datatree
    • Note there are other tools too like PCMDI Metrics Package, icclim, xclim,
    • Inclusion of computing derived variables e.g. in ocean domain?
    • Thomas – aim to share metrics as widely as possible so it’s only written once. Claire – this is the goal of ESMValTool.
    • Need to write in one of our books – Big Data or Governance – how to write pluggable code for other tools, not specific to COSIMA or ESMValTool or whatever. 
  • NCI data services working on intake to improve internal QA/QC
    • intake-esm isn't really maintained – Anderson has another job now; Dougie needed some changes made and it may not work with Intake v2. Who can look after this in the future?
    • Hannes is using intake-spark – less curated, scrape all metadata from all netCDF files, should be more robust to intake2 transition
    • Scrape all netCDF files for each publication, put the metadata into parquet files, which Intake can then open easily (see the sketch after these notes)
    • Who should take care of intake-esm catalogues? “who loses the staring contest”? How to maintain confidence in cataloguing?
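
Not intake-spark itself, but a plain-Python sketch of the "scrape all metadata into parquet" idea above; the root path and fields are illustrative, and writing parquet needs pyarrow or fastparquet installed.

    from pathlib import Path
    import pandas as pd
    import xarray as xr

    rows = []
    for path in Path("/g/data/example/publication").rglob("*.nc"):  # hypothetical root
        with xr.open_dataset(path) as ds:
            for name, var in ds.data_vars.items():
                rows.append({
                    "path": str(path),
                    "variable": name,
                    "units": var.attrs.get("units"),
                    "standard_name": var.attrs.get("standard_name"),
                    "dims": " ".join(var.dims),
                })

    # One parquet file per publication; Intake (or plain pandas) can open this directly
    pd.DataFrame(rows).to_parquet("publication_index.parquet")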

22/2/2024 – weekly meeting 4 2024

Paola, Claire, Chloe

  • CSIRO has uninstalled Zoom from our laptops, tedious
  • ACCESS-NRI jobs – 3 team lead positions advertised
    • Paola would be well suited but has concerns about applying
    • ACCESS testing – Martin is pivotal, doco/process isn’t clear
    • One of the jobs is TL of Ocean modelling but that team has always existed, just doesn’t have a team lead
    • Secondments?
  • ACDG Cross-Inst data sharing report
    • Paola to send to Andy P, get his okay then send to Angela for publishing

15/2/2024 – weekly meeting 3? 2024

Paola, Gen, Claire

  • Regridded data for both CMIP and CORDEX are available in BoM project lp01
  • ACDG Cross-Institutional Data Sharing – Ethics approvals all done now and we’re good to publish the report
    • Good lesson in needs for approvals and processes!!
    • Chloe has also gone through the same process for her NCESS consultation
  • ACS reference data
    • CMORPH (was it?) data needed updating in ia39, had accidentally hardcoded the 00:00 time so needed to fix that. 
    • There’s a lot of files so will need concatenation when Paola has time
    • Tried to use the functional user on Gadi through github actions but it didn’t work. Need to check if the key is working.
  • ACDG Metadata portal
    • Need to add more records but otherwise going well
  • ACDG Governance book
    • Close to final now, should arrange another meeting
    • Tidy up ‘Create’ section and ‘Publishing’
  • ACDG Big Data
    • Paige is back in Australia now, we should spin this back up?

25/1/2024 – weekly meeting 1 2024

Thomas, Claire, Gen, Chloe. Apologies: Paola

  • 2024 is already busy – and tomorrow is a holiday!
  • AGCD consolidation on hold
  • Data backups
    • No backup strategy still for published CMIP5/6 data but there is a backup of the pre-CMORised CMIP6 data on CSIRO /datastore
    • xv83, ia39, hq89 have a lot of storage but it's stretched, there's no backup for any of it, and not all of it is urgently needed on disk
    • Stream data to tape when publishing to make a backup copy and delete from work disk
    • Also asked to publish Qld CCAM data through ACS storage
    • Initial inquiries with IM&T “Non-Standard Requests Team” quoted $330k p.a. to store PBs of data which seemed odd so we went to Joseph Antony, who confirmed we can indeed use /datastore – give feedback via Gareth maybe???
    • Follow up with Steve and Gareth established we can stream data in parallel to datastore using Globus after resolving some issues (first tests were about 1/10 the speed of the parallel rsync approach), Steve McMahon has been doing some testing and we are good to proceed with backing up CCAM data. Quoted 4PB in 5mo. Our goal was 1PB/3mo so this is good.
    • Figure out what to do with CMIP data
  • Gen not permanent yet so can’t rock the boat at BoM 🙂 
  • We are now hosted on WordPress as public Confluence access had to be removed
  • Intake 
    • Gen has made good headway in the last couple of weeks 
    • In lp01 – some catalogues are for external data, some for the lp01 regridded output data.
    • This is Gen’s ACS focus but there is overlap with NESP
    • Cataloguing BARRA2, BARPA and CCAM data (from xv83) – but CCAM will move to hq89 next week for a few forcing datasets, will be equivalent to py18 for BARPA.
    • BARPA starting with historical for all models, CCAM starting with a few models (ERA5, ACCESS) for all scenarios
    • Gen skilling up to take over Francois’ data processing role
    • Intake catalogues do not replace good file structure and metadata!! They're still needed – and they're needed just to build the catalogue too!
    • Claire – Kerchunk is sometimes useful too