Meeting notes 2025

2025 meeting notes and ongoing actions…

20/2/2025 – weekly meeting #7 2025

Damien, Claire, Sam, Hannes, Thomas. Apologies: Chloe

  • MERRA2
    • Claire had a request from within CSIRO for this data. The data is currently unmaintained, but Sam added the user to ua8 in this instance.
    • Sam found a lot of interest in W21C.
      • Aerosol
      • GEOS-Chem experiments
      • comparison to ERA5
      • Renewable energy – new topic in this CoE
      • So much interest that the CoE should really be hosting it, but they have insufficient storage.
    • rr7 was RDSI, ua8 was CLEX. Both are no longer actively funded
      • Christian is asking NCI to keep ua8 until the end of the year, but it’s read-only (RO), ~300TB and 92% full!!
    • 56TB in rr7, 4TB in ua8
      • rr7 looks like raw data, ua8 seems to be processed products
      • Which data are the users actually using? Any chance we only need ua8??
    • Need to ensure users aren’t downloading any data themselves
    • W21C has 120TB – of which 20TB is allocated for reference data.
      • It started with only 10TB of /scratch; it’s up to 100TB now but still seems wildly insufficient.
    • Starting over instead of inheriting the CLEX data mess causes a whole lot of different problems.
      • CLEX was the best of the mess!!! Everyone else is worse!
    • Pawsey storage?
      • w97 was transitioned to Pawsey entirely.
      • Sharing data out of Pawsey seems tricky though? 
        • Michael Sumner is doing some work on this – MVP?
        • John Reilley’s regional ocean model data is there but how to publish/share?
      • Thomas is following EarthMover talks; we still haven’t lined up Ryan A to talk to us.
    • NCI storage options outside of W21C
      • Need to understand exact volume needed
      • jt48 currently does not have enough storage, but may be able to shift more from ia39 or kj66 (the latter had way more storage allocated from ia39 than needed)
      • Hannes is looking into options for getting the reference datasets into GeoNetwork. We need to know the licence, though “open” is okay.
      • Each dataset has a page of metadata https://aus-ref-clim-data-nci.github.io/aus-ref-clim-data-nci/datasets/datasets-intro.html 
  • Data licensing
    • Is there any mechanism that requires current users of code or data to update a usage file, e.g. open a PR to indicate you’re using it?
    • Like a notification approach to attribution – you wouldn’t police it though.
    • Access tracking is desired by managers but really hard to implement (auth etc) without making the data less FAIR.
  • vn19
    • Thomas – Sam please have a look at vn19, is this appropriate for any of the W21C high resolution modelling?
    • Space for making datasets analysis-ready
  • Is nci-files-report timely? 
    • What to do if file ownership doesn’t seem to be accurate?

13/2/2025 – weekly meeting #6 2025

Claire, Damien, Sam, Hannes, Chloe. Apologies: Thomas

  • MERRA2
    • Interest in this dataset from within CSIRO (but not ACS)
    • It was replicated in rr7 and ua8 (rr7 ran out of space). Paola was hoping to move it somewhere else and combine it, but no destination was determined.
    • We don’t actually know who the users of MERRA2 are – not aware of any in W21C or ACS
    • It’s about 60TB so don’t want to move to ia39 or similar without knowing it’s going to be used.
  • xv83
    • NCI can’t make an intake catalog as many of the directories are not readable except by the researchers
    • Need to ask people to catalogue their own data but that’s kind of the problem
    • Can use nci-files-report to identify large users, however the biggest volume users are not necessarily the problem users in this space.
  • jt48 replication
    • Service user is now in place
    • @Sam to redownload data into the new project (update download scripts)
    • https://aus-ref-clim-data-nci.github.io/aus-ref-clim-data-nci/datasets/datasets-intro.html
    • @Damien to send comms to ia39 users telling them the new dataset is in place and to use that
    • After a month or two, lock down permissions on the data in ia39 and create a symlink to jt48
    • @Damien to send final comms about migration
    • Can delete after another month or two
      • Note: because we’re redownloading the data, this could make it a new dataset, so we need to be careful
    • @Claire to update records in OneClimate 
    • Can we add GeoNetwork entries even though we’re not publishing the data through NCI?
    • Lead CI/CIs for jt48? Chloe is currently Lead CI with a few CIs, but Sam should have a lot of power in this project – maybe make Sam and Damien Delegated LCIs?
  • US data emergency
    • Has not been raised at an NCI level
    • CSIRO have a storage space internally for critical data replication
    • Over time this will tell us what’s needed and what maybe should be held at NCI

6/2/2025 – weekly meeting #5 2025

Damien, Jo, Claire, Sam. Apologies: Chloe, Thomas

  • kj66 publishing
    • Damien has access to the metadata page
    • QC checks are happening now
    • THREDDS will be required
    • An Intake catalogue is also of interest to users but is less important than THREDDS and not a blocker
    • Could publish to the GeoNetwork catalogue prior to the data going live on THREDDS
    • Jo will confirm timelines with Yiling
    • The publishing contact (i.e. Damien) will keep edit access to the metadata page, so they can modify it as needed for errata or to clarify things for users in response to questions.
  • jt48 service user
    • We have movement on this ticket now
    • Jo apologised that we couldn’t see the ticket opened on our behalf, despite thinking permissions were set
    • Sam has supplied SSH key so should be set up in the coming days
  • CMIP7 planning
    • Claire, Michael Grose and Alberto Meucci attended a model selection workshop with a focus on CMIP7 and CORDEX this morning
    • What are the strategic plans with NCI for CMIP7?
      • ACCESS-NRI and NCI working on plans to support CMIP7
    • NDRI report out but not implemented (yet?) 
      • but nothing has been announced yet, as it depends on storage funding, which in turn depends on the budget and the upcoming federal election
      • Yiling and Andrew talking about contingency plans
      • Presumably, following the election, if the funding is approved there’ll then be an application process; realistically, there is unlikely to be anything concrete before Q3 this year
    • CMIP7 timeline – first data releases expected as early as next year!
      • https://wcrp-cmip.org/cmip7/ 
      • What will Australia’s contribution be and how will it be funded? ACCESS-ESM is nearly ready but needs FTE to run/process. ACCESS-CM3 is under development in collaboration with ACCESS-NRI but has the same problem with funding to run it for CMIP.
    • Concerns about whether there will even be US CMIP7 models?!
  • US data disappearing
    • Mauna Loa CO2 feed has stopped from NOAA, however Scripps are still publishing daily data at this point
    • Implications for other NOAA and NASA data?
    • NCAR are probably funded from gov’t grants?

30/1/2025 – weekly meeting #4 2025

Damien, Hannes, Claire, Jo, Sam

  • We should feed Copilot all our scientific Python code so it starts suggesting code that looks more like ours!
  • Moving reference data to jt48
    • Jo has escalated the ticket to create a service user
    • Sam will be the one handling the data downloads
    • We need to work out a plan for how to actually manage the migration in terms of comms to users and what gets moved vs what we redownload then delete.
  • kj66 requests pending (bias corrected CORDEX)
    • There are people waiting on access to this project, but it’s still in prep – Jo to inform them their requests will be processed once it’s ready
    • Metadata entry for NCI done, should go live in the next week
  • Researcher perception: data being on NCI does not automatically make it “public” or “published”!
    • The W21C lead CI thought that the CLEX publication storage was provided by NCI and did not realise CLEX had paid for it.
    • NCRA folder in ia39 is underpinning data for National Climate Risk Assessment but that will need to move to a publication project
    • People like Claire and Sam act as intermediaries to make researchers aware of the general process for getting DOIs for their data
  • xv83 management
    • It would be good to move CAFE into its own project as well, to separate decadal forecast users from ACS internal work
    • Despite a huge volume and compute quota in this project it’s constantly running out due to 
  • Information about project usage
    • What qstat information is visible as a CI? 
      • By default you only see your own jobs. As a CI (or any user) you can do nci_account -P <project> -v to see summaries of people’s usage across the project but not number/size of jobs to see if quota is being used efficiently.
    • nci-files-report can be used to see who is responsible for storage usage, but it only gives totals, which don’t show who is using space *efficiently*
    • NCI could create a one-time Intake catalogue for the project so we could get a better picture of what the space is being used for. This is similar to what’s being done for publication projects, but it can also be useful for cleaning up space.
      • BoM are using this process to tidy up projects that need to be decommissioned.
      • Hannes will follow up creating an Intake-spark catalogue for xv83 for us – thank you!

23/1/2025 – weekly meeting #3 2025

Claire, Damien, Thomas, Hannes

  • Hannes is back from parental leave!
  • ACS
    • Now that Hannes is back, we can hopefully follow up on jt48
    • Damien being pulled into discussions about data availability for ACS…
    • Voila with JupyterLab and Intake to create interactive maps of climate data
      • Proof of concept but in flux at the moment awaiting clear direction
      • Drawback is you need an NCI account to access it which won’t meet the stakeholder requirements
  • Object storage
    • Damien listened to Ryan Abernathey’s talk on Icechunk at AMS
    • Zarr is not yet considered a mature enough format for reliable long-term storage
    • When working with object storage you can’t modify objects in place; you have to recall them and then commit the changed objects/chunks, as you would with git
    • Pawsey object storage can’t be computed against directly; you recall the data to scratch first, which is not how we expected to work with it? It has high-speed connectivity to the HPC, but the training indicated you don’t point to it directly?
  • NCI future direction unclear – NDRI report suggests uniting management with Pawsey? Maybe we’re waiting till after the election for a decision to be made.

9/1/2025 – weekly meeting #1 2025

Claire, Damien, Thomas

  • ACS
    • still no service user for jt48
    • ACS update – not being defunded in favour of a portal so that’s good news. Further clarification on funding for next FY yet to be announced.
  • NCI is advertising for an interim director at the moment (https://jobs.anu.edu.au/jobs/interim-director-canberra-act-act-australia)
    • CSIRO and BoM plans ongoing but no decisions yet as far as we know.
  • Object storage
    • AMS meeting – big focus on putting all the earth systems data in the cloud. Long term is this a good strategy? (Damien)
    • Object storage is game changing
    • Giving all data to for-profit companies seems risky in the longer term
    • To us, an on-premises cloud looks like it has much bigger potential, particularly somewhere like Australia with centralised HPC facilities.
    • Ryan Abernathey (EarthMover) to talk at AMS
    • Acacia at Pawsey doesn’t seem to work the way we expected
      • Mike Sumner may have found a solution?
    • EarthMover etc don’t have “a solution” yet, it hasn’t coalesced into concrete advice of how to best do it.
    • Is Zarr likely to become the official back end for netCDF? HDF5 cloud format? What is happening in netCDF land?
    • We need to go to Boulder and have beers with the few people who know these things 😉