Meeting notes 2025

2025 meeting notes and ongoing actions…

27/3/2025 – weekly meeting #12 2025

Hannes, Thomas, Zoe, Claire, Paola, Damien

  • Paola has started at NCI as a casual to clean up data collections!
    • Focus on CLEX and BoM disused collections – finding time to decommission projects.
    • Learning which projects are still needed and the processes involved
    • Will be working with Hannes 🙂
    • Hopefully scope for writing up process documentation (retiring datasets) in the Aus Clim Data Guide
      • Has started a checklist for retiring data
  • Updates from last few months: funding issues with BoM and CSIRO on NCI systems, concerns with governance and infrastructure funding
  • NCI interim director Andrew Rohl started Monday, opening speech talked a lot about the value of data 🙂 

20/3/2025 – weekly meeting #11 2025

Jo, Sam, Damien, Zoe, Thomas, Claire (late)

  • NCI interim director starting next week
  • No update on HPC budgets yet/NDRI
  • Intake catalogues
    • Thomas has a specific problem he’s working through with Jo
    • Zoe has taken over the work Gen did previously on intake (also working on analysis of CMIP7 fast track ESMValTool etc)
    • Thomas will share earlier CMIP Intake work he did: https://github.com/AusClimateService/data-catalogue
  • Access to any amount of object storage on NCI would be helpful for PoC testing

13/3/2025 – weekly meeting #10 2025

Hannes, Damien, Claire, Sam, Thomas

  • NCI funding
    • CSIRO IM&T reforms include a “reduction in Tier 1 HPC investment” – no clarity on how big that reduction might be, but allegedly climate will be spared
    • BoM are also reducing NCI investment
    • Where does this leave published datasets??
    • The National Collections scheme supports some reference data collections, BoM is currently overrepresented in that scheme but want to also use it for BARRA-v2, BARPA etc.
    • NDRI recommended $100m spend on new HPC infrastructure – has that been approved yet or waiting on Federal budget?
    • NCRIS funds NCI infrastructure but partners fund salaries etc.
    • Are NCI themselves aware of decisions being made by the partners and able to make contributions to these decisions? Current management flux at NCI means probably not but will be good to see what comms look like under incoming interim director Andrew Rohl (starting in a couple of weeks)
  • MERRA2
    • Nobody other than the one known user wants the rr7 version, so Sam will keep focussing on the ua8 version and porting that to jt48.
  • Aust-clim-ref
    • TIGGE will be moving to a new distribution through Copernicus, will be used in ACS but a new downloader will be set up once the new CDS comes online. For now it’s static and being used for testing.
    • Sam is working through porting the downloaders/collections to jt48, no issues have arisen so far.
  • Open data requirement for journals
    • What do you do when a journal requires underpinning data to be made available open access
    • CLEX had a large publishing project specifically for this case
    • CSIRO staff can/do use DAP, may make the data “available on request” so just the metadata is public
    • Some scientists say it’s not possible as a way of not supporting the open science culture
    • Just include the specific data in the plots
    • FAIR data push is important, but ensuring your references are pointing to the right datasets/versions/etc is incredibly difficult. Sometimes paper plots are unreproducible because they’re not referring to the data they think they are.
    • Provenance is hard but necessary.

6/3/2025 – weekly meeting #9 2025

Claire, Sam, Thomas, Hannes, Zoe Gillet. Apologies: Chloe, Damien

  • HYCOM
    • There is salinity data in ua8 but it was a one-off download, not managed
    • Looks like there is some use in ACCESS-NRI going by Hive posts, so maybe there’s a few unofficial copies around
  • Discussion of TC Alfred
  • Welcome Zoe!
    • Zoe Gillet has recently started work at BoM with Francois’ team
    • Will be working on ESMValTool and climate data analytics
    • Previous postdoc at UNSW in CLEX
  • AI/ML-based data compression
    • Has come up in W21C, is this being talked about in NCI or other orgs?
    • Call from ACCESS-NRI ML WG for AI compression for climate simulations
    • Space savings vs time investment in doing it?
    • Risks associated with inconsistent compression per file??
  • Sam has a puppy!! Snoopy is an adorable 8mo sausage dog!
  • Object storage talk
    • Things are happening
    • Michael Sumner (AAD) has played with it and is working with it in a PoC way
    • Talking with ACCESS-NRI
    • Have met with EarthMover people – Michael is doing most of the work and communications
    • Jo has been in touch with Thomas
    • Thomas to make sure Michael keeps this group in the loop with discussions
    • Virtualising datasets
    • Managing metadata and cataloguing
    • Version control for zarr collections
    • Object store backing Nirin cloud at NCI is pretty big – PB scale! but not accessible for general data storage
  • NCRIS infrastructure survey
  • MERRA2
    • Sam will move the ua8 component (the smaller part) to jt48 as a managed dataset
    • Found one user of the rr7 data and there may well be others, but for now we’ll just focus on uplifting the MERRA2 data from ua8.
  • W21C storage
    • 1.2MSU every quarter for big model runs – but you don’t want to just delete that output every time you run it, you still need storage even if data *can* be regenerated by rerunning models.
    • People need datasets downloaded to force models or compare outputs, but storage for downloads simply isn’t available
    • No traction yet on the need for more storage
    • Are our models bitwise reproducible? Are you guaranteed to get the same results every time you rerun the model with the same ICs?
    • Sam would be very happy to inherit the CLEX storage and delete half of what’s left there from researchers who have left 🙂 
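
On the bitwise reproducibility question above, one simple check is to rerun a model with the same ICs and compare checksums of the two sets of output files. The sketch below is only an illustration: the function names and directory layout are hypothetical, and it compares raw bytes, which is stricter than comparing data values (output files that embed a creation timestamp in their metadata can differ byte-wise even when the fields are identical).

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_runs(run_a, run_b):
    """Compare two run output directories file by file (hypothetical helper).

    Returns a dict mapping each relative path to "identical", "differs",
    or "missing" (present in only one of the two runs).
    """
    run_a, run_b = Path(run_a), Path(run_b)
    files_a = {p.relative_to(run_a) for p in run_a.rglob("*") if p.is_file()}
    files_b = {p.relative_to(run_b) for p in run_b.rglob("*") if p.is_file()}
    report = {}
    for rel in sorted(files_a | files_b):
        if rel not in files_a or rel not in files_b:
            report[rel.as_posix()] = "missing"
        elif file_sha256(run_a / rel) == file_sha256(run_b / rel):
            report[rel.as_posix()] = "identical"
        else:
            report[rel.as_posix()] = "differs"
    return report
```

If every entry comes back "identical", that supports deleting output that can be regenerated; any "differs" entry means a rerun would not give back the same files.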

27/2/2025 – weekly meeting #8 2025

Jo, Thomas, Alicia, Claire, Sam, Damien. Apologies: Chloe, Hannes

  • NCI Interim Director will be Andrew Rohl (previously of iVEC/Pawsey)
    • General high levels of hope and excitement
    • Believer in data DOIs
    • Starting March 20th? 24th?
  • MERRA2
    • The only people who’ve responded again to Sam are only using the data in ua8
    • This data is only 3-4TB 
    • jt48 has a grant of 40TB
    • So we could add the ua8 component of MERRA2 to the list of things to manage in jt48
    • Sam will add this to his list.
    • No updates to report yet but should be well underway by next week
    • Sam filled up the W21C storage; the centre is aware they need more allocation, but NCI quoted $100k for 100TB for the life of the CoE, which is a lot (~1 postdoc)
  • IceChunk
    • Michael Sumner (AAD) has been playing with this on Acacia (Pawsey) and having some success.
    • Who at NCI would be the right person to see a show & tell on doing cool stuff with object storage?
    • Delivery of derived products
    • Ryan has limited availability to give talks, need to figure it out for ourselves
    • ACCESS-NRI may also be interested
    • Thomas doesn’t have time to look into this but keen to connect people with interest in this space.
  • nci-files-report – Sam will follow up
  • ESGF node future
    • Can’t say anything until the new centre director comes online and outcomes of NDRI following the upcoming election
    • Strong commitment within NCI to continue to support the local node though if possible
    • Funding applications expected to support the node going forward.
    • NCI receiving requests for NIH genomics data replication

20/2/2025 – weekly meeting #7 2025

Damien, Claire, Sam, Hannes, Thomas. Apologies: Chloe

  • MERRA2
    • Claire had that request from within CSIRO looking for this data, currently unmaintained but Sam added the user to ua8 in this instance.
    • Sam found a lot of interest in W21C.
      • Aerosol
      • geoschem expts
      • comparison to ERA5
      • Renewable energy – new topic in this CoE
      • So much interest the CoE should really be hosting, but they have insufficient storage.
    • rr7 was RDSI, ua8 was CLEX. Both are no longer actively funded
      • Christian asking NCI to keep ua8 until the end of the year but it’s RO – but it’s ~300TB and 92% full!!
    • 56TB in rr7, 4TB in ua8
      • rr7 looks like raw data, ua8 seems to be processed products
      • Which data are the users actually using? Any chance we only need ua8??
    • Need to ensure users aren’t downloading any data themselves
    • W21C has 120TB – of which 20TB is allocated for reference data.
      • And started with only 10TB /scratch, up to 100TB now but seems to still be wildly insufficient.
    • Option: start over instead of inheriting the CLEX data mess, but that causes a whole lot of different problems.
      • CLEX was the best of the mess!!! Everyone else is worse!
    • Pawsey storage?
      • w97 was transitioned to Pawsey entirely.
      • Sharing data out of Pawsey seems tricky though? 
        • Michael Sumner doing some work on this – MVP?
        • John Reilley’s regional ocean model data is there but how to publish/share?
      • Thomas following EarthMover talks, still haven’t lined up Ryan A to talk to us yet.
    • NCI storage options outside of W21C
      • Need to understand exact volume needed
      • jt48 currently does not have enough storage, but may be able to shift more from ia39 or kj66 (the latter had way more storage allocated from ia39 than needed)
      • Hannes looking into options for getting the reference datasets into Geonetwork – need to know licence though – but “open” is okay
      • Each dataset has a page of metadata https://aus-ref-clim-data-nci.github.io/aus-ref-clim-data-nci/datasets/datasets-intro.html 
  • Data licensing
    • Is there anything that requires current users of code or data to update a usage file – e.g. do a PR to indicate you’re using it
    • Like a notification approach to attribution – you wouldn’t police it though.
    • Access tracking is desired by managers but really hard to implement (auth etc) without making the data less FAIR.
  • vn19
    • Thomas – Sam please have a look at vn19, is this appropriate for any of the W21C high resolution modelling?
    • Space for making datasets analysis-ready
  • Is nci-files-report timely? 
    • What to do if file ownership doesn’t seem to be accurate?

13/2/2025 – weekly meeting #6 2025

Claire, Damien, Sam, Hannes, Chloe. Apologies: Thomas

  • MERRA2
    • Interest in this dataset from within CSIRO (but not ACS)
    • Was replicated in rr7 and ua8 (rr7 ran out of space), Paola was hoping to move it somewhere else and combine but destination not determined
    • We don’t actually know who the users of MERRA2 are – not aware of any in W21C or ACS
    • It’s about 60TB so don’t want to move to ia39 or similar without knowing it’s going to be used.
  • xv83
    • NCI can’t make an intake catalog as many of the directories are not readable except by the researchers
    • Need to ask people to catalogue their own data but that’s kind of the problem
    • Can use nci-files-report to identify large users, however the biggest volume users are not necessarily the problem users in this space.
  • jt48 replication
    • Service user is now in place
    • @Sam to redownload data into the new project (Update download scripts)
    • https://aus-ref-clim-data-nci.github.io/aus-ref-clim-data-nci/datasets/datasets-intro.html
    • @Damien to send comms to ia39 users telling them the new dataset is in place and to use that
    • After a month or two lock down permissions on the data in ia39 and create symlink to jt48
    • @Damien to send final comms about migration
    • Can delete after another month or two
      • Note because we’re redownloading the data this could make it a new dataset so need to be careful
    • @Claire to update records in OneClimate 
    • Can we add GeoNetwork entries even though we’re not publishing the data through NCI?
    • Lead CI/CIs for jt48? Chloe is currently Lead CI with a few CIs, but Sam should have a lot of power in this project – maybe make Sam and Damien Delegated LCIs?
  • US data emergency
    • Has not been raised at an NCI level
    • CSIRO have a storage space internally for critical data replication
    • Over time this will tell us what’s needed and what maybe should be held at NCI
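
The jt48 replication steps above (lock down the old ia39 copy, then point users at jt48 via a symlink) could be scripted roughly as follows. This is a sketch of one possible reading of those steps, not an agreed procedure: the function name, the ".retired" suffix and the owner-only permission modes are all assumptions, and it presumes the old directory is moved aside so the symlink can take over its path.

```python
import os
from pathlib import Path


def retire_with_symlink(old_dir, new_dir, suffix=".retired"):
    """Move old_dir aside, lock it down, and symlink its old path to new_dir.

    Hypothetical helper: the naming and permission choices are assumptions.
    Returns the path of the retired (locked-down) copy.
    """
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    if not new_dir.is_dir():
        raise FileNotFoundError(f"replacement dataset not found: {new_dir}")

    # Keep the old copy under a new name until it is safe to delete.
    retired = old_dir.with_name(old_dir.name + suffix)
    old_dir.rename(retired)

    # Lock down: files become owner read-only, directories owner read/traverse.
    for root, _dirs, files in os.walk(retired):
        for name in files:
            os.chmod(os.path.join(root, name), 0o400)
        os.chmod(root, 0o500)

    # The old path now resolves to the new managed copy.
    os.symlink(new_dir.resolve(), old_dir, target_is_directory=True)
    return retired
```

Existing scripts that read from the old path keep working through the symlink, while the retired copy stays available until the final deletion step.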

6/2/2025 – weekly meeting #5 2025

Damien, Jo, Claire, Sam. Apologies: Chloe, Thomas

  • kj66 publishing
    • Damien has access to the metadata page
    • QC checks are happening now
    • THREDDS will be required
    • Intake catalogue is also of interest to users but is less important than THREDDS and not a blocker
    • Could publish to the GeoNetwork catalogue prior to the data going live on THREDDS
    • Jo will confirm timelines with Yiling
    • Publishing contact (i.e. Damien) will keep edit access to the metadata page, so can modify it as needed if there are errata or to clarify things for users in response to questions.
  • jt48 service user
    • We have movement on this ticket now
    • Jo – apologies we couldn’t see the ticket Jo opened on our behalf despite thinking permissions were set
    • Sam has supplied SSH key so should be set up in the coming days
  • CMIP7 planning
    • Claire, Michael Grose and Alberto Meucci attended a model selection workshop with a focus on CMIP7 and CORDEX this morning
    • What are the strategic plans with NCI for CMIP7?
      • ACCESS-NRI and NCI working on plans to support CMIP7
    • NDRI report out but not implemented (yet?) 
      • but nothing announced yet as it’s dependent on storage funding, which is dependent on budget and the federal election upcoming
      • Yiling and Andrew talking about contingency plans
      • Presumably following the election if the funding is approved there’ll then be an application process, realistically there is unlikely to be anything concrete before Q3 this year
    • CMIP7 timeline – first data releases expected as early as next year!
      • https://wcrp-cmip.org/cmip7/ 
      • What will Australia’s contribution be and how funded? ACCESS-ESM nearly ready but needs FTE to run/process. ACCESS-CM3 under development in collab’n with ACCESS-NRI but same problem with funding to run for CMIP. 
    • Concerns about whether there will even be US CMIP7 models?!
  • US data disappearing
    • Mauna Loa CO2 feed has stopped from NOAA, however Scripps are still publishing daily data at this point
    • Implications for other NOAA and NASA data?
    • NCAR are probably funded from gov’t grants?

30/1/2025 – weekly meeting #4 2025

Damien, Hannes, Claire, Jo, Sam

  • We should feed copilot all our scientific python code so it starts suggesting code that looks more like ours!
  • Moving reference data to jt48
    • Jo has escalated the ticket to create a service user
    • Sam will be the one handling the data downloads
    • We need to work out a plan for how to actually manage the migration in terms of comms to users and what gets moved vs what we redownload then delete.
  • kj66 requests pending (bias corrected CORDEX)
    • There’s people waiting on access to this project but it’s still in prep – Jo to inform them their requests will be processed once it’s ready
    • Metadata entry for NCI done, should go live in the next week
  • Researcher perception needs correcting: data being on NCI does not automatically make it “public” or “published”!
    • W21C lead CI thought that the CLEX publication storage was provided by NCI, did not realise CLEX had paid for it.
    • NCRA folder in ia39 is underpinning data for National Climate Risk Assessment but that will need to move to a publication project
    • People like Claire and Sam act as intermediaries to make researchers aware of the general process for getting DOIs for their data
  • xv83 management
    • It would be good to move CAFE into its own project as well to separate decadal forecast users from ACS internal work
    • Despite a huge volume and compute quota in this project it’s constantly running out due to 
  • Information about project usage
    • What qstat information is visible as a CI? 
      • By default you only see your own jobs. As a CI (or any user) you can do nci_account -P <project> -v to see summaries of people’s usage across the project but not number/size of jobs to see if quota is being used efficiently.
    • nci-files-report can be used to see who is responsible for storage usage, but it only gives totals, which don’t show who is using space *efficiently*
    • NCI could create a one-time Intake catalogue for the project so we could get a better picture of what the space is being used for – similar to what’s being done for publication projects but it can be useful for cleaning up space
      • BoM are using this process to tidy up projects that need to be decommissioned.
      • Hannes will follow up creating an Intake-spark catalogue for xv83 for us – thank you!

23/1/2025 – weekly meeting #3 2025

Claire, Damien, Thomas, Hannes

  • Hannes is back from parental leave!
  • ACS
    • Now we have Hannes back we can follow up jt48 hopefully
    • Damien being pulled into discussions about data availability for ACS…
    • Voila with JupyterLab and Intake to create interactive maps of climate data
      • Proof of concept but in flux at the moment awaiting clear direction
      • Drawback is you need an NCI account to access it which won’t meet the stakeholder requirements
  • Object storage
    • Damien listened to Ryan Abernathey’s talk on Icechunk at AMS
    • Zarr is not yet considered a mature enough format for reliable long-term storage
    • When working with object storage you can’t make changes to it, you have to recall it and then commit changed objects/chunks as you would with git
    • Pawsey object storage can’t be computed directly against, you recall the data to scratch first, which is not how we expected to work with it? It has high speed connectivity to the HPC but the training indicated you don’t point to it directly?
  • NCI future direction unclear – NDRI report suggests uniting management with Pawsey? Maybe we’re waiting till after the election for a decision to be made.

9/1/2025 – weekly meeting #1 2025

Claire, Damien, Thomas

  • ACS
    • still no service user for jt48
    • ACS update – not being defunded in favour of a portal so that’s good news. Further clarification on funding for next FY yet to be announced.
  • NCI advertising for interim director at the moment (https://jobs.anu.edu.au/jobs/interim-director-canberra-act-act-australia)
    • CSIRO and BoM plans ongoing but no decisions yet as far as we know.
  • Object storage
    • AMS meeting – big focus on putting all the earth systems data in the cloud. Long term is this a good strategy? (Damien)
    • Object storage is game changing
    • Giving all data to for-profit companies seems risky in the longer term
    • On-premises cloud to us looks like it has a much bigger potential particularly somewhere like Australia with centralised HPC facilities.
    • Ryan Abernathey (EarthMover) to talk at AMS
    • Acacia at Pawsey doesn’t seem to work the way we expected
      • Mike Sumner may have found a solution?
    • EarthMover etc don’t have “a solution” yet, it hasn’t coalesced into concrete advice of how to best do it.
    • Is zarr likely to become the official back end for netCDF? HDF5 cloud format? What is happening in netCDF land?
    • We need to go to Boulder and have beers with the few people who know these things 😉