Meeting notes 2025
2025 meeting notes and ongoing actions…
20/2/2025 – weekly meeting #7 2025
Damien, Claire, Sam, Hannes, Thomas. Apologies: Chloe
- MERRA2
- Claire had that request from within CSIRO looking for this data, currently unmaintained but Sam added the user to ua8 in this instance.
- Sam found a lot of interest in W21C.
- Aerosol
- geoschem expts
- comparison to ERA5
- Renewable energy – new topic in this CoE
- So much interest the CoE should really be hosting, but they have insufficient storage.
- rr7 was RDSI, ua8 was CLEX. Both are no longer actively funded
- Christian asking NCI to keep ua8 until the end of the year but it’s RO – and it’s ~300TB and 92% full!!
- 56TB in rr7, 4TB in ua8
- rr7 looks like raw data, ua8 seems to be processed products
- Which data are the users actually using? Any chance we only need ua8??
- Need to ensure users aren’t downloading any data themselves
- W21C has 120TB – of which 20TB is allocated for reference data.
- And started with only 10TB /scratch, up to 100TB now but seems to still be wildly insufficient.
- Starting over instead of inheriting the CLEX data mess causes a whole lot of different problems.
- CLEX was the best of the mess!!! Everyone else is worse!
- Pawsey storage?
- w97 was transitioned to Pawsey entirely.
- Sharing data out of Pawsey seems tricky though?
- Michael Sumner doing some work on this – MVP?
- John Reilley’s regional ocean model data is there but how to publish/share?
- Thomas following EarthMover talks, still haven’t lined up Ryan A to talk to us yet.
- NCI storage options outside of W21C
- Need to understand exact volume needed
- jt48 currently does not have enough storage, but may be able to shift more from ia39 or kj66 (the latter had way more storage allocated from ia39 than needed)
- Hannes looking into options for getting the reference datasets into GeoNetwork – need to know the licence though, but “open” is okay
- Each dataset has a page of metadata https://aus-ref-clim-data-nci.github.io/aus-ref-clim-data-nci/datasets/datasets-intro.html
- Data licensing
- Is there anything that requires current users of code or data to update a usage file – e.g. open a PR to indicate you’re using it?
- Like a notification approach to attribution – you wouldn’t police it though.
- Access tracking is desired by managers but really hard to implement (auth etc) without making the data less FAIR.
- vn19
- Thomas – Sam please have a look at vn19, is this appropriate for any of the W21C high resolution modelling?
- Space for making datasets analysis-ready
- Is nci-files-report timely?
- What to do if file ownership doesn’t seem to be accurate?
13/2/2025 – weekly meeting #6 2025
Claire, Damien, Sam, Hannes, Chloe. Apologies: Thomas
- MERRA2
- Interest in this dataset from within CSIRO (but not ACS)
- Was replicated in rr7 and ua8 (rr7 ran out of space). Paola was hoping to move it somewhere else and combine it, but the destination was not determined
- We don’t actually know who the users of MERRA2 are – not aware of any in 21CW or ACS
- It’s about 60TB so we don’t want to move it to ia39 or similar without knowing it’s going to be used.
- xv83
- NCI can’t make an intake catalog as many of the directories are not readable except by the researchers
- Need to ask people to catalogue their own data but that’s kind of the problem
- Can use nci-files-report to identify large users, however the biggest volume users are not necessarily the problem users in this space.
- jt48 replication
- Service user is now in place
- @Sam to redownload data into the new project (Update download scripts)
- https://aus-ref-clim-data-nci.github.io/aus-ref-clim-data-nci/datasets/datasets-intro.html
- @Damien to send comms to ia39 users telling them the new dataset is in place and to use that
- After a month or two, lock down permissions on the data in ia39 and create a symlink to jt48
- @Damien to send final comms about migration
- Can delete after another month or two
- Note because we’re redownloading the data this could make it a new dataset so need to be careful
- @Claire to update records in OneClimate
- Can we add GeoNetwork entries even though we’re not publishing the data through NCI?
- Lead CI/CIs for jt48? Chloe is currently Lead CI with a few CIs, but Sam should have a lot of power in this project – maybe make Sam and Damien Delegated LCIs?
- US data emergency
- Has not been raised at an NCI level
- CSIRO have a storage space internally for critical data replication
- Over time this will tell us what’s needed and what maybe should be held at NCI
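As a rough illustration of the jt48 replication step above (lock down permissions on the ia39 copy, then leave a symlink pointing at the jt48 copy), a minimal Python sketch might look like the following. The function name, the `.retired` suffix, the owner-only permission mode and the rename-aside approach are all assumptions made for illustration, not something agreed in the meeting. A real migration would also need to consider group ownership and the download scripts.

```python
import stat
from pathlib import Path


def lock_down_and_link(old_dir, new_dir, suffix=".retired"):
    """Retire old_dir in favour of new_dir (hypothetical helper).

    The old directory is renamed aside (so it can be deleted after the
    grace period), restricted to owner read/traverse only, and a symlink
    is left at the original path pointing to the new location.
    Returns the path of the retired copy.
    """
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    if not new_dir.is_dir():
        raise FileNotFoundError(f"replacement dataset not found: {new_dir}")
    if old_dir.is_symlink():
        raise ValueError(f"{old_dir} is already a symlink")

    # Keep the old copy under a new name for the grace period.
    retired = old_dir.with_name(old_dir.name + suffix)
    old_dir.rename(retired)

    # Owner read + traverse only: group/other users can no longer enter
    # the retired tree, and nothing new can be written at its top level.
    retired.chmod(stat.S_IRUSR | stat.S_IXUSR)

    # Existing user paths keep working, but now resolve to the new copy.
    old_dir.symlink_to(new_dir, target_is_directory=True)
    return retired
```

Calling this once per dataset directory would leave the old path usable while users switch over. The retired copy it returns is the thing to delete after the final comms.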
6/2/2025 – weekly meeting #5 2025
Damien, Jo, Claire, Sam. Apologies: Chloe, Thomas
- kj66 publishing
- Damien has access to the metadata page
- QC checks are happening now
- THREDDS will be required
- Intake catalogue is also of interest to users but is less important than THREDDS and not a blocker
- Could publish to the GeoNetwork catalogue prior to the data going live on THREDDS
- Jo will confirm timelines with Yiling
- Publishing contact (i.e. Damien) will keep edit access to the metadata page, so can modify it as needed if there are errata or to clarify things for users in response to questions.
- jt48 service user
- We have movement on this ticket now
- Jo – apologies we couldn’t see the ticket Jo opened on our behalf despite thinking permissions were set
- Sam has supplied SSH key so should be set up in the coming days
- CMIP7 planning
- Claire, Michael Grose and Alberto Meucci attended a model selection workshop with a focus on CMIP7 and CORDEX this morning
- What are the strategic plans with NCI for CMIP7?
- ACCESS-NRI and NCI working on plans to support CMIP7
- NDRI report out but not implemented (yet?)
- but nothing announced yet as it’s dependent on storage funding, which is dependent on the budget and the upcoming federal election
- Yiling and Andrew talking about contingency plans
- Presumably, if the funding is approved following the election, there’ll then be an application process. Realistically there is unlikely to be anything concrete before Q3 this year
- CMIP7 timeline – first data releases expected as early as next year!
- https://wcrp-cmip.org/cmip7/
- What will Australia’s contribution be and how funded? ACCESS-ESM nearly ready but needs FTE to run/process. ACCESS-CM3 under development in collab’n with ACCESS-NRI but same problem with funding to run for CMIP.
- Concerns about whether there will even be US CMIP7 models?!
- US data disappearing
- Mauna Loa CO2 feed has stopped from NOAA, however Scripps are still publishing daily data at this point
- Implications for other NOAA and NASA data?
- NCAR are probably funded from gov’t grants?
30/1/2025 – weekly meeting #4 2025
Damien, Hannes, Claire, Jo, Sam
- We should feed copilot all our scientific python code so it starts suggesting code that looks more like ours!
- Moving reference data to jt48
- Jo has escalated the ticket to create a service user
- Sam will be the one handling the data downloads
- We need to work out a plan for how to actually manage the migration in terms of comms to users and what gets moved vs what we redownload then delete.
- kj66 requests pending (bias corrected CORDEX)
- There are people waiting on access to this project but it’s still in prep – Jo to inform them their requests will be processed once it’s ready
- Metadata entry for NCI done, should go live in the next week
- Researcher perception issue: data being on NCI does not automatically make it “public” or “published”!
- W21C lead CI thought that the CLEX publication storage was provided by NCI, and did not realise CLEX had paid for it.
- NCRA folder in ia39 is underpinning data for National Climate Risk Assessment but that will need to move to a publication project
- People like Claire and Sam act as intermediaries to make researchers aware of the general process for getting DOIs for their data
- xv83 management
- It would be good to move CAFE into its own project as well, to separate decadal forecast users from ACS internal work
- Despite a huge volume and compute quota in this project it’s constantly running out due to
- Information about project usage
- What qstat information is visible as a CI?
- By default you only see your own jobs. As a CI (or any user) you can run nci_account -P <project> -v to see summaries of people’s usage across the project, but not the number/size of jobs, so you can’t tell whether quota is being used efficiently.
- nci_files_account can be used to see who is responsible for storage usage, but it only gives totals, which doesn’t accurately reflect who is using space *efficiently*
- NCI could create a one-time Intake catalogue for the project so we could get a better picture of what the space is being used for – similar to what’s being done for publication projects, but it can also be useful for cleaning up space
- BoM are using this process to tidy up projects that need to be decommissioned.
- Hannes will follow up creating an Intake-spark catalogue for xv83 for us – thank you!
23/1/2025 – weekly meeting #3 2025
Claire, Damien, Thomas, Hannes
- Hannes is back from parental leave!
- ACS
- Now we have Hannes back we can follow up jt48 hopefully
- Damien being pulled into discussions about data availability for ACS…
- Voila with JupyterLab and Intake to create interactive maps of climate data
- Proof of concept but in flux at the moment awaiting clear direction
- Drawback is you need an NCI account to access it which won’t meet the stakeholder requirements
- Object storage
- Damien listened to Ryan Abernathey’s talk on Icechunk at AMS
- Zarr is not yet considered a mature enough format for reliable long-term storage
- When working with object storage you can’t make changes to it, you have to recall it and then commit changed objects/chunks as you would with git
- Pawsey object storage can’t be computed against directly; you recall the data to scratch first, which is not how we expected to work with it. It has high speed connectivity to the HPC, but the training indicated you don’t point to it directly?
- NCI future direction unclear – NDRI report suggests uniting management with Pawsey? Maybe we’re waiting till after the election for a decision to be made.
9/1/2025 – weekly meeting #1 2025
Claire, Damien, Thomas
- ACS
- still no service user for jt48
- ACS update – not being defunded in favour of a portal so that’s good news. Further clarification on funding for next FY yet to be announced.
- NCI advertising for interim director at the moment (https://jobs.anu.edu.au/jobs/interim-director-canberra-act-act-australia)
- CSIRO and BoM plans ongoing but no decisions yet as far as we know.
- Object storage
- AMS meeting – big focus on putting all the earth systems data in the cloud. Long term is this a good strategy? (Damien)
- https://ams.confex.com/ams/105ANNUAL/meetingapp.cgi/Paper/458101
- Pretty likely in Big Tech’s interests there!
- Object storage is game changing
- Giving all data to for-profit companies seems risky in the longer term
- On-premises cloud to us looks like it has a much bigger potential particularly somewhere like Australia with centralised HPC facilities.
- Ryan Abernathey (EarthMover) to talk at AMS
- Acacia at Pawsey doesn’t seem to work the way we expected
- Mike Sumner may have found a solution?
- EarthMover etc don’t have “a solution” yet, it hasn’t coalesced into concrete advice of how to best do it.
- Is zarr likely to become the official back end for netCDF? HDF5 cloud format? What is happening in netCDF land?
- We need to go to Boulder and have beers with the few people who know these things 😉