Multi-Level Co-Versioning

Summary: Multi-level co-versioning can capture the relationships and dependencies of AI system artifacts at different levels.

Type of pattern: Process pattern

Type of objective: Trustworthiness

Target users: Operators

Impacted stakeholders: Developers, AI users, AI consumers

Lifecycle stages: Operation

Relevant AI ethics principles: Human, societal and environmental wellbeing, human-centered values, fairness, privacy protection and security, reliability and safety, transparency and explainability, contestability, accountability

Mapping to AI regulations/standards: ISO/IEC 42001:2023 Standard.

Context: Compared with traditional software, AI systems evolve more frequently due to their dependence on data. This evolution results in various versions of AI system artifacts at different levels. At the system level, there can be multiple versions of AI components and non-AI components. At the supply chain level, there can be different versions of data, models, code, and configurations, which can be used to produce different versions of AI components.

Problem: How can we track the co-evolution of AI artifacts at different levels?

Solution: AI systems involve two levels of relationships and dependencies across various AI artifacts, including the supply chain level and system level. At the system level, AI components that embed AI models need to be integrated into AI systems and interact with non-AI components. At the supply chain level, the retraining of AI models introduces new versions of data, code, and configuration parameters. If federated learning is adopted, for each round of training, a global model is created based on local models sent from participating clients. It is important to capture all these dependencies throughout the development process by managing and tracking versions of various artifacts.

Benefits:

Traceability: Multi-level co-versioning provides end-to-end traceability throughout the whole lifecycle of AI systems.
Accountability: Multi-level co-versioning enables tracking of accountable roles and liability.

Drawbacks:

Increased cost: The collection and documentation of co-versioning information incur additional development cost.
Difficulty in capturing all dependencies: It is challenging to capture all the dependencies between different versions of AI artifacts, particularly if the system is complex and evolves frequently.

Related patterns:

Tight coupling of AI and non-AI development: The AI team and non-AI team can use a common co-versioning registry to manage the co-versioning of AI system artifacts.
Co-versioning registry: Co-versioning registry can be designed to capture the co-evolution of AI artifacts at different level.
Federated learner: Federated learning requires co-versioning of local models and global models.

Known uses:

MLflow Model Registry on Databricks is a model repository and set of APIs that enable management of the full lifecycle of MLflow Models, including model lineage and versioning.
Amazon uses a tool for automatically tracking metadata and provenance of AI model training and experiments.
Data Version Control (DVC) manages co-versioning of data and machine learning models.