Co-Versioning Registry
Summary: The co-versioning registry can track the co-evolution of components or AI artifacts at different levels.
Type of pattern: Product pattern
Type of objective: Trustworthiness, trust
Target users: Architects, developers
Impacted stakeholders: Operators, developers, data scientists
Relevant AI ethics principles: Transparency and explainability, accountability
Mapping to AI regulations/standards: EU AI Act, ISO/IEC 42001:2023 Standard.
Context: Compared with traditional software systems, AI systems involve different levels of dependencies across artifacts, including datasets, AI models, AI components, and non-AI components that interact with the AI components. AI systems also, in general, evolve more frequently due to their data-dependent behaviors.
Problem: How can we capture the relationships and dependencies among system components and AI artifacts of AI systems?
Solution: Because AI systems involve several levels of dependencies and evolve frequently due to their data-dependent behaviors, provenance has to be visible from two viewpoints. From the viewpoint of the AI system, it is important to know which version of the AI component is integrated into the system. From the viewpoint of the AI component, it is important to know which datasets and parameters were used to train the AI model and which data was used to evaluate it.
Co-versioning of the components or AI artifacts of AI systems provides end-to-end provenance guarantees across the entire lifecycle of AI systems. As shown in Figure 6.4, a co-versioning registry can track the co-evolution of software components and/or AI artifacts. There are different levels of co-versioning: co-versioning of AI components and non-AI components, co-versioning of the artifacts within the AI components (i.e., co-versioning of data, model, code, configurations), and co-versioning of local models and global models in federated learning. Co-versioning enables effective maintenance and evolution of AI components because the deployed model or code can be traced to the exact set of artifacts, parameters, and metadata that were used to develop the model and code.
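The registry described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the design of any particular tool: the names `ArtifactVersion`, `CoVersioningRegistry`, `register`, `trace`, and `fingerprint` are hypothetical, and content hashes stand in for whatever version identifiers a real system would use. It shows the two levels discussed in the text: a model version linked to the data, code, and configuration it was built from, and a system release linked to the AI and non-AI component versions it integrates.

```python
from dataclasses import dataclass, field
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a short content hash used here as a version identifier (an assumption)."""
    return hashlib.sha256(content).hexdigest()[:12]


@dataclass(frozen=True)
class ArtifactVersion:
    kind: str     # e.g. "data", "code", "config", "model", "component", "system"
    name: str
    version: str


@dataclass
class CoVersioningRegistry:
    # Maps each registered artifact version to the versions it was built from.
    _parents: dict = field(default_factory=dict)

    def register(self, artifact, built_from=()):
        # Refuse links to versions the registry has never seen.
        for parent in built_from:
            if parent not in self._parents:
                raise KeyError(f"unregistered dependency: {parent}")
        self._parents[artifact] = tuple(built_from)
        return artifact

    def trace(self, artifact):
        """Return every artifact version the given one depends on, transitively."""
        seen = []
        stack = list(self._parents[artifact])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.append(current)
                stack.extend(self._parents[current])
        return seen


registry = CoVersioningRegistry()

# Level 1: artifacts within the AI component (data, code, configuration, model).
data = registry.register(ArtifactVersion("data", "train-set", fingerprint(b"id,label\n1,0\n")))
code = registry.register(ArtifactVersion("code", "train.py", fingerprint(b"def train(): pass")))
config = registry.register(ArtifactVersion("config", "hyperparams", fingerprint(b"lr=0.01")))
model = registry.register(
    ArtifactVersion("model", "classifier", "3"), built_from=[data, code, config]
)

# Level 2: AI component and non-AI component combined into one system release.
ui = registry.register(ArtifactVersion("component", "web-ui", "1.4.0"))
release = registry.register(
    ArtifactVersion("system", "release", "2024.1"), built_from=[model, ui]
)

# Trace the deployed release back to the exact artifacts behind it.
lineage = registry.trace(release)
```

In this sketch, `lineage` contains the model, the non-AI component, and the dataset, code, and configuration versions behind the model, which is the kind of trace-back the pattern is meant to support.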
Benefits:
- Traceability and accountability: Co-versioning at different levels of AI systems, including AI artifacts and/or software components, provides end-to-end provenance across the entire lifecycle of AI systems.
Drawbacks:
- Complexity: Depending on the granularity of the information documented, co-versioning can substantially increase the complexity of the provenance data and of the functions needed to query it.
Related patterns:
- Federated learner: A co-versioning registry can be applied to a federated learner to co-version local models and global models.
- Tight coupling of AI and Non-AI Development: The AI team and non-AI team can use a common co-versioning registry to manage the co-versioning of AI components and non-AI components. Also, the AI team can build up a co-versioning registry to track the co-versioning of AI artifacts, including data, model, code, and configurations.
- Multi-level co-versioning: A co-versioning registry can be designed to capture the co-evolution of AI artifacts at different levels.
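For the federated learning case mentioned in the related patterns, co-versioning amounts to recording which global model each local model was trained from and which local models were aggregated into each new global model. The following is a minimal sketch under that assumption; `FederatedVersionLog` and its methods are hypothetical names chosen for illustration, and the version labels are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class FederatedVersionLog:
    # (client_id, local_version) -> global version the local training started from
    trained_from: dict = field(default_factory=dict)
    # global version -> list of (client_id, local_version) aggregated into it
    aggregated_from: dict = field(default_factory=dict)

    def record_local(self, client_id, local_version, base_global):
        self.trained_from[(client_id, local_version)] = base_global

    def record_global(self, global_version, local_models):
        # Only accept local models whose base global version is already recorded.
        missing = [m for m in local_models if m not in self.trained_from]
        if missing:
            raise KeyError(f"unknown local models: {missing}")
        self.aggregated_from[global_version] = list(local_models)

    def contributors(self, global_version):
        """Return the clients whose local models went into a global version."""
        return sorted({client for client, _ in self.aggregated_from[global_version]})

    def base_versions(self, global_version):
        """Return the global versions the aggregated local models were trained from."""
        return {self.trained_from[m] for m in self.aggregated_from[global_version]}


log = FederatedVersionLog()
log.record_local("client-a", "a-r1", base_global="g0")
log.record_local("client-b", "b-r1", base_global="g0")
log.record_global("g1", [("client-a", "a-r1"), ("client-b", "b-r1")])

contributors = log.contributors("g1")
bases = log.base_versions("g1")
```

With such a log, a global model version can be traced to its contributing clients, and a local model aggregated from an outdated global version shows up as an extra entry in `base_versions`.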
Known uses:
- MLflow model registry on Databricks is a model repository and set of APIs that enable management of the full lifecycle of MLflow models, including model lineage and versioning.
- Amazon uses a tool for automatically tracking metadata and provenance of AI model training and experiments.
- Data Version Control (DVC) is a data and ML experiment management tool that takes advantage of the existing engineering toolset (e.g., Git and CI/CD).
- Pachyderm’s data lineage is an immutable record for all activities and assets in the AI lifecycle, including tracking every version of the code, models, and data.
- Verta enterprise MLOps platform supports experiment tracking, dataset versioning, and model metadata visualization to ensure model reproducibility and quality from experiment to production.