Ecosystem identifier

D06

Ecosystem pillar

Data

Guideline group roles

Data ScientistProject Manager

Overview

Context should be taken into consideration during model selection to avoid or limit biased results for sub-populations. Caution should be taken in systems designed to use aggregated data about groups to predict individual behaviour as biased outcomes can occur. “Unintentional weightings of certain factors can cause algorithmic results that exacerbate and reinforce societal inequities,” for example, predicting educational performance based on an individual’s racial or ethnic identity.

Observed context drift in data should be documented via data transparency mechanisms capturing where and how the data is used and its appropriateness for that context. Harvard researchers have expanded the definition of data transparency, noting that some raw data sets are too sensitive to be released publicly, and incorporating guidance on development processes to reduce the risk of harmful and discriminatory impacts:

•  “In addition to releasing training and validation data sets whenever possible, agencies shall make publicly available summaries of relevant statistical properties of the data sets that can aid in interpreting the decisions made using the data, while applying state-of-the-art methods to preserve the privacy of individuals.

•  When appropriate, privacy-preserving synthetic data sets can be released in lieu of real data sets to expose certain features of the data if real data sets are sensitive and cannot be released to the public.” 

Teams should use transparency frameworks and independent standards; conduct and publish the results of independent audits; open non-sensitive data and source code to outside inspection.

Artificial Intelligence Ecosystem process diagram

A process diagram showing the application of Human, Data, Process, System and Governance elements to Diversity and Inclusion in Artificial Intelligence.

Artificial Intelligence Ecosystem process diagram