Identify possible systemic problems of bias and appoint a steward
December 22, 2023
At the start of the Pre-Design stage, stakeholders should identify possible systemic problems of bias such as racism, sexism, or ageism that have implications for diversity and inclusion. Main decision-makers and power holders should be identified, as this can reflect systemic biases and limited viewpoints within the organisation. A sole person responsible for algorithmic bias […]
Improve feature-based labelling and formulate more precise notions about user identity using qualitative data from social media sources
December 22, 2023
Apply more inclusive and socially just data labelling methodologies, such as Intersectional Labeling Methodology, to address gender bias. Rather than relying on static, binary gender in a face classification infrastructure, application designers should embrace and demand improvements to feature-based labelling. For instance, labels based on neutral performative markers (e.g., beard, makeup, dress) could replace gender classification in the facial analysis model, allowing third parties and individuals who come into contact with facial analysis applications to embrace their own interpretations of those features. Instead of focusing on improving methods of gender classification, application designers could use such labelling alongside other qualitative data, such as Instagram captions, to formulate more precise notions about user identity.
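As a rough illustration only, the Python sketch below records neutral performative markers per image instead of an inferred binary gender label; the marker names, confidence scores, and record structure are assumptions made for this example, not part of the Intersectional Labeling Methodology itself.

from dataclasses import dataclass, field
from typing import Dict

@dataclass
class FeatureAnnotation:
    """Feature-based labels for one image: neutral performative markers
    with confidence scores, rather than an inferred binary gender label."""
    image_id: str
    markers: Dict[str, float] = field(default_factory=dict)  # e.g. {"beard": 0.92}
    notes: str = ""  # free-text context from annotators or the person depicted

# Interpretation of the markers is left to the individuals and third parties
# who encounter the facial analysis application downstream.
annotation = FeatureAnnotation(
    image_id="img_0001",
    markers={"beard": 0.92, "makeup": 0.05, "dress": 0.01},
    notes="Labels describe visible features, not identity.",
)
print(annotation)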
Document social descriptors when scraping data from different sources and perform compatibility analysis
December 22, 2023
Developers should attend to and document the social descriptors (for example, age, gender, and geolocation) when scraping data from different sources including websites, databases, social media platforms, enterprise applications, or legacy systems. Context is important when the same data is later used for different purposes such as asking a new question about an existing data set. A compatibility analysis should be performed to ensure that potential sources of bias are identified, and mitigation plans made. This analysis would capture context shifts in new uses of data sets, identifying whether or how these could produce specific bias issues.
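One minimal way to script this kind of documentation and compatibility check is sketched below in Python; the descriptor fields, purpose strings, and the simple purpose-mismatch rule are illustrative assumptions rather than a standard.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class DatasetRecord:
    """One scraped item plus the social descriptors and collection context documented with it."""
    source: str                  # e.g. website, database, social media platform, legacy system
    collected_for: str           # the purpose the data was originally gathered for
    descriptors: Dict[str, str]  # e.g. {"age_band": "18-24", "geolocation": "NZ"}

def compatibility_analysis(records: List[DatasetRecord], new_purpose: str) -> List[str]:
    """Flag context shifts when an existing data set is reused to answer a new question."""
    issues = []
    for r in records:
        if r.collected_for != new_purpose:
            issues.append(
                f"{r.source}: collected for '{r.collected_for}', now proposed for "
                f"'{new_purpose}' - review descriptors {sorted(r.descriptors)} for bias and plan mitigations."
            )
    return issues

records = [
    DatasetRecord("forum_scrape", "sentiment research", {"age_band": "18-24", "geolocation": "US"}),
    DatasetRecord("public_registry", "service planning", {"age_band": "65+", "geolocation": "NZ"}),
]
for issue in compatibility_analysis(records, "credit risk scoring"):
    print(issue)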
Assess dataset suitability factors
December 22, 2023
Dataset suitability factors should be assessed. These include statistical methods for mitigating representation issues, the socio-technical context of deployment, and the interaction of human factors with the AI system. Teams should ask whether suitable datasets exist that fit the purpose of the various applications, domains, and tasks planned for the AI system.
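As one hedged example of a statistical representation check, the Python sketch below compares observed group shares in a candidate dataset against target population shares and flags shortfalls; the age bands, target shares, and tolerance are hypothetical values chosen for illustration.

from collections import Counter

def representation_gaps(group_labels, target_shares, tolerance=0.05):
    """Flag groups whose observed share falls below the target share by more than `tolerance`."""
    counts = Counter(group_labels)
    total = len(group_labels)
    gaps = {}
    for group, target in target_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if observed + tolerance < target:
            gaps[group] = {"observed": round(observed, 3), "target": target}
    return gaps

# Hypothetical age bands and target shares for the planned deployment context.
labels = ["18-34"] * 700 + ["35-64"] * 250 + ["65+"] * 50
targets = {"18-34": 0.30, "35-64": 0.45, "65+": 0.25}
print(representation_gaps(labels, targets))
# Flags "35-64" and "65+" as under-represented relative to the target shares.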
Consider context issues and context drift during model selection and development
December 22, 2023
Context should be taken into consideration during model selection to avoid or limit biased results for sub-populations. Caution should be taken with systems designed to use aggregated data about groups to predict individual behaviour, as biased outcomes can occur. “Unintentional weightings of certain factors can cause algorithmic results that exacerbate and reinforce societal inequities,” for example, predicting educational performance based on an individual’s racial or ethnic identity. Observed context drift in data should be documented via data transparency mechanisms capturing where and how the data is used and its appropriateness for that context. Harvard researchers have expanded the definition of data transparency, noting that some raw data sets are too sensitive to be released publicly, and incorporating guidance on development processes to reduce the risk of harmful and discriminatory impacts:

• “In addition to releasing training and validation data sets whenever possible, agencies shall make publicly available summaries of relevant statistical properties of the data sets that can aid in interpreting the decisions made using the data, while applying state-of-the-art methods to preserve the privacy of individuals.
• When appropriate, privacy-preserving synthetic data sets can be released in lieu of real data sets to expose certain features of the data if real data sets are sensitive and cannot be released to the public.”

Teams should use transparency frameworks and independent standards, conduct and publish the results of independent audits, and open non-sensitive data and source code to outside inspection.
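A minimal sketch of publishing summary statistical properties instead of raw data is given below in Python; the field names and suppression threshold are assumptions, and small-cell suppression alone is not a substitute for state-of-the-art privacy methods such as differential privacy or synthetic data generation.

from collections import Counter

def publishable_summary(group_labels, min_cell_size=20):
    """Summarise group counts for public release, suppressing any group whose count
    falls below `min_cell_size` so that small cohorts are not exposed."""
    counts = Counter(group_labels)
    return {
        group: (count if count >= min_cell_size else f"<{min_cell_size} (suppressed)")
        for group, count in counts.items()
    }

labels = ["urban"] * 480 + ["regional"] * 90 + ["remote"] * 12
print(publishable_summary(labels))
# {'urban': 480, 'regional': 90, 'remote': '<20 (suppressed)'}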
Recognize relationships between access issues, infrastructure, capacity building, and data sovereignty
December 22, 2023
Access, including cloud and offline data hosting, should be attended to because government and industry generally build and manage this infrastructure on their own terms. Access is directly connected to capacity building (for teams and stakeholders) and to data sovereignty issues.
Understand and adhere to data sovereignty praxis
December 22, 2023
The concept of, and practices supporting, data sovereignty are a critical element in the AI ecosystem. Data sovereignty covers considerations of the “use, management and ownership of AI to house, analyze and disseminate valuable or sensitive data”. Although definitions are context-dependent, operationally data sovereignty requires that stakeholders within an AI ecosystem, and other relevant representatives from outside stakeholder cohorts, be included as partners throughout the AI-LC. Data sovereignty should be explored from and with the perspectives of those whose data is being used. These alternative and diverse perspectives can be captured and fed back into AI Literacy programs, exemplifying how people can affect and enrich AI both conceptually and materially. Various Indigenous technologists, researchers, artists, and activists have progressed the concept of, and protocols for, Indigenous data sovereignty in AI. This involves “Indigenous control over the protection and use of data that is collected from our communities, including statistics, cultural knowledge and even user data,” and moving beyond the representation of impacted users to “maximising the generative capacity of truly diverse groups.”
Establish clear procedures for ensuring data privacy and offering opt-out options
December 22, 2023
Data privacy should be at the forefront, particularly when data from marginalized populations are involved. End users should be offered choices about privacy and ethics in the collection, storage, and use of data. Opt-out methods for data collected for model training and model application should be offered where possible.
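A minimal sketch of honouring opt-outs before data is used, assuming a hypothetical record format and per-purpose opt-out registry, could look like the following Python.

from typing import Dict, List, Set

def apply_opt_outs(records: List[Dict], opted_out_ids: Set[str], purpose: str) -> List[Dict]:
    """Drop records belonging to users who opted out of the given purpose
    (e.g. 'model_training' or 'model_application') before the data is used."""
    kept = [r for r in records if r["user_id"] not in opted_out_ids]
    print(f"{purpose}: kept {len(kept)} of {len(records)} records after honouring opt-outs")
    return kept

records = [{"user_id": "u1", "text": "..."}, {"user_id": "u2", "text": "..."}]
opt_outs_by_purpose = {"model_training": {"u2"}, "model_application": set()}
training_data = apply_opt_outs(records, opt_outs_by_purpose["model_training"], purpose="model_training")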
Involve stakeholders and ‘non-experts’ in the selection, collection, and analysis of demographically representative qualitative data
December 22, 2023
Representatives of impacted stakeholders should be identified and partnered with on data collection methods. This is particularly important when identifying new or non-traditional data-gathering resources and methods. To increase representativeness and responsible interpretation, the collection and analysis of specific datasets should include diverse viewpoints, not only those of experts: technology or datasets deemed non-problematic by one group may be predicted to be disastrous by others. Training data sets should be demographically representative of the cohorts or communities the AI system will impact.
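One way to operationalise the inclusion of non-expert viewpoints is to collect labels from more than one annotator cohort and route disagreements to joint review; the cohort names and record format in the Python sketch below are illustrative assumptions.

from collections import defaultdict
from typing import Dict, List

def flag_disagreements(annotations: List[Dict]) -> List[str]:
    """Return item ids where annotators from different cohorts
    (e.g. domain experts vs community reviewers) assigned different labels."""
    by_item = defaultdict(set)
    for a in annotations:
        by_item[a["item_id"]].add(a["label"])
    return [item for item, labels in by_item.items() if len(labels) > 1]

annotations = [
    {"item_id": "doc1", "cohort": "expert", "label": "non_problematic"},
    {"item_id": "doc1", "cohort": "community", "label": "harmful"},
    {"item_id": "doc2", "cohort": "expert", "label": "non_problematic"},
    {"item_id": "doc2", "cohort": "community", "label": "non_problematic"},
]
print(flag_disagreements(annotations))  # ['doc1'] - send for joint review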
Operationalise inclusive and substantive community engagement
December 18, 2023
A vast body of knowledge about community engagement praxis exists. Guidelines and frameworks are updated and operationalised by practitioners from many disciplines, including community cultural development, community arts, social work, social sciences, architecture, and public health. However, this vital element is largely neglected in the AI ecosystem, although many AI projects would benefit from considered attention to community engagement. For instance, in the health sector, AI and advanced analytics implementation in primary care should be a collaborative effort that involves patients and communities from diverse social, cultural, and economic backgrounds in an intentional and meaningful manner. A Community Engagement Manager role could be introduced, with responsibility for working with impacted communities throughout the AI-LC and for a fixed period post-deployment. Reciprocal and respectful relationships with impacted communities should be nurtured, and community expectations about both the engagement and the AI system should be defined and attended to. If impacted communities contain diverse language, ethnic, and cultural cohorts, a Community Engagement Team drawn from those minority groups would be more appropriate; one of its roles, for example, would be to develop tailored critical AI literacy programs. Organisations must put “the voices and experiences of those most marginalized at the centre” when implementing community engagement outcomes in an AI project.