Construct evaluation tasks that best mirror the real-world setting

Ecosystem identifier

P10

Ecosystem pillar

Guideline group roles

AI Auditor/Monitor, AI/ML Developer, Product Manager/Project Owner

Overview

Evaluations, even those run on crowdsourcing platforms with lay participants, should capture the kinds of interactions and decisions that end users actually make. They should demonstrate what happens when the algorithm is integrated into a human decision-making process. Does the algorithm alter or improve the decision, and the decision-making process that produces it, as revealed by the downstream outcome?