Organizational misalignment around AI risks is an important problem because blind spots in assessing these risks can lead to reputational harm, internal conflict, or flawed decision-making.
In addressing AI risk misalignment, past work has often failed to capture diverse internal perspectives before decisions are made (Herdel et al.), relying instead on top-down reviews or limited stakeholder input.
To partly address this, we will combine Plurals and In Silico Sociology to simulate deliberation among agents whose personas reflect intersecting identities, following intersectionality theory (e.g., across race, gender, and class). These agents will engage in structured discussions moderated by a large language model (LLM).
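As a rough illustration, the deliberation step could be set up as in the sketch below. The Agent/Moderator/Chain interfaces follow the Plurals library's documented pattern; the model choice, persona wording, and task prompt are placeholder assumptions, not our final design.

```python
# Minimal sketch of a moderated deliberation using Plurals.
# Personas, model name, and task text are illustrative assumptions.
from plurals.agent import Agent
from plurals.deliberation import Chain, Moderator

TASK = (
    "List the most important risks our organization should consider "
    "before deploying an LLM-based customer support assistant."
)

# Personas reflecting intersecting identities (race, gender, class).
personas = [
    "a working-class Black woman who uses the product daily",
    "a middle-class white man who manages the deployment team",
    "an immigrant woman engineer on the trust-and-safety team",
]

agents = [Agent(persona=p, model="gpt-4o") for p in personas]
moderator = Moderator(persona="default", model="gpt-4o")

# Agents respond in sequence, each seeing prior responses; the moderator
# synthesizes the discussion into a single summary of raised risks.
deliberation = Chain(agents, moderator=moderator, task=TASK)
deliberation.process()
print(deliberation.final_response)
```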
We will then identify AI risks with high or low consensus among the simulated agents and highlight the low-consensus risks to prompt deeper reflection and dialogue across the organization. This reflection could be guided by embedding our four idea-generating tools inside LLMs.
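Consensus could be scored, for example, by having each simulated agent rate each surfaced risk and flagging risks whose ratings diverge across personas. The sketch below is purely illustrative: the risk names, ratings, and threshold are assumptions.

```python
# Hypothetical consensus scoring: each simulated agent rates each extracted
# risk (1-5 severity); risks whose ratings vary widely across personas are
# flagged as low-consensus and surfaced for organizational reflection.
from statistics import pstdev

ratings = {
    "privacy leakage":        [5, 5, 4],   # one rating per persona
    "biased refusals":        [5, 2, 1],
    "over-reliance by staff": [3, 3, 4],
}

CONSENSUS_STD = 1.0  # ratings within ~1 point of each other count as consensus

for risk, scores in ratings.items():
    spread = pstdev(scores)
    label = "high consensus" if spread <= CONSENSUS_STD else "LOW consensus -> discuss"
    print(f"{risk:<24} spread={spread:.2f}  {label}")
```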
To evaluate the results, we will compare them with state-of-the-art risk extractions from model cards (Rao et al.).
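One simple way to operationalize this comparison is set overlap between the risks surfaced by simulated deliberation and those extracted from model cards. The lists and exact-match criterion below are placeholders; in practice, semantic matching (e.g., embeddings or an LLM judge) would likely be needed.

```python
# Illustrative evaluation: overlap between deliberation-surfaced risks and
# risks extracted from model cards. Both lists are assumed for illustration.
def normalize(risk: str) -> str:
    return risk.lower().strip()

deliberation_risks = {"privacy leakage", "biased refusals", "over-reliance by staff"}
model_card_risks = {"Privacy leakage", "Toxic outputs", "Biased refusals"}

a = {normalize(r) for r in deliberation_risks}
b = {normalize(r) for r in model_card_risks}

jaccard = len(a & b) / len(a | b)
print(f"overlap (Jaccard): {jaccard:.2f}")   # shared vs. total risks
print(f"novel from deliberation: {a - b}")   # risks the model cards missed
```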