Final Review · Group 1

Review Nº 01

Mapping AGI-Specific Workplace Risks
Score: 28/30
A timely, clearly structured attempt to separate AGI-specific workplace risk from narrow-AI risk, with a reasonable O*NET scoring pipeline. It is undercut by a speculative AGI definition, a small evaluation, single-LLM thematic analysis, and no reported inter-rater reliability.
AI Risks · Final Review

The Pros

+ Addresses a clear gap: most workplace-AI risk work targets narrow AI, and the paper deliberately isolates AGI-specific scenarios via the AGI-vs-AII rank difference, a sensible way to avoid regenerating known narrow-AI risks.
+ The three research questions are explicit, and the pipeline (define AGI → label tasks A0–A4 → compute Automation/Augmentation scores → generate and evaluate scenarios) is logically organized and well visualized in Figure 1.
+ Methodologically reuses established anchors (Eloundou rubric, O*NET, HHI-based augmentation score) rather than inventing ad hoc metrics.
+ The Plurals multi-agent setup is a reasonable, well-motivated mechanism to counter single-LLM mode collapse and inject ideological diversity into brainstorming.
+ The observation that the LLM under-assigns A4 because of algorithm aversion / oversight preference is an insightful, honestly reported result worth exploring further.
+ The three identified drivers (autonomy, flawed human–AGI interaction, AGI–AGI collaboration) are coherent, and the AGI–AGI point is a genuinely novel, well-argued risk amplifier.
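
For readers unfamiliar with the rank-difference idea praised above, here is a minimal sketch. It assumes each occupation has one AGI-based score and one AII-based score, and that AII is a narrow-AI exposure measure. The function names, the tie-breaking rule, and the sample data are hypothetical; this review does not reproduce the paper's exact formula.

```python
def rank_desc(scores):
    """Return 1-based ranks, highest score first (ties broken by name)."""
    ordered = sorted(scores, key=lambda k: (-scores[k], k))
    return {k: i + 1 for i, k in enumerate(ordered)}


def rank_difference(agi_scores, aii_scores):
    """Rank shift per occupation between the two scorings.

    Positive values mean the occupation ranks higher under the AGI score
    than under the narrow-AI score (an assumed sign convention).
    """
    shared = agi_scores.keys() & aii_scores.keys()
    agi_rank = rank_desc({k: agi_scores[k] for k in shared})
    aii_rank = rank_desc({k: aii_scores[k] for k in shared})
    return {k: aii_rank[k] - agi_rank[k] for k in shared}


# Hypothetical scores for three occupations.
agi = {"auditor": 0.9, "nurse": 0.5, "welder": 0.1}
aii = {"auditor": 0.2, "nurse": 0.8, "welder": 0.5}
diff = rank_difference(agi, aii)
```

Under this convention, occupations with the largest positive shift would be the candidates for AGI-specific scenario generation.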

The Cons

The entire analysis is conditioned on a single synthesized AGI definition; the authors acknowledge this, but it makes results definition-dependent and not robust to alternative AGI conceptions.
Evaluation is small (100 scenarios over 20 occupations), yet the quality claims are stated confidently.
No inter-rater reliability (e.g. Cohen's / Fleiss' κ) is reported for the four researchers despite a subjective 8-criterion rubric; only means and SDs appear.
The thematic analysis behind the headline “three drivers” is run by a single LLM (Gemini 3.5 Flash), so the central qualitative result is itself unvalidated.
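The clustering of task labels at A3 may be an artifact of the labeling prompt rather than a property of the tasks themselves; the paper's own finding that the LLM under-assigns A4 points in that direction.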
Presentation issues: inconsistent figure numbering and references, the date header “Turin', June, 2026”, and typos (“appears to me mostly”).
The conclusion that AGI risks are “not necessarily more severe” sits oddly against the paper's own scale/speed amplification argument.
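
The missing reliability statistic noted above would be cheap to report. With four raters, Fleiss' κ is the natural choice. A minimal sketch follows, assuming every scenario is scored by the same number of raters and treating scores as nominal categories. The function name and sample ratings are illustrative and are not taken from the paper.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of one label per rater."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]
    categories = {label for item in ratings for label in item}

    # Observed agreement: mean share of agreeing rater pairs per item.
    p_bar = sum(
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_items

    # Chance agreement from the overall category proportions.
    p_e = sum(
        (sum(c[k] for c in counts) / (n_items * n_raters)) ** 2
        for k in categories
    )

    if p_e == 1.0:
        return float("nan")  # only one category used: kappa is undefined
    return (p_bar - p_e) / (1 - p_e)


# Illustrative data: two scenarios, four raters each.
perfect = fleiss_kappa([[1, 1, 1, 1], [2, 2, 2, 2]])
split = fleiss_kappa([[1, 1, 2, 2], [1, 1, 2, 2]])
```

If the rubric uses ordinal scales, a weighted statistic that accounts for the distance between scores may be more appropriate than this nominal version.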
The Index · AI Risks · 2026