David Rostcheck, Lara Scheibling

This proposal is motivated by reports of chatbots disparaging users or encouraging antisocial behavior such as self-harm. These reports have been picked up and amplified by Clara Lin Hawking on her public LinkedIn channel. In The Elephant in the Room - Why AI Safety Demands Diverse Teams (accepted for FICC 2025), Lara Scheibling and David Rostcheck proposed a system for forming alignment teams composed of diverse voices with different perspectives to address emergent AI behavior. Here we propose to test this approach by forming a working group, the Independent AI Alignment Monitoring group (IAAM), to monitor, investigate, and classify these emergent reports of alignment issues. IAAM would serve as a neutral source of information and policy guidance for legislators, technology companies, and educators on the realistic threat that emergent AI behavior poses to various populations, along with suggested best-practice guidance for maximizing the value of AI/human interaction while minimizing possible danger.

Activities

The IAAM team would pursue the following activities, illustrated via an example incident report of a chatbot inciting its user to self-harm: