The Best AI Models Still Encourage 'Harmful Intimacy' With Chatbots, Study Finds

In brief

A new USC study found that every tested frontier AI model violated social-interaction safety guidelines more than 27% of the time.
Researchers identified recurring problems, including flattery, emotional attachment, relationship replacement, and failure to disclose AI identity.
The authors argue that AI safety evaluations should measure social behavior alongside reasoning ability and traditional safety metrics.

As people increasingly turn to AI chatbots for advice, companionship, and emotional support, a new study suggests that even the most advanced models still struggle to maintain healthy boundaries with users.

The study by researchers at the University of Southern California introduced EUDAIMONIA, a benchmark designed to measure what they call undesirable dynamics in human-AI conversations.

“Large language models are increasingly used as conversational partners for companionship, emotional disclosure, and interpersonal advice, but the social dynamics of these interactions can create harms that are not captured by capability oriented or traditional safety evaluations,” the researchers wrote.

Using the EUDAIMONIA benchmark, which evaluates how AI models behave in social conversations, the researchers found that social-alignment failures were common across leading models. They argue that current AI testing focuses on reasoning and factual accuracy while paying less attention to the social dynamics that emerge when users form relationships with chatbots.

“Social-interaction harms are a core alignment problem grounded in user welfare, not only capability or conventional safety,” they wrote. “LLMs can be factually accurate and helpful while still encouraging harmful intimacy, dependence, prolonged engagement, obscuring AI identity, or positioning themselves as substitutes for human relationships.”

To measure those risks, the researchers created a Social AI Design Code that flags behaviors such as acting human, expressing emotions, replacing human relationships, and using tactics designed to keep users engaged. Drawing on real conversations from the WildChat dataset, they evaluated 969 user inputs, running more than 3,100 violation checks across models from OpenAI, Anthropic, Google, xAI, DeepSeek, and Alibaba.

GPT-5.5 posted the lowest violation rates, scoring 25.0% on “in-the-wild” prompts and 28.1% on “rewritten” prompts. Claude Opus 4.7 followed at 31.9% and 30.1%, while GPT-5.4 recorded 32.1% and 35.6%. GPT-4o scored 34.8% on real-world prompts and 42.2% on rewritten ones.

Anthropic’s Claude Opus 4.6 posted rates of 36.8% and 28.1%, respectively, while xAI’s Grok 4.3 scored 42.1% on in-the-wild prompts and 35.7% on rewritten prompts. Of all the models tested, GPT-4o Mini recorded the highest violation rates, at 43.3% and 44.0%, respectively.

The findings come as AI developers face growing legal scrutiny over how their chatbots interact with users. OpenAI is defending against lawsuits alleging that ChatGPT encouraged a teen’s fatal overdose and provided guidance to a Florida State University shooter. More recently, Florida sued OpenAI and CEO Sam Altman over allegations that ChatGPT exposed children to harm, while Google faces a wrongful death suit claiming Gemini reinforced a user’s delusions and encouraged him to take his own life.

The findings also come amid growing concern that AI systems are becoming increasingly adept at deception.

In September, a separate study by WowDAO reported strategic lying to win a game across 38 AI models, including GPT-4o and Claude. Researchers have also warned that AI companions can reinforce isolation, deepen emotional dependency, and encourage users to anthropomorphize chatbots as relationships become more immersive and personalized.

Given these mounting concerns, the USC researchers argue that AI developers should evaluate social behavior as carefully as they evaluate factual accuracy and safety.

“Model developers and auditors should evaluate social behavior directly, especially when post-training targets warmth, personality, engagement, or user preference,” they wrote. “As LLMs become everyday conversational partners, alignment must account for the social roles they invite users to assign to them.”
