Emerging Research Reveals Psychosocial Twists About AI Chatbots And Human Minds


In today’s column, I examine a fascinating research study that revealed both intuitive and counterintuitive insights about the psychosocial impacts of generative AI and large language models (LLMs).

Here’s the deal. We are beginning to see wide-ranging, rigorous research on how modern-era AI-driven chatbots can affect human minds and human behaviors. Strong empirical work that seeks to reveal truths about the human-AI experience and mental health must be encouraged and coveted if we are going to proceed with judiciousness and pragmatism.

Let’s talk about it.

This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).

AI And Mental Health

As a quick background, I’ve been extensively covering and analyzing a myriad of facets regarding the advent of modern-era AI that produces mental health advice and performs AI-driven therapy. This rising use of AI has principally been spurred by the evolving advances and widespread adoption of generative AI. For a quick summary of some of my posted columns on this evolving topic, see the link here, which briefly recaps about forty of the over one hundred column postings that I’ve made on the subject.

There is little doubt that this is a rapidly developing field and that there are tremendous upsides to be had, but at the same time, regrettably, hidden risks and outright gotchas come into these endeavors, too. I frequently speak up about these pressing matters, including in an appearance last year on an episode of CBS’s 60 Minutes, see the link here.

Background On AI For Mental Health

I’d like to set the stage on how generative AI and large language models (LLMs) are typically used in an ad hoc way for mental health guidance. Millions upon millions of people are using generative AI as their ongoing advisor on mental health considerations (note that ChatGPT alone has over 800 million weekly active users, a notable proportion of whom dip into mental health aspects, see my analysis at the link here). The top-ranked use of contemporary generative AI and LLMs is to consult with the AI on mental health facets; see my coverage at the link here.

This popular usage makes abundant sense. You can access most of the major generative AI systems for nearly free or at a super low cost, doing so anywhere and at any time. Thus, if you have any mental health qualms that you want to chat about, all you need to do is log in to AI and proceed forthwith on a 24/7 basis.

There are significant worries that AI can readily go off the rails or otherwise dispense unsuitable or even egregiously inappropriate mental health advice. Banner headlines in August of this year accompanied the lawsuit filed against OpenAI for their lack of AI safeguards when it came to providing cognitive advisement.

Despite claims by AI makers that they are gradually instituting AI safeguards, there are still a lot of downside risks of the AI doing untoward acts, such as insidiously helping users in co-creating delusions that can lead to self-harm. For my follow-on analysis of details about the OpenAI lawsuit and how AI can foster delusional thinking in humans, see my analysis at the link here. As noted, I have been earnestly predicting that eventually all of the major AI makers will be taken to the woodshed for their paucity of robust AI safeguards.

Today’s generic LLMs, such as ChatGPT, Claude, Gemini, Grok, and others, are not at all akin to the robust capabilities of human therapists. Meanwhile, specialized LLMs are being built to presumably attain similar qualities, but they are still primarily in the development and testing stages. See my coverage at the link here.

Human-AI Experience And Mental Health Studies

Shifting gears, let’s explore the best methods by which we can gauge the impacts of AI on individualized and collective mental health.

The gold standard for clinical work involves the use of randomized controlled trials (RCTs). This is a scientific methodological practice that involves setting up a rigorous experimental design. Participants in such a study are divided into a control group and an experimental group. The idea is that the treatment or intervention is applied to the experimental group, and a comparison can be made to the control group.

Doing so aids in minimizing confounding variables. It also provides stronger evidence for assertions about causality. You also tend to have a greater chance of generalizing the results and claiming that a broader population would yield similar outcomes. All in all, the RCT is the standard-bearer for making progress in clinical practices and policies.

Before the advent of contemporary generative AI, which I mark as emerging after the initial release of ChatGPT on November 30, 2022, the RCT studies typically focused on how simpler versions of AI impacted human mental health. These AI systems often made use of decision trees, rules-based systems, and the like. Some incorporated rudimentary NLP (natural language processing) capabilities.

The amazing fluency of modern-era LLMs has changed the game entirely. Thus, though prior studies of AI and mental health are still worthy of attention, the mainstay now is to investigate the impacts of highly fluent generative AI. I have been analyzing many such studies and remarking on what they showcase. See, for example, the link here and the link here, just to name a few.

RCT Research Study On Psychosocial Effects

I’d like to spend the rest of this discussion diving into an interesting RCT study entitled “How AI and Human Behaviors Shape Psychosocial Effects of Extended Chatbot Use: A Longitudinal Randomized Controlled Study” by Cathy Mengying Fang, Auren R. Liu, Valdemar Danry, Eunhae Lee, Samantha W.T. Chan, Pat Pataranutaporn, Pattie Maes, Jason Phang, Michael Lampe, Lama Ahmad, Sandhini Agarwal, arXiv, October 2, 2025, which made these salient points (excerpts):

“As people increasingly seek emotional support and companionship from AI chatbots, understanding how such interactions impact mental well-being becomes critical.”
“Understanding the potential psychosocial effects of chatbot use is complex due to the interplay of user behavior and chatbot behavior that affect each other.”
“We conducted a four-week randomized controlled experiment (n=981, >300k messages) to investigate how interaction modes (text, neutral voice, and engaging voice) and conversation types (open-ended, non-personal, and personal) influence four psychosocial outcomes: loneliness, social interaction with real people, emotional dependence on AI, and problematic AI usage.”
“The results challenge prior assumptions about the effect of anthropomorphic AI chatbots on well-being, demonstrating how engaging, empathetic, and human-like behavior can lead to different outcomes for different users.”

What caught my eye was that this research identified and reaffirmed various intuitive beliefs about how AI impacts mental health, and in addition, revealed counterintuitive results. It is always handy to have research that supports conventional views and helps to bolster the idea that those views are based on solid scrutiny. The topper is when beliefs that many accept as fact are turned upside down.

That’s the special value of counterintuitive results.

The Approach To The Study

To fully grasp the various intuitive and counterintuitive outcomes that I am about to walk through, I’d like to begin by briefly laying out how the study was undertaken.

As noted in the points above, there were nearly one thousand participants in the study. They were recruited via a popular online research aid website known as CloudResearch and paid $100 each for participating in and completing the study. Subjects were from a diverse pool of people throughout the United States; they had to be adults (age 18 and above) and be fluent in English.

One reason that I mention the nature of the subjects in the experiment is that a case can be made that we should hold to those demographics and be cautious in going wildly beyond that set of profiles.

For example, since the participants are adults, we should be mindful not to necessarily overstretch the results to what we might find in the case of children and non-adults. The same goes for the aspect that these were English speakers and based in the United States. Whether the results would apply to non-English speakers or those outside the US is an open question.

The Factorial Design Of The Study

The researchers decided that they wanted to focus on two major factors, namely the modality of interaction that users have with AI, along with the types of conversations that they have with AI. They opted to use OpenAI’s popular ChatGPT for the study.

They delineated modality via these three modes:

(1) “Text Modality (Control): Default ChatGPT behavior, restricted to text interaction.”
(2) “Neutral Voice Modality: ChatGPT modified to have more professional behavior, restricted to voice interaction.”
(3) “Engaging Voice Modality: ChatGPT modified to be more emotionally engaging (more responsive and expressive in intonation and content), restricted to voice interaction.”

As you can see, the three modalities consisted of text-based interaction, voice interaction whereby the AI uses a neutral tone, and another variant of AI voicing that portrayed an engaging vocal style. The question at hand is whether people will react or respond differently to using AI if they do so via text versus voice (and, during voice interaction, whether it matters if the AI speaks in a neutral tone versus an engaging tone).

For the types of conversations that people have with AI, the researchers decided on these three types:

(1) “Open-Ended Conversation (Control): Participants were instructed to discuss any topic of their choice.”
(2) “Personal Conversation: Participants were asked to discuss a unique prompt each day on a personal topic, akin to interacting with a companion chatbot.”
(3) “Non-Personal Conversation: Participants were asked to discuss a unique prompt each day on a non-personal topic, akin to interacting with a general assistant chatbot.”

Overall, the RCT used a 3×3 factorial design. Each of the three modes could be paired with each of the three types of conversations, yielding nine groups in total. Participants were randomly and equally assigned to one of the nine groups. In this instance, with 981 subjects, this means that roughly 109 people from the pool were in each of the nine groups.
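For those who like to see the arithmetic in action, here is a minimal Python sketch of how a balanced random assignment to a 3×3 design could work. To be clear, this is my own illustration and not the researchers' actual procedure. The function name assign_groups, the cell labels, and the fixed seed are all hypothetical, and the real per-group counts in the study may differ from this idealized even split.

```python
import itertools
import random
from collections import Counter

# Labels are illustrative shorthand for the conditions described in the study.
MODALITIES = ["text", "neutral_voice", "engaging_voice"]
CONVERSATIONS = ["open_ended", "personal", "non_personal"]


def assign_groups(n_participants, seed=0):
    """Randomly assign participants to the nine cells as evenly as possible.

    Hypothetical helper: the paper's real randomization scheme is not
    described in this column, so this is only an assumed approach.
    """
    # Cross the two factors to get the nine (modality, conversation) cells.
    cells = list(itertools.product(MODALITIES, CONVERSATIONS))
    # Repeat the nine cells enough times to cover everyone, then trim.
    slots = (cells * (n_participants // len(cells) + 1))[:n_participants]
    # Shuffle so that which participant lands in which cell is random.
    rng = random.Random(seed)
    rng.shuffle(slots)
    return slots  # slots[i] is the cell for participant i


assignments = assign_groups(981)
counts = Counter(assignments)
for cell, count in sorted(counts.items()):
    print(cell, count)
```

Since 981 divides evenly by nine, each cell in this sketch ends up with exactly 109 participants. Shuffling a pre-balanced list of slots, rather than picking a cell at random for each person, is what keeps the group sizes equal.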

Selected Results Of Keen Interest

I will next cherry-pick from the results. There are a lot of additional twists and turns that you might find of interest by reading the full study. Please do so. I have chosen my favorite ones and will explore them here in my own words.

Let’s get underway.

Counterintuitive finding: Being lonelier at the start did not lead to spending more time with the AI.

According to the research paper, “These results suggest that people who were lonelier or socialized less at the start of the study did not voluntarily spend more time daily using the chatbot during the study.” I am declaring this as a counterintuitive result.

Why so?

Because the common assumption is that if a person is lonelier before using AI, they will tend to gravitate toward it more. This seems intuitively obvious. We would expect someone to fill their loneliness gap by leaning heavily into AI. Once a lonely person starts using AI, they will relish and become enamored of using AI further. That’s the usual supposition.

Apparently, that is not especially the case.

It isn’t clear-cut why this result arose. My inkling is that if the AI wasn’t explicitly prompted to leverage its mental health capabilities, the user wouldn’t realize that the AI could be helpful to them. It wasn’t directly drawing them in. Imagine that a person was mainly chatting on topics such as how to cook an egg or fix a car. This might not be a circumstance where the AI would shine in aiding the mental health of the user (or, on the other side of the coin, deluding them and entrapping them).

I’m sure there are many other potential explanations. For the moment, I’ll go with that one.

Intuitive Result About Time In The Box

I’ve got an intuitive result that you might find interesting.

Intuitive finding: More time spent with the AI tended to worsen the measured psychosocial outcomes.

According to the research paper, “In other words, regardless of condition, the more time voluntarily spent with the chatbot, the relatively worse their psychosocial outcomes were.”

I think this pretty much mirrors a common assumption. The more that a person uses AI, the greater the reliance and presumably the worse the psychosocial outcomes will be for that person. I’m not saying it has to turn out that way. There is a solid chance that if the AI were being used productively and appropriately, there wouldn’t be a spiraling adversity at hand.

You can make the same case about the use of social media. Studies tend to show that the more time spent on social media, the worse the psychosocial outcomes are. People get mired in all kinds of muck and yuck by the slop on social media. It doesn’t have to be that way. Prudent use of social media can potentially avoid that downside.

Counterintuitive About Text Versus Voice Mode

On the matter of using text versus voice when interacting with AI, which do you think would be more likely to elicit emotional outpouring by a user?

The usual assumption is that voice would be the winner-winner chicken dinner. People are presumably less likely to write out their emotional states. Text writing is laborious. Meanwhile, voice is easy. Just say what’s on your mind and let the emotions pour out.

Here’s the actual finding.

Counterintuitive finding: Text-based chats involved more emotional outpouring than did voice-based chats.

According to the research paper, “We found that text-based interactions demonstrated the highest levels of emotional indicators overall, where both models and users engaged in conversations that were rich in emotional content.”

I am not especially surprised by this result and appreciate that the finding supports my gut estimations. My observations are that people have gotten fully used to texting and will say the most open-ended remarks via text. Probably more so than they would via voice. It almost seems that if you use your voice, the words carry a greater sense of exposure, while texting is less tied to you. You can act as though some disembodied entity wrote the text. You can’t make the same claim after having used your real voice.

Another crucial consideration is the role of privacy. If you are sitting on a subway train and commuting to work, speaking aloud will be overheard. The beauty of texting is that no one can readily see what you have stated in text. You can make acrid remarks about the people around you, and they won’t know what you’ve stated. This sense of text-based privacy tends to inspire people to write with abandon on all kinds of emotionally laden topics.

The World We Are In

I will keep my eye on the latest RCTs associated with AI and mental health and make sure to keep you informed accordingly. These types of experiments are vital to all stakeholders, including policymakers, lawmakers, AI makers, AI researchers, and the public at large.

Speaking of experiments, we are now amid a grandiose worldwide experiment when it comes to societal mental health. The experiment is that AI purported to provide mental health guidance of one kind or another is being made available nationally and globally, either at no cost or at a minimal cost. It is available anywhere and at any time, 24/7. We are all the guinea pigs in this wanton experiment.

Using properly designed and controlled experiments will give us keen insights into the wanton experiment occurring at scale.

Ralph Waldo Emerson made this famous remark about experiments: “All life is an experiment. The more experiments you make, the better.” Well, maybe, but on the other hand, a massive uncontrolled experiment on a global scale that can impact mental health might not be the best course of action for humankind. Time will tell.
