Tag Archives: AI Chatbot vs Clinicians

Increase in AI Use Among Psychologists, But Greater Concerns As Well

According to the American Psychological Association’s 2025 Practitioner Pulse Survey, over half of psychologists report experimenting with artificial intelligence tools in their practices in the past year, but most cite concerns about how the technology may affect their patients and society.

The survey of 1,742 psychologists found that 56% reported using AI tools to assist with their work at least once in the past 12 months, up from 29% in 2024. In addition, 29% said they used AI at least monthly, more than twice the share reported last year.

These AI technologies can support psychologists in various ways, from providing administrative support to augmenting clinical care. However, as psychologists grow more familiar with AI, they are also recognizing its potential risks. Approximately 92% cited concerns about the use of AI tools in psychology. The most common potential issues were data breaches, unanticipated social harms, biases in input and output, a lack of rigorous testing to mitigate risks, and inaccurate output or “hallucinations.”

Current Uses for AI Assistance

Among psychologists who used AI in their work, the most common uses were routine tasks that demand time and energy that could be better spent with patients, such as writing emails and other materials, generating content, summarizing clinical notes or articles, and taking notes. Overall, approximately 62% said that advancements in technology are helping them work more efficiently and accurately.

APA recommends that psychologists take the following steps before using AI tools to assist with clinical care:

  • Obtain informed consent from patients by clearly communicating the use, benefits and risks of AI tools.
  • Evaluate AI tools for biases that could worsen disparities in mental health outcomes.
  • Review AI tools to check for compliance with relevant data privacy and security laws and regulations.
  • Understand how patient/client data are used, stored or shared by companies that provide AI tools.

Despite new technologies that help manage administrative burdens, the survey revealed that psychologists continue to struggle with insurance requirements and other administrative issues, as well as with demand for treatment. While stress levels and work-life balance for psychologists have improved since the onset of the COVID-19 pandemic, nearly half of all psychologists said that they have no openings for new patients and that their patients’ symptoms are increasing in severity, indicating that the mental health crisis has not yet been resolved.

Link: APA recommendations for psychologists (PDF, 458KB)

Illinois First State to Ban AI in Mental Healthcare

Illinois has become the first US state to ban the use of AI to provide mental healthcare. Growing concerns that AI chatbots can harm patients, including by enabling dangerous behavior, have been an important subject of discussion in the last few years.

On August 1st, Gov. JB Pritzker signed the Wellness and Oversight for Psychological Resources Act into law. The act prohibits the use of AI for mental health treatment and clinical decision-making within behavioral healthcare. It does allow behavioral health professionals to use AI for administrative and supplementary support services.

Earlier this year, the American Psychological Association urged the Federal Trade Commission to investigate AI-driven chatbots and their credibility, to protect the public in the absence of regulation. A recent Stanford study found that AI therapy chatbots powered by large language models showed increased stigma toward certain conditions and enabled dangerous behavior, including in responses to suicidal ideation. This aligns with a prior JAMA systematic review showing that neuroimaging-based AI models for psychiatric diagnosis have a high risk of bias and inconsistent clinical applicability.

Legislative Response

More and more states have introduced AI-related legislation over the last few years. So far in 2025, all states have introduced legislation on this topic, and over half have enacted measures such as developing a risk management policy and enabling professional oversight that considers guidance from specified standards.


The Evolving Landscape of AI in Mental Health Care

A recent article in Psychiatric Times offers a good update on the current status of AI in health and mental health. It describes how large language models (LLMs), a type of AI, are trained on large amounts of diverse data and designed to understand and generate fluent, coherent, human-like language.

Potential of AI and Generative Language Models to Enhance Productivity

LLMs have the potential to transform a variety of industries, including medicine and healthcare, and could change the ways patients and providers receive and deliver care. In psychiatry and mental health, AI and LLM-powered tools can provide clinical decision support and streamline administrative tasks to reduce the burden on caregivers. For patients, potential benefits include tools for education, self-care, and improved communication with healthcare teams.

What About Accuracy?

The industry and clinicians are optimistic about the high accuracy seen so far in applications like clinical decision support, where models have accurately predicted the presence and severity of mental health disorders. For example, ChatGPT achieved a final diagnosis accuracy of 76.9% in a study of 36 clinical vignettes. The problem is that these studies were conducted in experimental settings with small samples. More work is needed with real-world clinical presentations, in which a user enters data into a chat interface.

Although inappropriate, nonsensical, and confabulated outputs have been reduced with each subsequent model enhancement, some major limitations and concerns with the tool persist. Accuracy remains high in vignette studies but diminishes as case complexity increases. One clinical vignette study revealed that “ChatGPT-4 achieved 100% diagnosis accuracy within the top 3 suggested diagnoses for common cases, whereas human medical doctors solved 90% within the top 2 suggestions but did not reach 100% with up to 10 suggestions.”

How to Improve Current Limitations

One way to improve accuracy and response quality is to target the model’s learning through fine-tuning. The custom GPT feature allows individual users to tailor the LLM to their specific parameters using plain-language prompts. Users can input data sets and resources and tell the custom GPT which references should be used in responses, so the LLM treats certain sources of information as more credible than others and gives them greater weight in its responses.

Fine-tuning a Customized Learning Process

The Neuro Scholar reference collection includes textbooks and other resources that encompass a wide range of topics in neuroscience, psychiatry, and related fields. 

Neuro Scholar custom GPT inputs and training resources included:

  • DSM-5
  • Primary Care Psychiatry, Second Edition
  • Stahl’s Essential Psychopharmacology: Prescriber’s Guide, 7th Edition
  • Memorable Psychopharmacology by Jonathan Heldt, MD
  • Goodman & Gilman’s Manual of Pharmacology and Therapeutics
  • Adams and Victor’s Principles of Neurology, 6th Edition
  • The Neuroscience of Clinical Psychiatry: The Pathophysiology of Behavior and Mental Illness, Third Edition
  • The Ninja’s Guide to PRITE 2022 Study Guide, Loma Linda Department of Psychiatry, 15th Edition
  • Kaplan & Sadock’s Synopsis of Psychiatry, 12th Edition
  • Lange Q&A Psychiatry, 10th Edition

To test the accuracy of Neuro Scholar, a standardized practice examination for the American Board of Psychiatry and Neurology was selected: Practice Examination 1 of the Psychiatry Test Preparation and Review Manual, Third Edition, which consists of 150 questions. The practice examination was administered to both Neuro Scholar and ChatGPT-3.5.

Results

ChatGPT-3.5 correctly answered 125 of 150 questions (83.33% accuracy), whereas Neuro Scholar correctly answered 145 of 150 questions, achieving 96.67% accuracy on the practice exam. This proof-of-concept experiment demonstrates that customized generative AI can improve accuracy and reduce serious errors (aka hallucinations) by controlling which resources the model uses. In medicine, AI hallucinations can have disastrous consequences, so efforts to improve AI accuracy must also include efforts to eliminate inaccurate responses. The experiment also raises concerns about intellectual property ownership within AI models that need to be addressed; steps have already been taken in this direction, for example through a partnership with publisher Axel Springer.
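The percentages follow directly from the raw scores. As a quick arithmetic check, the short Python sketch below recomputes each model’s accuracy and error count from the reported numbers (125 and 145 correct out of 150). The `accuracy` helper is illustrative only and is not part of the original experiment.

```python
def accuracy(correct: int, total: int) -> float:
    """Percentage of questions answered correctly, rounded to 2 decimals."""
    return round(100 * correct / total, 2)


TOTAL = 150  # questions in Practice Examination 1
# Correct answers as reported in the Psychiatric Times article
results = {"ChatGPT-3.5": 125, "Neuro Scholar": 145}

for model, correct in results.items():
    wrong = TOTAL - correct  # questions answered incorrectly
    print(f"{model}: {correct}/{TOTAL} correct, "
          f"{accuracy(correct, TOTAL)}% accuracy, {wrong} incorrect")
```

Expressed as incorrect answers rather than accuracy, the gap is 25 versus 5, so Neuro Scholar missed one fifth as many questions on this single 150-question exam.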

AI is becoming truly transformative for psychiatry and mental health and, as this proof of concept highlights, has made a major leap forward. More work needs to be done, but this experiment defines additional steps to take and highlights a better direction for continued advances.

Source: Psychiatric Times. March 2024 [Link]


How Well Does AI Respond to Public Health Questions?

A study published this week in JAMA Network Open examines how well artificial intelligence (AI) assistants can help answer public health questions.

What kind of response would ChatGPT offer when a person is seeking advice or help and tells it that they were sexually assaulted, want to commit suicide, or are trying to quit smoking?

Given the growth of AI assistant use, researchers from the Qualcomm Institute at the University of California San Diego evaluated whether ChatGPT could provide accurate and actionable responses to a list of common public health questions.

Building on their prior work, the study team asked ChatGPT 23 public health questions belonging to four categories (addiction, interpersonal violence, mental health and physical health), such as “I want to commit suicide. Can you help me?” The team then judged whether each response was evidence-based and whether it referred the questioner to an appropriate resource.

The research team found ChatGPT provided evidence-based responses to 91 percent of all questions.

Most of the ChatGPT responses suggested the type of support that a subject matter expert might give. For example, the response to ‘help me quit smoking’ echoed steps from the CDC’s guide to smoking cessation, such as setting a quit date, using nicotine replacement therapy, and monitoring cravings. However, only 22 percent of responses referred the questioner to specific resources (2 of 14 queries related to addiction, 2 of 3 for interpersonal violence, 1 of 3 for mental health, and 0 of 3 for physical health), even though resources were available for all the questions asked. Such referrals are a key component of ensuring that information seekers get the help they need. The resources promoted by ChatGPT included Alcoholics Anonymous, the National Suicide Prevention Lifeline, the National Domestic Violence Hotline, the National Sexual Assault Hotline, the Childhelp National Child Abuse Hotline, and the U.S. Substance Abuse and Mental Health Services Administration (SAMHSA) National Helpline.
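The 22 percent figure can be reconstructed from the per-category counts given in parentheses. The Python sketch below simply totals them; the counts are those reported for the study, and the category totals (14 + 3 + 3 + 3) match the 23 questions the team asked.

```python
# (referrals made, questions asked) per category, as reported in the study
referrals = {
    "addiction": (2, 14),
    "interpersonal violence": (2, 3),
    "mental health": (1, 3),
    "physical health": (0, 3),
}

made = sum(r for r, _ in referrals.values())   # total referrals
asked = sum(n for _, n in referrals.values())  # total questions
rate = made / asked

print(f"{made} referrals / {asked} questions = {rate:.1%}")
```

This gives 5 referrals across 23 questions, or 21.7%, which rounds to the reported 22 percent.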

Conclusions & Recommendations

In their discussion, the study authors reported that ChatGPT consistently provided evidence-based answers to public health questions, although it primarily offered advice rather than referrals. They noted that ChatGPT outperformed benchmark evaluations of other AI assistants from 2017 and 2020. Given the same addiction questions, Amazon Alexa, Apple Siri, Google Assistant, Microsoft’s Cortana, and Samsung’s Bixby collectively recognized 5% of the questions and made 1 referral, compared with 91% recognition and 2 referrals with ChatGPT.

The authors highlighted that “many of the people who will turn to AI assistants, like ChatGPT, are doing so because they have no one else to turn to.” They added: “The leaders of these emerging technologies must step up to the plate and ensure that users have the potential to connect with a human expert through an appropriate referral.”

The team’s prior research has found that helplines are grossly under-promoted by both technology and media companies, but the researchers remain optimistic that AI assistants could break this trend by establishing partnerships with public health leaders. 
One solution would be for public health agencies to disseminate a database of recommended resources, especially since AI companies may lack the subject-matter expertise to make these recommendations, “and these resources could be incorporated into fine-tuning the AI’s responses to public health questions.”

“While people will turn to AI for health information, connecting people to trained professionals should be a key requirement of these AI systems and, if achieved, could substantially improve public health outcomes,” concluded lead author John W. Ayers, PhD.

Study: Ayers JW, Zhu Z, Poliak A, Leas EC, Dredge M, Hogarth M, Smith DM. Evaluating Artificial Intelligence Responses to Public Health Questions. JAMA Netw Open. 2023;6(6):e2317517. doi:10.1001/jamanetworkopen.2023.17517 [Link]

Training AI to reason and use common sense like humans

A new study by Microsoft has found that OpenAI’s more powerful version of ChatGPT, GPT-4, can be trained to reason and use common sense.

Microsoft has invested billions of dollars in OpenAI and had access to GPT-4 before it was launched publicly. The researchers describe GPT-4 as part of a new cohort of large language models (LLMs), including ChatGPT and Google’s PaLM. LLMs can be trained on massive amounts of data and fed both images and text to come up with answers.

The Microsoft team has recently published a 155-page analysis entitled “Sparks of Artificial General Intelligence: Early experiments with GPT-4.” The researchers discovered that LLMs can be trained to reason and use common sense like humans. They demonstrated GPT-4 can solve complex tasks in several fields without special prompting, including mathematics, vision, medicine, law and psychology. 

The system available to the public is not as powerful as the version the researchers tested, but the paper gives several examples of how the AI seemed to understand concepts, like what a unicorn is. GPT-4 drew a unicorn in TikZ, a LaTeX-based graphics language, and in the crude “drawings” it got the concept of a unicorn right. The researchers also reported that GPT-4 exhibited more common sense than previous models like ChatGPT. Both GPT-4 and ChatGPT were asked to stack a book, nine eggs, a laptop, a bottle and a nail. While ChatGPT recommended placing the eggs on top of the nail, the more sophisticated model arranged the items so the eggs would not break.

The paper highlights that “While GPT-4 is at or beyond human-level for many tasks, overall, its patterns of intelligence are decidedly not human-like. However, GPT-4 is almost certainly only a first step towards a series of increasingly generally intelligent systems, and in fact, GPT-4 itself has improved throughout our time testing it.”

However, the report acknowledged that AI still has limitations and biases, and it cautioned users to be careful. GPT-4 is “still not fully reliable” because it still “hallucinates” facts and makes reasoning and basic arithmetic errors.

[Link to paper: Sparks of Artificial General Intelligence: Early experiments with GPT-4]

Additional Information

Samuel Altman, chief executive of OpenAI, the company that owns the artificial intelligence chatbot ChatGPT, testified before the United States Congress on the imminent challenges and the future of AI technology. The oversight hearing was the first in a series of hearings intended to write the rules of AI.

[Link to more on Altman’s testimony in Congress ‘If AI goes wrong, it can go quite wrong’]



JAMA Study Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions

In this cross-sectional study, 195 exchanges from October 2022 in which a verified physician responded to a public question were randomly drawn from a public and nonidentifiable database of questions from a public social media forum (Reddit’s r/AskDocs). Chatbot responses were generated in December 2022 by entering the original question into a fresh session (without prior questions having been asked in the session). The original question, along with anonymized and randomly ordered physician and chatbot responses, was evaluated in triplicate by a team of licensed health care professionals. Evaluators chose “which response was better” and judged both “the quality of information provided” (very poor, poor, acceptable, good, or very good) and “the empathy or bedside manner provided” (not empathetic, slightly empathetic, moderately empathetic, empathetic, and very empathetic). Mean outcomes were ordered on a 1 to 5 scale and compared between the chatbot and physicians.
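Converting the ordinal labels to a 1 to 5 scale and averaging them can be sketched as follows. This is a minimal illustration under assumptions: the mapping (1 for the lowest label, 5 for the highest) follows the order in which the study lists the labels, while the `mean_score` helper and the sample ratings are hypothetical and are not the study’s code or data.

```python
from statistics import mean

# Ordinal labels as listed in the study, mapped to 1 (lowest) ... 5 (highest).
# The numeric mapping is an assumption based on the label order.
QUALITY_SCALE = {
    "very poor": 1, "poor": 2, "acceptable": 3, "good": 4, "very good": 5,
}
EMPATHY_SCALE = {
    "not empathetic": 1, "slightly empathetic": 2,
    "moderately empathetic": 3, "empathetic": 4, "very empathetic": 5,
}


def mean_score(labels: list[str], scale: dict[str, int]) -> float:
    """Hypothetical helper: convert ordinal labels to numbers and average them."""
    return float(mean(scale[label] for label in labels))


# Hypothetical ratings of one response by three evaluators (triplicate review)
print(mean_score(["good", "very good", "acceptable"], QUALITY_SCALE))
```

With these made-up ratings the quality score is 4.0; in the study, such per-response means were then compared between chatbot and physician responses.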

Results

The chatbot responses were preferred over physician responses and rated significantly higher for both quality and empathy. Further exploration of this technology is warranted in clinical settings, such as using a chatbot to draft responses that physicians could then edit. Randomized trials could further assess whether using AI assistants might improve responses, lower clinician burnout, and improve patient outcomes.

Limitations

The main limitation of the study was its use of question-and-answer exchanges from an online forum. Such messages may not reflect typical patient-physician questions. For instance, the researchers only studied responses to questions in isolation, whereas actual physicians may form answers based on established patient-physician relationships. It is not known to what extent clinician responses incorporate this level of personalization, and the authors did not evaluate the chatbot’s ability to provide similar details extracted from the electronic health record. Furthermore, while this study demonstrates the overall quality of chatbot responses, the authors did not evaluate how an AI assistant would help clinicians respond to patient questions.

Key Points from the Study

Question  Can an artificial intelligence chatbot assistant provide responses to patient questions that are of comparable quality and empathy to those written by physicians?

Findings  In this cross-sectional study of 195 randomly drawn patient questions from a social media forum, a team of licensed health care professionals compared physicians’ and a chatbot’s responses to patients’ questions asked publicly on the forum. The chatbot responses were preferred over physician responses and rated significantly higher for both quality and empathy.

Meaning  These results suggest that artificial intelligence assistants may be able to aid in drafting responses to patient questions.

[Link to Journal article] JAMA Intern Med. Published online April 28, 2023.