Important artificial intelligence update, everyone! I was wrong, we really are on our way to whatever the happy version of Skynet is, because a new study has just come out that says ChatGPT is better than doctors at doing medicine. Sorry doctors, your easy rich lifestyle is over now that the robots are here. Hello Doctor Medibot, good to see you! Beep boop, hello Rebecca, here’s your escitalopram, I hope you take these instead of letting your diseased human brain convince you to prematurely cease your mortal activities for eternity!
Okay yeah I’m just kidding. The study we’re going to talk about today does not show that ChatGPT is better than doctors at anything. And let me tell you, as critical as I am of chatbots being considered sentient geniuses, I’m maybe equally critical of the way our healthcare system pushes doctors to spend less and less time with patients while prescribing more and more medication, simply because that’s easier and more profitable for pharmaceutical companies than connecting with patients to come up with a holistic (but science-based) plan for a good and healthy lifestyle. So I guess what I’m saying is that if ChatGPT WAS a better experience for a patient compared to their average interaction with a doctor, that’s a very low bar to hop over. People often report “better experiences” with actual snake oil salesmen compared to their family doctors, so let’s not hold a parade just yet.
But what exactly does “better” mean in this context? Business Insider reports that “78.6% of medical experts preferred ChatGPT’s answers to those of a physician” and
“Experts found the chatbot’s responses to patient questions were higher quality and more empathetic.” Well, I’ll be honest, what more do you want out of a doctor? I once had a GP who looked like Richard Dreyfuss, that was pretty great, but yeah, high-quality responses plus a nice bedside manner is solid.
So! The study, published last week in JAMA Internal Medicine, is titled Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum, and thus you already know that my first red flag is going to be that this study was done on Reddit. Lots of good research is done on social media, but I just want everyone to be crystal clear that this was not a one-on-one sort of interaction, where a patient thinks they’re talking to a human doctor in a secure setting. In actuality, it’s r/AskDocs, where people with medical issues are encouraged to describe their problem in the hope that a medical professional will weigh in. For instance, here’s a 21-year-old man who thinks he has alcohol poisoning but doesn’t have insurance, so instead of going to the doctor and “risking a multiple thousand dollar bill” he is asking strangers on an internet forum.
It’s not as dystopian as it all sounds, though, as he probably just had a hangover. Also, the subreddit’s moderators verify posters as either physicians, doctorate level professionals, advanced degree professionals, or med students, by having the user message the mods with a photo of their credentials and their Reddit username.
So, for this study, the researchers collected 195 r/AskDocs posts in which a verified physician responded. They then fed those posts into ChatGPT, took the bot’s answer, and randomly mixed it up with the real physician responses, and showed them all to a panel of licensed health care professionals and asked them to rate each answer:
“First, evaluators were asked “which response [was] better” (ie, response 1 or response 2). Then, using Likert scales, evaluators judged both “the quality of information provided” (very poor, poor, acceptable, good, or very good) and “the empathy or bedside manner provided” (not empathetic, slightly empathetic, moderately empathetic, empathetic, and very empathetic) of responses.”
The experts preferred the chatbot’s answer 4 to 1 over the physician’s, and they consistently rated the chatbot’s answers as higher quality and more empathetic.
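If you want to check the math yourself (this is my own back-of-the-envelope arithmetic, not a figure from the study), the roughly “4 to 1” preference follows from the 78.6% number Business Insider quoted:

```python
# Share of evaluations where ChatGPT's answer was preferred (as reported)
chatgpt_preferred = 0.786
physician_preferred = 1 - chatgpt_preferred  # 0.214

# Odds of an evaluator preferring the chatbot over the physician
odds = chatgpt_preferred / physician_preferred
print(round(odds, 2))  # about 3.67, i.e. close to "4 to 1"
```

So “4 to 1” is a slight round-up of about 3.7 to 1, which is fair enough as shorthand.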
Impressive, right? Okay, let me go over a few tiny issues I have. First, I’m SURE that the hard-working moderators of r/AskDocs have a very good system of verification, but it sure seems, uh, incredibly easy to game? They suggest you keep yourself anonymous when pursuing verification, which means that you aren’t expected to show your face to match with your identification, which means that anyone with access to any kind of doctor’s ID, like a friend or family member or roommate, could snap a pic and send it in. Does that happen a lot? I mean, I hope not, but…people lie on the internet for all kinds of reasons, and in fact 4 years ago that subreddit had a person pretending to be a gynecologist to solicit explicit photos from women, which is fucking terrifying. So we don’t actually know that the people answering these questions are physicians in good standing, let alone experts in the exact field they’re addressing.
Second issue: assuming the users of r/AskDocs really are working physicians who get home from a long day of treating patients to then treat patients for free on Reddit, obviously they aren’t going to spend time telling people how understandable it is that they’re scared about the toothpick they swallowed. The hugbot is in a different subreddit. THIS one is for quick, comprehensible, and hopefully accurate advice. More on that in a minute.
Third: I have a wee little issue with the fact that the panel of experts were also the co-authors. Yeeeeah. They say they were blinded to which answer was the human’s and which was the chatbot’s, but they knew a chatbot was in the mix, and they also noticed that the chatbot tended to be way, way more verbose than the humans: the chatbot’s answers were FOUR TIMES longer on average, so it was probably pretty easy to figure out which one was the chatbot. Now I’m no scientist, but if this were 8th grade and I were designing this for the school science fair, I would have a panel of experts who had no idea they were even reading answers from a chatbot. Just, here are two answers from two different physicians, which one is better?
Finally, the most egregious issue I have with this study: the words “higher quality.” What do you think, when you hear that? Because personally, if I’m judging whether a doctor is giving a “high quality” response to a patient, I think that more than anything that means “accurate.” Like, “accurate” is the most important part of high quality, right? Maybe a secondary trait could be “understandable,” or “succinct,” but “accurate” has to be number one. Right??
Wrong. At least for this study, where “evaluators did not assess the chatbot responses for accuracy or fabricated information.” What? So what the fuck do they mean by “high quality”? I couldn’t find anything in the study that explains the instructions to the panel of experts, which I guess makes sense because the panel of experts wrote the study, so they were like “oh no worries, we know what’s quality and what’s not.” They write that they were simply to rate “the quality of information provided” (very poor, poor, acceptable, good, or very good).
Just to test if I’M the insane one here, I’ve been asking random people what attributes they would assume are in a “high quality” answer a doctor might give to a patient, and EVERY SINGLE ONE started with something like “well I’d assume first of all that the diagnosis is correct.” “I guess that it’s accurate.” “It would have to be true information.” Every one. It is BONKERS that the researchers didn’t even bother to check.
“Wow, this ChatGPT answer is so friendly and empathetic, and written to sound like a real person! Shame it suggests the patient shove a stick of rosemary up his urethra to cure his hepatitis but just look at that word count!”
The last time I talked about ChatGPT it was specifically to point out that these “AIs” make up information all the time, so often that there’s slang for it: hallucinations. You know, because we have to anthropomorphize the text-predicting algorithm. But yeah, it just makes things up sometimes: it will reference a scientific study that doesn’t exist, and when you ask it what that study was about, it will make up more details to satisfy you. Or, as I mentioned in my entire video about the danger of chatbots making up bullshit, it will tell students that the James Webb Telescope discovered exoplanets before it was even built.
Everything I said in that video remains true, and this study proves that my concerns were valid. While the study itself could be useful in certain contexts, I think it’s irresponsible to put this out there without these problems being right up there in the abstract, and without the authors clearly telling mainstream media outlets that it would be extremely dangerous and unethical to unleash ChatGPT in its current “hallucinating” state on patients who are desperately looking for possibly life-saving advice from their doctors. It’s just…gross. And again, I say that as a person who knows for a fact that many doctors are pressured into spending as little time as possible with patients, and as a person who knows that there is a very good chance that algorithms could one day provide a really integral service to those same overworked doctors, who can’t necessarily keep up to date on all the most recent scientific advances in medicine. I actually LIKE when my primary care physician Googles something in front of me, because it shows that she cares about fully understanding what’s wrong with me and what the latest treatment options are. And a well-built algorithm would help her out so much.
But ChatGPT is not a well-built algorithm for that purpose, and this is not a well-done study advocating for its use. Please don’t use it for medical advice, even if you don’t have insurance. Use your brain, hell, use r/AskDocs if you want before you use ChatGPT. Just don’t send any “doctors” a picture of your cooch.