this post was submitted on 11 Jun 2024
164 points (90.2% liked)

AI

spaduf@slrpnk.net 15 points 5 months ago (last edited 5 months ago)

Yeah, this is probably just straight-up misinformation. By no means is a diagnosis going to be made by a generalist multimodal LLM. Diagnosis is literally a binary classification (although that is an oversimplification), and in medical computer vision (CV) you are optimizing for that directly.
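
To make the point about optimizing directly for the diagnosis concrete, here is a minimal sketch of a task-specific binary classifier: plain logistic regression trained by gradient descent on binary cross-entropy against the diagnosis label itself. The single standardized feature and the six data points are made up for illustration; a real medical CV model would learn its features from the pixels (for example with a CNN), but the training objective has the same shape.

```python
import math
from typing import List, Tuple

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs: List[float], ys: List[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[float, float]:
    """Fit p(diagnosis = 1 | x) = sigmoid(w * x + b) by gradient descent
    on the mean binary cross-entropy loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            # For BCE with a sigmoid output, dLoss/dlogit = p - y
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def diagnose(w: float, b: float, x: float, threshold: float = 0.5) -> int:
    """Binary decision: 1 = condition present, 0 = absent."""
    return 1 if sigmoid(w * x + b) >= threshold else 0

# Toy, made-up data: one standardized feature per scan, label 1 = condition present
xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
print([diagnose(w, b, x) for x in xs])  # [0, 0, 0, 1, 1, 1]
```

The loss is computed on exactly the yes/no label you care about, which is the contrast with asking a general-purpose chatbot to describe a scan.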

snooggums@midwest.social -3 points 5 months ago (last edited 5 months ago)

They did not use an LLM.

> In a recent experiment, they set out to determine how reliable LMMs are in medical diagnosis (asking both general and more specific diagnostic questions), as well as whether models were even being evaluated correctly for medical purposes.

> Curating a new dataset and asking state-of-the-art models questions about X-rays, MRIs and CT scans of human abdomens, brain, spine and chests, they discovered “alarming” drops in performance.

Thorry84@feddit.nl 19 points 5 months ago

You've quoted them stating they used LLMs while claiming they did not use an LLM? What am I missing here?

everett@lemmy.ml 6 points 5 months ago

> What am I missing here?

"L" "M" "M"

spaduf@slrpnk.net 6 points 5 months ago

Which in this context just means multimodal LLM, correct?

blindsight@beehaw.org 6 points 5 months ago

Correct.

large language models (LLM) vs. large multi-modal models (LMM)

Regardless, they both use an LLM as the main driver. Multimodal just means that the LLM is interfaced with generative and/or predictive models for other types of content, like images, sound, and video.
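
Roughly how that "LLM as the main driver" wiring looks in open models such as LLaVA: an image encoder turns the picture into feature vectors, a learned projection maps those into the LLM's token-embedding space, and the LLM reads them as extra tokens alongside the prompt. GPT-4V's internals aren't public, so this is only a toy sketch of the open-model pattern under that assumption; the patch "encoder", the projection matrix and the embedding table below are hypothetical stand-ins with tiny made-up sizes.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]

def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply an (out x in) matrix by an (in)-length vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def toy_image_encoder(image: Matrix, patch: int) -> List[Vector]:
    """Hypothetical stand-in for a pretrained vision encoder (e.g. a ViT):
    cut a square image into patch x patch tiles and flatten each tile."""
    n = len(image)
    features = []
    for r in range(0, n, patch):
        for c in range(0, n, patch):
            features.append([image[r + i][c + j]
                             for i in range(patch) for j in range(patch)])
    return features

def build_llm_input(image: Matrix, prompt: List[str], projection: Matrix,
                    embeddings: Dict[str, Vector], patch: int = 2) -> List[Vector]:
    """Project image features into the LLM's embedding space and put them
    before the prompt's token embeddings: one sequence for the LLM to read.
    In real models the projection is learned; here it is fixed by hand."""
    image_tokens = [matvec(projection, f) for f in toy_image_encoder(image, patch)]
    text_tokens = [embeddings[tok] for tok in prompt]
    return image_tokens + text_tokens

# Tiny made-up sizes: 4x4 "scan", 2x2 patches (4 features each), LLM width 3
scan = [[float(4 * r + c) for c in range(4)] for r in range(4)]
projection = [[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]]
embeddings = {"heart": [0.1, 0.2, 0.3], "or": [0.0, 0.5, 0.0], "kidney": [0.3, 0.2, 0.1]}
sequence = build_llm_input(scan, ["heart", "or", "kidney"], projection, embeddings)
print(len(sequence))  # 7: four image tokens followed by three text tokens
```

The point of the sketch: the language model itself is unchanged and only ever sees a sequence of vectors, so how well it "reads" a scan depends on a general-purpose image encoder, not on anything trained for diagnosis.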

This is using a generalist tool for a specialized job. I'd expect the limit for LMMs to be telling you whether your picture is a heart or a kidney... maybe. With low accuracy. Diagnosing? lol, hell no.

Starbuck@lemmy.world 10 points 5 months ago

> models including GPT-4V and Gemini Pro

What a joke: a few generic LLMs making a judgement call about all AI models.

can@sh.itjust.works 2 points 5 months ago

They used one to create the dataset for their experiments:

> In their experiments, they introduced a new dataset, Probing Evaluation for Medical Diagnosis (ProbMed), for which they curated 6,303 images from two widely-used biomedical datasets. These featured X-ray, MRI and CT scans of multiple organs and areas including the abdomen, brain, chest and spine.

> GPT-4 was then used to pull out metadata about existing abnormalities, the names of those conditions and their corresponding locations. This resulted in 57,132 question-answer pairs covering areas such as organ identification, abnormalities, clinical findings and reasoning around position.
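
For anyone wondering what turning extracted metadata into question-answer pairs might look like mechanically (57,132 pairs over 6,303 images is roughly nine per image), here is a minimal template-based sketch. The metadata schema, the question templates and the example record are all hypothetical; the article doesn't describe ProbMed's actual templates, so treat this as an illustration of the idea, not their pipeline.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ScanMetadata:
    """Hypothetical per-image record, e.g. what an LLM might extract from a report."""
    image_id: str
    modality: str                      # e.g. "X-ray", "MRI", "CT"
    organ: str                         # e.g. "chest"
    findings: List[Tuple[str, str]] = field(default_factory=list)  # (condition, location)

def make_qa_pairs(meta: ScanMetadata) -> List[Tuple[str, str]]:
    """Turn one metadata record into closed (yes/no) question-answer pairs."""
    pairs = [
        # organ identification
        (f"Is this {meta.modality} image showing the {meta.organ}?", "yes"),
        # abnormality present at all?
        ("Is there any abnormality in this image?", "yes" if meta.findings else "no"),
    ]
    for condition, location in meta.findings:
        # clinical finding
        pairs.append((f"Does this image show {condition}?", "yes"))
        # reasoning about position
        pairs.append((f"Is the {condition} located in the {location}?", "yes"))
    return pairs

# Made-up example record, not taken from the dataset
record = ScanMetadata("img-001", "X-ray", "chest", [("pleural effusion", "left lung")])
for question, answer in make_qa_pairs(record):
    print(question, "->", answer)
```

Even in this toy version the answers come straight from the extracted metadata, so any extraction error by the LLM becomes a wrong "ground truth" label downstream.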

snooggums@midwest.social 0 points 5 months ago

> The seven models tested included GPT-4V, Gemini Pro and the open-source, 7B parameter versions of LLaVA-v1, LLaVA-v1.6, MiniGPT-v2, as well as specialized models LLaVA-Med and CheXagent. These were chosen because their computational costs, efficiencies and inference speeds make them practical in medical settings, researchers explain.

It seems like this is a case of "they just aren't using AI right; if they used it right, it would work," when it sure looks like they are using the models intended for these specific medical tasks.

spaduf@slrpnk.net 3 points 5 months ago (last edited 5 months ago)

Those are not the sort of models anybody in the field would use (medical CV with deep-learning-based analysis is a vibrant field with many breakthroughs in recent years). These are the sort of models tech bros are trying to sell to the public as general AI. There is a world of difference.