If AI is making the Turing test obsolete, what might be better?

[-] ExLisper@linux.community 35 points 9 months ago

It's not making the Turing test obsolete. It was obvious from day 1 that the Turing test is not an intelligence test. You could simply create a sufficiently big dictionary of "if human says X respond with Y" and it would fool any person into thinking they're talking with a human, with 0 intelligence behind it. The Turing test was always about checking how good a program is at chatting. If you want to test something else you have to come up with another test. If you want to test chat bots you will still use the Turing test.

[-] intensely_human@lemm.ee 12 points 9 months ago

Sounds to me like that sufficiently large dictionary would be intelligent. Like, a dictionary that can produce the correct response to everything said sounds like a system that can produce the correct response to anything said. Like, that system could advise you on your career or invent machines or whatever.

[-] JohnEdwa@sopuli.xyz 17 points 9 months ago* (last edited 9 months ago)

So could a book be considered intelligent if it was large enough to contain the answer to any possible question? Or maybe the search tool that simply matches your input to the output the book provides, would that be intelligence?

To me, something can't be considered intelligent if it lacks the ability to learn.

[-] ExLisper@linux.community 14 points 9 months ago

No, a dictionary is not intelligent. A dictionary simply matches one text to another. A HashMap is not intelligent. But it can fool a human into thinking that it is.
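
To make the lookup-table idea concrete, here is a minimal sketch of such a "chatbot". Everything in it (the `CANNED_REPLIES` table, the `reply` function, the fallback phrase) is made up for illustration; a table that could actually pass a Turing test would have to be astronomically larger.

```python
# Hypothetical lookup-table "chatbot"; all names and replies are invented
# for illustration and not taken from any real system.
CANNED_REPLIES = {
    "hello": "Hi there! How are you today?",
    "how are you?": "Pretty good, thanks. You?",
    "are you human?": "Last time I checked, yes.",
}


def reply(message: str) -> str:
    """Return the stored response for an exact, case-insensitive match."""
    key = message.strip().lower()
    # No reasoning happens here: unseen input falls through to a stock phrase.
    return CANNED_REPLIES.get(key, "Sorry, could you rephrase that?")


print(reply("Hello"))
print(reply("What is 2 + 2?"))
```

The second call shows the weakness: anything that isn't already a key gets the same stock answer, because the table only matches text to text.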

[-] lauha@lemmy.one 13 points 9 months ago

Yes, but you could argue that the human brain is a large pattern matcher with a dictionary. What separates human intelligence from machine intelligence?

[-] ExLisper@linux.community 6 points 9 months ago

The question is not if something is a pattern matcher or not. The question is how this matching is done. There are ways we consider intelligent and ways that are not. The human brain is generally considered intelligent, some algorithms using heuristics or machine learning would be considered artificial intelligence, and a hash map matching string A to string B is not in any way intelligent. But all these methods can produce the same results, so it's impossible to determine if something is intelligent or not without looking inside the black box.

[-] lauha@lemmy.one 1 points 9 months ago

Yes, but we have no strict or clear scientific definition of what makes humans intelligent or what intelligence even is.

Humans are intelligent and machines are not "just because"

[-] ExLisper@linux.community 4 points 9 months ago

Yes, we don't have a universal definition of intelligence, but in general everyone would agree that knowledge is not intelligence. Simply storing information does not make anything intelligent. A book is not intelligent, Wikipedia is not intelligent, a hash map is not intelligent.

[-] lauha@lemmy.one 1 points 9 months ago

Yes, but we also have to draw a line somewhere. You could just as well turn any non-random based computer program into a huge hashtable, yet the intelligence arises from somewhere. There is no magic to human intelligence, unless you start believing in the soul or something.

[-] ExLisper@linux.community 1 points 9 months ago

Yes, that's the whole point. You can substitute a hash map for a computer program and the results would be the same, but everyone in general agrees that a hash map is not intelligent. Defining exactly why it's not intelligent is tricky though. It comes down to some very basic concepts that we understand intuitively but that are very hard to define precisely, like what it means to 'know' something or to 'understand' something. One famous example is a very good dictionary: let's say some guy has a very good Chinese dictionary. A Chinese-speaking person can write a question down and give it to this guy. He will look up every symbol in the question, translate it to English, respond and translate the response back to Chinese using the same dictionary. Does he 'speak' Chinese? He can communicate in Chinese but obviously he does not speak it. Does he 'understand' Chinese? Again, not really, he can just look up symbols in a dictionary. Specifying the exact reason why we would not say that he can 'speak' Chinese is difficult though. It's the same with intelligence. We intuitively understand why a book is not intelligent but to say exactly why is tricky.

[-] lauha@lemmy.one 1 points 9 months ago

Yes but you are missing my point. We have no way of measuring if a human is intelligent. The whole intelligence might just as well be an illusion.

[-] GlitchyDigiBun@lemmy.dbzer0.com 5 points 9 months ago

Yet language and abstraction are the core of intelligence. You cannot have intelligence without 2-way communication, and if anything, your brain contains exactly that dictionary you describe. Ask any verbal autistic person, and 90% of their conversations are scripted to a fault. However, there's another component to intelligence that the Turing test just scrapes against. I'm not philosophical enough to identify it, but it seems like the Turing test is looking for lightning by listening for rumbling that might mean thunder.

[-] ExLisper@linux.community 7 points 9 months ago

If you want to get philosophical, the truth is we don't know what intelligence is and there's no way to identify it in a black box. We may say that something behaves intelligently or not, but we will never be able to say if it's really intelligent. The Turing test checks if a program is able to chat intelligently. We can come up with a test for solving math intelligently or driving a car intelligently, but we will never have a test for what most people understand as intelligence.

[-] 0ops@lemm.ee 2 points 9 months ago

This is what it comes down to. Until we agree on a testable definition of "intelligence" (or sentience, sapience, consciousness or just about any descriptor of human thought), it's not really science. Even in nature, what we might consider intelligence manifests in different organisms in different ways.

We could assume that when people say intelligence they mean human-like intelligence. That might be narrow enough to test, but you'd probably still end up failing some humans and passing some trained models

[-] ExLisper@linux.community 4 points 9 months ago

It's not that it's not science. Different sciences simply define intelligence in different ways. In psychology it's mostly the ability to solve problems by reasoning, so 'human-like' intelligence. They don't care that computers can solve the same problems without reasoning (by brute force, for example) because they don't study computers. In computer science it's more fuzzy, but it pretty much boils down to algorithms solving problems by using some sort of insight rather than simple step-by-step instructions. The problem is that with general AI we're trying to unify those definitions, but when you do this both lose their meaning.

[-] Helix@feddit.de 5 points 9 months ago

You could simply create a sufficiently big dictionary of “if human says X respond with Y” and it would fool any person that its talking with a human with 0 intelligence behind it.

So, ChatGPT?

[-] tal@lemmy.today 28 points 9 months ago* (last edited 9 months ago)

The Turing Test isn't really intended to identify a computer -- Turing's problem wasn't that we needed a way to identify computers.

At the time -- well, and to some extent today -- some people firmly felt that a computer could not actually think, that that is something "special" that only humans can do.

It's intended to support Turing's argument for a behavioral approach to thinking -- that if a computer can behave indistinguishably from a human that we agree thinks, then that should be the bar for what we talk about when talking about thinking.

There have been people since who have aimed to actually work towards such a chatbot, but for Turing, this was just a hypothetical to support his argument.

https://en.wikipedia.org/wiki/Turing_test

The test was introduced by Turing in his 1950 paper "Computing Machinery and Intelligence" while working at the University of Manchester.[5] It opens with the words: "I propose to consider the question, 'Can machines think?'" Because "thinking" is difficult to define, Turing chooses to "replace the question by another, which is closely related to it and is expressed in relatively unambiguous words."[6]

Turing did not intend for his idea to be used to test the intelligence of programs—he wanted to provide a clear and understandable example to aid in the discussion of the philosophy of artificial intelligence.[82] John McCarthy argues that we should not be surprised that a philosophical idea turns out to be useless for practical applications. He observes that the philosophy of AI is "unlikely to have any more effect on the practice of AI research than philosophy of science generally has on the practice of science."[83][84]

[-] cityboundforest@beehaw.org 14 points 9 months ago

There is, however, still the concept of the Chinese Room thought experiment, and I don't think AI will topple that one for a while.

For those who don't know and don't wish to browse off the site, the thought experiment posits a situation in which a guy who does not understand Chinese is sat in a room and told to respond to sets of Chinese characters that come into the room. He has a little booklet of responses—all completely in Chinese—for him to use to send responses out of the room. The thought experiment questions whether or not the system of the Chinese Room itself can be thought to understand Chinese or even the man himself.

With the Turing Test getting all of the media spotlight in AI, machine learning, and cognitive science, I think the Chinese Room should enter into the conversation as the field of AI looks towards G.A.I.

[-] jarfil@beehaw.org 2 points 9 months ago

The Chinese Room has already been surpassed by LLMs, which have been shown to contain neurons that activate in such high correlation with abstract concepts like "formal text" or "positive sentiment" that tweaking them is one of the options that LLM-based chatbots are presenting to the user.

Analyzing the activation space, it's also been shown that LLMs categorize and cluster sequences of text representing similar concepts closer to each other, which allows them to present reasonably accurate zero shot responses that have never been in the training set (that "weren't in the book" for the Chinese Room).
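
A toy sketch of what "clustered closer to each other" means in practice, using cosine similarity. The three-dimensional vectors below are invented for illustration; real models use learned vectors with hundreds or thousands of dimensions, and nothing here is taken from an actual LLM.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up "embeddings": an assumption for the sketch, not real model output.
toy_embeddings = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}


def nearest(word):
    """Return the other word whose vector points in the most similar direction."""
    others = (w for w in toy_embeddings if w != word)
    return max(
        others,
        key=lambda w: cosine_similarity(toy_embeddings[word], toy_embeddings[w]),
    )


print(nearest("dog"))
```

With these made-up numbers "dog" lands next to "puppy" and far from "invoice", which is the kind of neighborhood structure that lets a model respond sensibly to a phrasing it has never seen verbatim.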

[-] howrar@lemmy.ca 6 points 9 months ago

I don't understand what you mean by "The Chinese Room has already been surpassed by LLMs". It's not a test that can be surpassed. It's just a thought experiment.

In any case, you do bring up a good point. Perhaps this understanding is in the organization of the information. So if you have a Chinese room where all the query-response pairs are in arbitrary orders, then maybe you wouldn't consider that to be understanding. But if you have the data organized such that similar queries/responses are close to each other and this person in the room doing the answering can make mistakes such as accidentally copying out the response next to the correct response and still make sense, then maybe we can consider this system to have better understanding.

[-] jarfil@beehaw.org 2 points 9 months ago

The Chinese Room is really a thought experiment about the inner workings of a partner in a Turing test. Externally they have the same pitfalls, but the Chinese Room also reveals itself completely if one can observe in detail the inner workings of the room/partner.

LLMs are still mostly black boxes, but we can have enough of a glimpse inside to reveal that they aren't "following some rails" like a simple algorithm.

make mistakes such as accidentally copying out the response next to the correct response and still make sense

Precisely. This is another part that we can see with LLMs: at runtime, a "temperature" parameter is applied to the model's output, which intentionally introduces a certain level of randomness. With "temperature = 0", the model always picks its most likely next token, so the output is effectively deterministic and tends to become repetitive. With a high temperature, the randomness increases and the output becomes a total mess. But setting it just right, to a sweet spot of "very little, but not zero", turns out to produce the outputs that we see in ChatGPT and similar.
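
For reference, a minimal sketch of how temperature sampling is commonly implemented: the model's raw scores (logits) are divided by the temperature before being turned into probabilities. The function names and the example logits are assumptions for illustration, not any particular library's API.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive for softmax")
    scaled = [score / temperature for score in logits]
    highest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng=random):
    """Pick a token index; temperature 0 is treated as greedy (argmax) decoding."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


toy_logits = [1.0, 3.0, 2.0]  # made-up scores for three candidate tokens
print(softmax_with_temperature(toy_logits, 0.1))
print(softmax_with_temperature(toy_logits, 10.0))
```

At a low temperature almost all of the probability sits on the top-scoring token, while at a high temperature the three options become nearly equally likely, which is where the "total mess" comes from.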

Knowing that the concept space of LLMs has similar concepts clustered, it makes sense that these errors would force the LLM to sometimes make associations on the fly between close concepts, associations that it wasn't trained on before, and which "derail" it into a close, but not exactly the same, train of thought.

This behavior also seems to be what we call "intelligence" in humans: the ability to solve problems not seen before (zero shot).

A further extension would be the ability to constantly learn from every interaction. Right now LLMs have a "context" of some length, that changes dynamically, but has no influence over the pre-trained network.

Interestingly, this has a parallel in "crystallized intelligence" vs. "fluid intelligence" in humans.

So... maybe LLMs are not full AGIs yet, but they are showing many of the behaviors that we would expect from an AGI, while at the same time giving or confirming insights into the workings of the human mind itself.

[-] jonsnothere@beehaw.org 4 points 9 months ago

The problem with the Turing test and current AI is that we didn't teach computers to think, we taught them to talk.

[-] Froyn@kbin.social 20 points 9 months ago

Voight-Kampff test maybe?

Imagine someone asked you "If Desk plus Love equals Fruit, why is turtle blue?"
AI will actually TRY to solve it.
Human nature would be to ask if the person asking the question is having a stroke or requires medical attention.

[-] Pamasich@kbin.social 10 points 9 months ago

So, I asked this to the three different conversation styles of Bing Chat.

The Precise style actually tried to solve it, came to the conclusion the question might be of philosophical nature, including some potential meanings, and asked for clarification.

The Balanced style told me basically the same as the other reply by admiralteal, that the question makes no sense and I should give more context if I actually want it answered.

The Creative style told me it didn't understand the first part, but then answered the second part (the turtles being blue) seriously.

[-] Froyn@kbin.social 5 points 9 months ago

Would it be safe to say that all 3 answers would fail the test?

[-] Pamasich@kbin.social 7 points 9 months ago

Not sure, I'm not familiar with the test, just figured I'd tell the results from asking the AI.

I think based on what you said about it

AI will actually TRY to solve it.
Human nature would be to ask if the person asking the question is having a stroke or requires medical attention.

That the Balanced style didn't fail, because while it didn't ask about strokes or medical attention, it did point out I'm asking a nonsense question and refused to engage with it.

The Precise style did try to find an answer and the Creative style didn't realize I'm fucking with it, so I do think based on the criteria they'd fail the test.

Though, honestly, I'd fail the test too. When asked such a question, I'd think there has to be an answer and it's stupid of me not to see it and I'd look for it. I think the Precise style's answer is very much where I'd end up.

[-] admiralteal@kbin.social 8 points 9 months ago

Nope, ChatGPT tells you it is a non sequitur and asks for more context or intention if the question is sincere.

[-] Froyn@kbin.social 6 points 9 months ago

You're saying the test would work.
In 43+ years on this planet I've never HEARD someone seriously use "non sequitur" properly in a sentence.
Asking if the intention is sincere would be another flag given the circumstances (knowing they were being tested).

Toss in a couple real questions like: "What is the 42nd digit of pi?", "What is the square root of -i ?", and you'd find the AI pretty quick.
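
For what it's worth, the second question has a definite answer: -i has two square roots, (1 - i)/√2 and its negative. A quick sketch using Python's standard `cmath` module to check this (the variable names are just for illustration):

```python
import cmath
import math

# cmath.sqrt returns the principal square root of a complex number.
root = cmath.sqrt(-1j)

# The closed form (1 - i) / sqrt(2), written out for comparison.
expected = complex(1 / math.sqrt(2), -1 / math.sqrt(2))

print(root)
print(abs(root - expected) < 1e-12)       # principal root matches the closed form
print(abs(root * root - (-1j)) < 1e-12)   # squaring it gives back -i
print(abs((-root) * (-root) - (-1j)) < 1e-12)  # so does the negated root
```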

[-] admiralteal@kbin.social 11 points 9 months ago* (last edited 9 months ago)

Cool.

Both the phrases you're calling out as clearly AI came from me. Not used by ChatGPT, just how I summarized its response. I wonder if this is the first time someone has brazenly accused me of being an AI bot?

[-] Froyn@kbin.social 3 points 9 months ago

LoL, no I took you at your word which was my mistake
"ChatGPT tells you" read to me like you attempted and got that response.

[-] pbjamm@beehaw.org 2 points 9 months ago

Both the phrases you’re calling out as clearly AI came from me.

Perhaps you are an instance of an LLM and do not realize it.

[-] jarfil@beehaw.org 1 points 9 months ago

"If Desk plus Love equals Fruit, why is turtle blue?"

Assuming "Desk = x", "Love = y", "Fruit = x+y", and "turtle blue = z", it is so because you assigned arbitrary values to the words such that they fulfill the equation.

Am I an AI?

[-] kbal@fedia.io 15 points 9 months ago* (last edited 9 months ago)

The idea that "a computer would deserve to be called intelligent if it could deceive a human into believing that it was human" was already obsolete 50 years ago with ELIZA. Clever though it was, examining the source code made it clear that it did not deserve to be called intelligent any more than does today's average toaster.

And then more recently, the ever-evolving chatbots have made it increasingly difficult to administer a meaningful Turing test over the past 30 years as well. It requires care and expertise. It can't be automated, and it can't be done by the average person who hasn't been specifically trained in it. They're much better at fooling people who've never talked to one before, but I think someone with lots of practice identifying the bots of 2013 would still have not much trouble catching out those of today.

[-] admiralteal@kbin.social 8 points 9 months ago

It cannot be automated or systematized because neural networks are the tool you use to defeat systems like that. If there's a defined, objective test, a neural network can train for/on that test and 'learn' to ace it. It's just what they do.

The only way to test for 'true' intelligence would be to perfectly define it first, such that when the NN aced the test that would prove intelligence. That is, IF you could perfectly define intelligence, doing so would more or less give you all the tools you needed to create it.

All these people claiming we already have general AI or even anything like it have put the cart so far before the horse.

[-] lily33@lemm.ee 11 points 9 months ago* (last edited 9 months ago)

I disagree with the "limitations" they ascribe to the Turing test - if anything, they're implementation issues. For example:

For instance, any of the games played during the test are imitation games designed to test whether or not a machine can imitate a human. The evaluators make decisions solely based on the language or tone of messages they receive.

There's absolutely no reason why the evaluators shouldn't take the content of the messages into account, and use it to judge the reasoning ability of whoever they're chatting with.

[-] drwho@beehaw.org 9 points 9 months ago

The Turing test has been obsolete for better than two decades. The premise of this article is incorrect.

[-] Thorny_Insight@lemm.ee 9 points 9 months ago

Ironically, GPT-4 fails the Turing test by having such wide knowledge about almost everything that you just know it's not a human you're talking to.

[-] furrowsofar@beehaw.org 4 points 9 months ago

The problem with AI is that it does not understand anything. You can have a completely reasonable sounding conversation that is just full of stupidity and the AI does not know it because it does not know anything.

Another AI issue is it works until it does not and that failure can be rather severe and unexpected. Again because the AI knows nothing.

Seems like we need some test to address this. They are basically the same problem. Or maybe it is some training so that the AI can know what it does not know.

[-] intensely_human@lemm.ee 2 points 9 months ago

Define “understand” as you’re using it here? What exactly does the AI not do, that humans do, that comprises “understanding”?

[-] furrowsofar@beehaw.org 5 points 9 months ago* (last edited 9 months ago)

Understanding the general sanity of some of their responses. Synthesizing new ideas. Having a larger context. AI tends to be idiot savants on one hand and really mediocre on the other.

You could argue that this is just a reflection of lack of training and scale but I wonder.

You will change my mind when I have had a machine interaction where the machine does not seem like an idiot.

Edit: AI people call the worst of these hallucinations but they are just nonsensical stuff that proves AI knows nothing and are just dumb correlation engines.

[-] intensely_human@lemm.ee 3 points 9 months ago

Have you ever interacted with a human that seemed like an idiot? Do you think that person is incapable of understanding?

[-] 0ops@lemm.ee 2 points 9 months ago* (last edited 9 months ago)

AI knows nothing and are just dumb correlation engines

Here's a thought exercise, how do you "know"? How do you know your pet? LLMs like gpt can "know" about a dog in terms of words, because that's what they "sense", that's how they interact with their "environment". They understand words and how they relate to other words, basically words are their entire environment.

Now, can you describe how you know your dog without your senses, or anything derived from your senses? Remember, chemical receptors are "senses" too.

I remember reading about this awhile back but I don't have the link on me: Did you know that people who were born blind but have their vision repaired years later don't immediately know what "pointy" looks like? They never formed that correlation between the feeling of pointy and the visual of pointy the way that they could with the feeling and the word.

My point is, we're correlation machines too

[-] intensely_human@lemm.ee 2 points 9 months ago

The point of logic is to carry you when your emotions try to stop you from thinking.

Yes, AI is scary. No, that doesn’t mean we get to throw out our definition of AI in order to avoid recognizing its presence.

[-] FaceDeer@kbin.social 3 points 9 months ago

I'm reminded of the apocryphal Gandhi quote "first they ignore you, then they laugh at you, then they fight you, then you win." It seems like the general zeitgeist is in between the laugh/fight stages for AI right now.

[-] intensely_human@lemm.ee 1 points 9 months ago

It’s just too scary to acknowledge. Same thing with aliens. They’re both horrifying literally beyond imagination, and both for the same reason, and so it’s more natural to avoid acknowledging it.

Everything we’ve ever known is a house of cards and it’s terrifying to bring that to awareness.
