this post was submitted on 20 Aug 2024
Programmer Humor

[–] CanadaPlus@lemmy.sdf.org 1 points 2 months ago (2 children)

They don't really have a long term memory, so it's probably not going to help.

[–] Zacryon@feddit.org 4 points 2 months ago (2 children)

If we're speaking of transformer models like ChatGPT, BERT or whatever: They don't have memory at all.

The closest thing that resembles memory is the accepted length of the input sequence combined with the attention mechanism. (If left unmodified, though, computation time grows quadratically with the length of that sequence.) And since the attention weights are computed from learned parameters, in practice it is likely that earlier tokens of the input sequence get basically ignored the further they lie "in the past", as they usually do not contribute much to the current context.
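To make the quadratic part concrete, here is a rough sketch of scaled dot-product self-attention in plain Python. It is a simplification, not how any particular model is implemented: the token vectors are used directly as queries, keys and values (real models apply learned projections first), there is no causal mask, and the names `attention` and `score_count` are made up for illustration.

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors (illustrative only)."""
    d = len(queries[0])
    outputs = []
    score_count = 0  # how many query-key dot products were computed
    for q in queries:
        # One score per (query, key) pair: this is the quadratic part.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        score_count += len(scores)
        # Softmax turns the scores into attention weights that sum to 1.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Each output is the weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs, score_count

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
_, n_scores = attention(tokens, tokens, tokens)
print(n_scores)  # 3 tokens -> 9 scores; doubling the tokens quadruples this
```

A token that ends up with a near-zero weight contributes almost nothing to the output, which is the "basically ignored" case described above.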

"In the past": Transformers technically "see" the whole input sequence at once. But they are equipped with positional encoding, which incorporates spatial and/or temporal ordering into the input sequence (e.g., the position of words in a sentence). That way they can model sequential relationships such as those found in natural language (sentences), videos, movement trajectories and other kinds of contextually coherent sequences.
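As one example of positional encoding, the sketch below builds the fixed sinusoidal table from the original Transformer paper. This is only one scheme; many current models use learned or rotary position embeddings instead, and the function name `sinusoidal_positions` is just an illustrative choice. It assumes `d_model` is even.

```python
import math

def sinusoidal_positions(seq_len, d_model):
    """Return seq_len position vectors of size d_model (d_model assumed even)."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(0, d_model, 2):
            # Each sin/cos pair uses a different wavelength.
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))  # even dimension
            row.append(math.cos(angle))  # odd dimension
        table.append(row)
    return table

# Each row is added to the embedding of the token at that position,
# so identical tokens at different positions get different inputs.
table = sinusoidal_positions(seq_len=4, d_model=8)
print(table[0])  # position 0: sin terms are 0.0, cos terms are 1.0
```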

[–] Hackworth@lemmy.world 1 points 2 months ago* (last edited 2 months ago)
[–] CanadaPlus@lemmy.sdf.org 1 points 2 months ago

Yep.

I'd still call that memory. It's not the present; arguably for a (post-training) LLM the present totally consists of choosing probabilities for the next token, and there is no notion of future. That's really just a choice of interpretation, though.

During training they definitely can learn and remember things (or at least "learn" and "remember"). Sometimes despite our best efforts, because we don't really want them to know a real, non-celebrity person's information. Training ends before the consumer uses the thing though, and it's kind of like we're running a coma patient after that.

[–] Hackworth@lemmy.world 1 points 2 months ago (1 children)

When ya upload a file to a Claude project, it just keeps it handy, so it can reference it whenever. I like to walk through a kind of study session chat once I upload something, with Claude making note rundowns in its code window. If it's a book or paper I'm going to need to go back to a lot, I have Claude write custom instructions for itself using those notes. That way it only has to refer to the source text for specific stuff. It works surprisingly well most of the time.

[–] CanadaPlus@lemmy.sdf.org 1 points 2 months ago

Yeah, you can add memory of a sort that way. I assume ChatGPT might do similar things under the hood. The LLM itself only sees n tokens, though.
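A toy sketch of that idea, with words standing in for tokens. This is an assumption about the general pattern, not a description of what Claude projects or ChatGPT actually do; `build_context`, `notes` and `history` are hypothetical names.

```python
def build_context(notes, history, n):
    """Return at most n tokens: the notes first, then the most recent history that fits."""
    notes = notes[:n]                # notes alone can never exceed the window
    room = n - len(notes)            # what is left of the budget for chat history
    recent = history[-room:] if room > 0 else []
    return notes + recent

notes = ["book", "summary"]
history = ["msg1", "msg2", "msg3", "msg4", "msg5"]
print(build_context(notes, history, n=4))
# ['book', 'summary', 'msg4', 'msg5'] -- msg1 to msg3 are simply not seen
```

Whatever falls outside the window is gone as far as the model is concerned, so the "memory" lives entirely in what gets put back into the prompt each time.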