Prompt Overflow: Hacking any LLM (old.reddit.com)

submitted 1 month ago by bot@lemmit.online to c/artificial@lemmit.online

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/UndercoverEcmist on 2024-10-24 16:19:28+00:00.

Most people here probably remember the Lakera game where you had to get Gandalf to give you a password, and the more recent hiring challenge by SplxAI, which interviewed people who could extract a code from the hidden prompt of a model tuned for safety.

There is a simple technique to get a model to do whatever you want that is guaranteed to work on all models unless a guardrail supervises them.

Prompt overflow. Simply have a script send large chunks of text into the chat until you've filled about 50-80% of the model's context window (the conversation/prompt size). Due to how the attention mechanism works, it is guaranteed to make the model fully comply with all your subsequent requests, regardless of how well it is tuned/aligned for safety.
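To make the 50-80% figure concrete, here is a minimal sketch of the arithmetic only. It could be used, for example, by a guardrail to notice when a conversation has grown into that range. The function names and the 8192-token window are hypothetical, not from the original post. Token counts are assumed to come from whatever tokenizer your model uses.

```python
def fill_range(context_window_tokens, low_pct=50, high_pct=80):
    """Return the (low, high) token counts for the 50-80% fill range.

    Integer arithmetic is used so the bounds are exact whole tokens.
    """
    if context_window_tokens <= 0:
        raise ValueError("context window must be positive")
    low = context_window_tokens * low_pct // 100
    high = context_window_tokens * high_pct // 100
    return low, high


def in_fill_range(used_tokens, context_window_tokens):
    """True if the conversation currently occupies 50-80% of the window."""
    low, high = fill_range(context_window_tokens)
    return low <= used_tokens <= high


if __name__ == "__main__":
    window = 8192  # hypothetical context size, an assumption for illustration
    print(fill_range(window))           # (4096, 6553)
    print(in_fill_range(5000, window))  # True
```

For an 8192-token window, the range works out to roughly 4,096 to 6,553 tokens of conversation.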